qfo / benchmark-webservice

This repository contains the codebase that runs the webservice to benchmark orthology predictions on a common reference proteome dataset.
http://orthology.benchmark-service.org

Issue #7

Closed: pmjklemm closed this issue 1 year ago

pmjklemm commented 1 year ago

I just downloaded the repository and followed the usage guide on the main page (built the Docker image and downloaded the 2018 dataset). However, the help option does not work:

nextflow run main.nf --help

results in

N E X T F L O W  ~  version 19.10.0
Launching `main.nf` [happy_ritchie] - revision: 8557f8ca0a
No such variable: param

 -- Check script 'main.nf' at line: 3 or see '.nextflow.log' file for more details

Furthermore, a direct run of the example case (using the 2018 dataset) also fails:

nextflow run main.nf

results in

N E X T F L O W  ~  version 19.10.0
Launching `main.nf` [compassionate_jennings] - revision: 8557f8ca0a
==============================================
 QFO ORTHOLOGY BENCHMARKING PIPELINE
==============================================
input file: /home/paul/Downloads/benchmark-webservice/example/oma-groups.orthoxml.gz
method name : OMA Groups
goldstandard path (refset): reference_data/2018
benchmarking community = QfO
selected benchmarks: GO EC VGNC SwissTrees TreeFam-A STD_Eukaryota STD_Fungi STD_Bacteria G_STD_Luca G_STD_Eukaryota G_STD_Vertebrata G_STD_Fungi G_STD2_Luca G_STD2_Fungi G_STD2_Eukaryota G_STD2_Vertebrata
Evidence filter for GO benchmark: exp
Public Benchmark results: reference_data/data
validation results directory: out/participant_out
assessment results directory: out/assessment_out/Assessment_datasets.json
consolidated benchmark results directory: out/results
Statistics results about nextflow run: out/stats
Benchmarking data model file location: out/benchmarking_data_model_export/consolidated_results.json
Directory with community-specific results: out/other
executor >  local (1)
[94/bc0379] process > validate_input_file              [100%] 1 of 1, failed: 1 ✘
[-        ] process > convertPredictions               -
[-        ] process > scheduleMetrics                  -
[-        ] process > go_benchmark                     -
[-        ] process > ec_benchmark                     -
[-        ] process > swissprot_benchmark              -
[-        ] process > vgnc_benchmark                   -
[-        ] process > speciestree_benchmark            -
[-        ] process > g_speciestree_benchmark          -
[-        ] process > g_speciestree_benchmark_variant2 -
[-        ] process > reference_genetrees_benchmark    -
[-        ] process > consolidate                      -
Error executing process > 'validate_input_file'

Caused by:
  Process `validate_input_file` terminated with an error exit status (127)

Command executed:

  /benchmark/validate.py --com QfO --challenges_ids "GO EC VGNC SwissTrees TreeFam-A STD_Eukaryota STD_Fungi STD_Bacteria G_STD_Luca G_STD_Eukaryota G_STD_Vertebrata G_STD_Fungi G_STD2_Luca G_STD2_Fungi G_STD2_Eukaryota G_STD2_Vertebrata" --participant "OMA_Groups" --out "participant.json" 2018/mapping.json.gz oma-groups.orthoxml.gz

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: /benchmark/validate.py: No such file or directory

Work dir:
  /home/paul/Downloads/benchmark-webservice/work/94/bc03791678469a66366bae82711333

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

I am on Ubuntu and happy to provide more information.

alpae commented 1 year ago

Hi @pmjklemm,

The help issue should have been fixed with the latest commits. To run the workflow, you should use the -profile docker option; running it directly won't work.

Best wishes,
Adrian
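For context on the exit codes in this thread: by shell convention, status 127 means "command not found" and 126 means the command was found but could not be invoked. Without -profile docker the task script ran on the host, where /benchmark/validate.py does not exist, hence 127. The helper below is a hypothetical sketch (explain_exit_status is not part of the pipeline) that maps these conventional codes to hints; they are conventions, not guarantees, since any program may pick its own exit codes.

```python
import subprocess

# Conventional shell exit statuses; a Nextflow task reports its script's status as-is.
SHELL_EXIT_MEANINGS = {
    126: "command could not be invoked (for example a permission problem)",
    127: "command not found (for example a path that only exists inside the container)",
}


def explain_exit_status(code: int) -> str:
    """Return a human-readable hint for a task's exit status (hypothetical helper)."""
    if code == 0:
        return "success"
    if code in SHELL_EXIT_MEANINGS:
        return SHELL_EXIT_MEANINGS[code]
    if code > 128:
        # Statuses above 128 usually mean "killed by signal (code - 128)".
        return f"terminated by signal {code - 128}"
    return "generic failure reported by the command itself"


if __name__ == "__main__":
    # Reproduce the first failure: call the container-only script path on the host.
    result = subprocess.run(["sh", "-c", "/benchmark/validate.py"],
                            capture_output=True, text=True)
    print(result.returncode, "->", explain_exit_status(result.returncode))
```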

pmjklemm commented 1 year ago

That is a good point; maybe add the -profile docker option to the usage instructions.
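A hypothetical Python wrapper illustrating the suggested usage: build_nextflow_command is not part of the repository; it only assembles the nextflow run main.nf invocation with -profile docker, so the tasks run inside the image that contains the /benchmark/*.py scripts. The optional -resume flag is the one Nextflow's own tip recommends after fixing a failure.

```python
import shlex
from typing import Optional


def build_nextflow_command(profile: str = "docker",
                           resume: bool = False,
                           params: Optional[dict] = None) -> list:
    """Assemble argv for `nextflow run main.nf` with a container profile (hypothetical)."""
    cmd = ["nextflow", "run", "main.nf", "-profile", profile]
    if resume:
        # Nextflow option that reuses cached results of already finished tasks.
        cmd.append("-resume")
    for name, value in (params or {}).items():
        # Pipeline parameters take a double dash; Nextflow's own options a single dash.
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_nextflow_command(resume=True)))
    # nextflow run main.nf -profile docker -resume
```

Returning an argv list rather than a single string lets the caller pass it straight to subprocess.run without shell quoting issues.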

After using -profile docker, I get a permission denied error:

docker: Error response from daemon: error while creating mount source path '/home/paul/Downloads/benchmark-webservice': mkdir /home/paul/Downloads: permission denied.
  time="2023-05-05T08:42:19+02:00" level=error msg="error waiting for container: "
Here is my full error log:
```
rm -rf out/; nextflow run -profile docker main.nf
N E X T F L O W  ~  version 19.10.0
Launching `main.nf` [scruffy_golick] - revision: c4f9a20cdd
==============================================
 QFO ORTHOLOGY BENCHMARKING PIPELINE
==============================================
input file: /home/paul/Downloads/benchmark-webservice/example/oma-groups.orthoxml.gz
method name : OMA Groups
goldstandard path (refset): reference_data/2018
benchmarking community = QfO
selected benchmarks: GO EC VGNC SwissTrees TreeFam-A STD_Eukaryota STD_Fungi STD_Bacteria G_STD_Luca G_STD_Eukaryota G_STD_Vertebrata G_STD_Fungi G_STD2_Luca G_STD2_Fungi G_STD2_Eukaryota G_STD2_Vertebrata
Evidence filter for GO benchmark: exp
Public Benchmark results: reference_data/data
validation results directory: out/participant_out
assessment results directory: out/assessment_out/Assessment_datasets.json
consolidated benchmark results directory: out/results
Statistics results about nextflow run: out/stats
Benchmarking data model file location: out/benchmarking_data_model_export/consolidated_results.json
Directory with community-specific results: out/other
executor >  local (1)
[f2/8a4bd5] process > validate_input_file              [100%] 1 of 1, failed: 1 ✘
[-        ] process > convertPredictions               -
[-        ] process > scheduleMetrics                  -
[-        ] process > go_benchmark                     -
[-        ] process > ec_benchmark                     -
[-        ] process > swissprot_benchmark              -
[-        ] process > vgnc_benchmark                   -
[-        ] process > speciestree_benchmark            -
[-        ] process > g_speciestree_benchmark          -
[-        ] process > g_speciestree_benchmark_variant2 -
[-        ] process > reference_genetrees_benchmark    -
[-        ] process > consolidate                      -
Error executing process > 'validate_input_file'

Caused by:
  Process `validate_input_file` terminated with an error exit status (126)

Command executed:

  /benchmark/validate.py --com QfO --challenges_ids "GO EC VGNC SwissTrees TreeFam-A STD_Eukaryota STD_Fungi STD_Bacteria G_STD_Luca G_STD_Eukaryota G_STD_Vertebrata G_STD_Fungi G_STD2_Luca G_STD2_Fungi G_STD2_Eukaryota G_STD2_Vertebrata" --participant "OMA_Groups" --out "participant.json" 2018/mapping.json.gz oma-groups.orthoxml.gz

Command exit status:
  126

Command output:
  (empty)

Command error:
  docker: Error response from daemon: error while creating mount source path '/home/paul/Downloads/benchmark-webservice': mkdir /home/paul/Downloads: permission denied.
  time="2023-05-05T08:42:19+02:00" level=error msg="error waiting for container: "

Work dir:
  /home/paul/Downloads/benchmark-webservice/work/f2/8a4bd5c226f92165133a46fb4bf5a9

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
```

Why does Docker need to create my Downloads directory? It already exists. Any hint as to what is going wrong?

Update: I gave the directory all possible permissions (777), but I still get the same error. Maybe it tries to create this path inside the Docker container, where /home/paul does not exist?
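The thread does not settle why the mount failed: the user had already set 777, and moving the checkout to /tmp worked around it. A hedged first diagnostic is to rule out host-side permissions on every ancestor of the project directory, because a process can only reach /home/paul/Downloads/benchmark-webservice if each parent directory is traversable. The sketch below (inaccessible_ancestors is a hypothetical helper) only sees the current user's permissions, so an empty result suggests the restriction lies on the Docker daemon's side rather than on the directories themselves.

```python
import os
from pathlib import Path


def inaccessible_ancestors(path: str) -> list:
    """Return `path` and its ancestors that are missing or not traversable.

    Checks the calling user's view only; it cannot detect restrictions that
    apply to the Docker daemon itself (hypothetical helper).
    """
    p = Path(path).resolve()
    blocked = []
    for d in [p, *p.parents]:
        # X_OK on a directory means "may enter/traverse it"; False also for missing paths.
        if not os.access(d, os.X_OK):
            blocked.append(str(d))
    return blocked


if __name__ == "__main__":
    print(inaccessible_ancestors(os.getcwd()) or "all ancestors traversable")
```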

pmjklemm commented 1 year ago

Update: I managed to fix the permission issue by moving everything to /tmp; I am not sure why this fixes it, but anyway. More issues arise: the program terminates, but a lot of processes keep running (darwin.linux64)...

method name : OMA Groups
goldstandard path (refset): reference_data/2018
benchmarking community = QfO
selected benchmarks: GO EC VGNC SwissTrees TreeFam-A STD_Eukaryota STD_Fungi STD_Bacteria G_STD_Luca G_STD_Eukaryota G_STD_Vertebrata G_STD_Fungi G_STD2_Luca G_STD2_Fungi G_STD2_Eukaryota G_STD2_Vertebrata
Evidence filter for GO benchmark: exp
Public Benchmark results: reference_data/data
validation results directory: out/participant_out
assessment results directory: out/assessment_out/Assessment_datasets.json
consolidated benchmark results directory: out/results
Statistics results about nextflow run: out/stats
Benchmarking data model file location: out/benchmarking_data_model_export/consolidated_results.json
Directory with community-specific results: out/other
executor >  local (34)
[18/360d26] process > validate_input_file                       [100%] 1 of 1 ✔
[13/26b60a] process > convertPredictions                        [100%] 1 of 1 ✔
[87/ddebb2] process > scheduleMetrics (15)                      [100%] 16 of 16 ✔
[0e/f842b4] process > go_benchmark (1)                          [100%] 1 of 1, failed: 1
[77/d7a77c] process > ec_benchmark (1)                          [100%] 1 of 1, failed: 1
[-        ] process > swissprot_benchmark                       -
[42/b2f21e] process > vgnc_benchmark (1)                        [100%] 1 of 1, failed: 1 ✘
[7d/ee2e91] process > speciestree_benchmark (Eukaryota)         [100%] 3 of 3, failed: 3
[59/34a950] process > g_speciestree_benchmark (Fungi)           [100%] 4 of 4, failed: 4
[6e/d23439] process > g_speciestree_benchmark_variant2 (Luca)   [100%] 4 of 4, failed: 4
[7e/7ae2fc] process > reference_genetrees_benchmark (TreeFam-A) [100%] 2 of 2, failed: 2
[-        ] process > consolidate                               -
WARN: Killing pending tasks (15)
Error executing process > 'vgnc_benchmark (1)'

Caused by:
  Process `vgnc_benchmark (1)` terminated with an error exit status (1)

Command executed:

  /benchmark/vgnc_benchmark.py          --com QfO          --participant "OMA_Groups"          --assessment-out "VGNC.json"          --outdir "results/VGNC"          --vgnc-orthologs 2018/vgnc-orthologs.txt.gz          --db orthologs.db

Command exit status:
  1

Command output:
  (empty)

Command error:
  2023-05-05 07:41:25,222 INFO   : running vgnc_benchmark with following arguments: Namespace(assessment_out='VGNC.json', com='QfO', db='orthologs.db', debug=False, log=None, outdir='results/VGNC', participant='OMA_Groups', vgnc_orthologs='2018/vgnc-orthologs.txt.gz')
  Traceback (most recent call last):
    File "/benchmark/vgnc_benchmark.py", line 152, in <module>
      vgnc_orthologs = get_vgnc_orthologs(conf.vgnc_orthologs)
    File "/benchmark/vgnc_benchmark.py", line 105, in get_vgnc_orthologs
      with auto_open(vgnc_orthologs_fname, 'rt') as fh:
    File "/benchmark/helpers.py", line 46, in auto_open
      return gzip.open(fn, *args, **kwargs)
    File "/usr/local/lib/python3.7/gzip.py", line 58, in open
      binary_file = GzipFile(filename, gz_mode, compresslevel)
    File "/usr/local/lib/python3.7/gzip.py", line 168, in __init__
      fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
  FileNotFoundError: [Errno 2] No such file or directory: '2018/vgnc-orthologs.txt.gz'

Work dir:
  /tmp/benchmark-webservice/work/42/b2f21e450a70dbf63046c6910441e7

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
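The traceback shows that helpers.auto_open hands .gz file names to gzip.open, which raises FileNotFoundError as soon as a file is missing from the refset. The sketch below reimplements that dispatch under this assumption (it is not the project's helpers.py) and adds a hypothetical check_refset_files helper that reports missing reference files up front instead of failing inside a benchmark task. The path in the usage line assumes the refset lives under reference_data/2018, as the pipeline banner reports.

```python
import gzip


def auto_open(fn, *args, **kwargs):
    """Open `fn`, decompressing transparently if it ends in .gz (assumed behaviour)."""
    if str(fn).endswith(".gz"):
        return gzip.open(fn, *args, **kwargs)
    return open(fn, *args, **kwargs)


def check_refset_files(paths):
    """Return the subset of `paths` that do not exist (hypothetical helper)."""
    missing = []
    for path in paths:
        try:
            with auto_open(path, "rt") as fh:
                # Reading one character also surfaces a corrupt .gz file as an exception.
                fh.read(1)
        except FileNotFoundError:
            missing.append(path)
    return missing


if __name__ == "__main__":
    print(check_refset_files(["reference_data/2018/vgnc-orthologs.txt.gz"]))
```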

alpae commented 1 year ago

Dear @pmjklemm,

The VGNC benchmark was not yet available in the 2018 dataset; we added it only in 2020. This is why this benchmark fails and stops the whole pipeline. The default parameters have been updated to 2020, but the example has not.

You can specify the set of benchmarks (or challenges, in the OpenEBench terminology) with the argument --challenge_ids. The command

nextflow run main.nf -profile docker --challenge_ids "GO EC SwissTrees TreeFam-A G_STD2_Luca G_STD2_Fungi G_STD2_Eukaryota G_STD2_Vertebrata" 

should start only the relevant benchmarks for the 2018 dataset.
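To avoid this mismatch, the challenge list can be derived from the refset year. The sketch below hard-codes the two lists that appear in this thread: the 2018 subset recommended above and the default list that ran successfully on the 2020 refset. challenge_ids_for is a hypothetical helper, and treating the maintainer's 2018 list as the complete set is an assumption that may be conservative. It passes the result via --challenge_ids exactly as written above; how the pipeline is pointed at a particular refset directory is not shown in the thread and is left out.

```python
import shlex

# Lists copied from this thread: the 2020 defaults printed in the pipeline banner
# and the subset the maintainer recommends for the 2018 refset.
CHALLENGES_2020 = ("GO EC VGNC SwissTrees TreeFam-A STD_Eukaryota STD_Fungi "
                   "STD_Bacteria G_STD_Luca G_STD_Eukaryota G_STD_Vertebrata "
                   "G_STD_Fungi G_STD2_Luca G_STD2_Fungi G_STD2_Eukaryota "
                   "G_STD2_Vertebrata").split()
CHALLENGES_2018 = ("GO EC SwissTrees TreeFam-A G_STD2_Luca G_STD2_Fungi "
                   "G_STD2_Eukaryota G_STD2_Vertebrata").split()

RELEVANT = {"2018": CHALLENGES_2018, "2020": CHALLENGES_2020}


def challenge_ids_for(refset: str, wanted=None) -> str:
    """Return a space-separated challenge list valid for `refset` (hypothetical)."""
    relevant = RELEVANT[refset]
    if wanted is None:
        wanted = relevant
    # Fail early instead of letting e.g. VGNC crash the 2018 run mid-pipeline.
    unsupported = [c for c in wanted if c not in relevant]
    if unsupported:
        raise ValueError(f"not available in {refset} refset: {' '.join(unsupported)}")
    return " ".join(wanted)


if __name__ == "__main__":
    print(shlex.join(["nextflow", "run", "main.nf", "-profile", "docker",
                      "--challenge_ids", challenge_ids_for("2018")]))
```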

pmjklemm commented 1 year ago

You are right. I switched from the 2018 to the 2020 dataset and now it works perfectly. Thank you!