hoelzer-lab / ribap

A comprehensive bacterial core gene-set annotation pipeline based on Roary and pairwise ILPs
GNU General Public License v3.0

Process `RIBAP:mmseqs2tsv` terminated with an error exit status (1) #58

Open GabrieleRigano99 opened 9 months ago

GabrieleRigano99 commented 9 months ago

Hi, after running the command

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta "*.fasta" -profile local,docker

(I have 3 genomes in my directory) I get this error:

ERROR ~ Error executing process > 'RIBAP:mmseqs2tsv'

Caused by: Process RIBAP:mmseqs2tsv terminated with an error exit status (1)

Command executed:

mkdir tsv

mmseq2tsv.py mmseq2_result.csv strain_ids.txt . 8 #tsv

Command exit status: 1

Command output: (empty)

Command error:

Traceback (most recent call last):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 94, in <module>
    for idx, item in enumerate(chunks(blastTable, chunksize)):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 21, in chunks
    for i in range(0, len(data), size):
ValueError: range() arg 3 must not be zero

Could you help me out please?

hoelzer commented 9 months ago

Hey @GabrieleRigano99

Thx for your interest in RIBAP!

Can you please try the following command:

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker

Please note the ' instead of ". The reasoning is that with " the pattern will be expanded directly in your terminal, so your command will effectively look like this:

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta genome1.fasta genome2.fasta genome3.fasta -profile local,docker

and that's probably what is causing the issue with mmseqs2.

GabrieleRigano99 commented 9 months ago

Hi @hoelzer, thank you for the reply! Unfortunately, it keeps giving me the same error.

My command: nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker

ERROR ~ Error executing process > 'RIBAP:mmseqs2tsv'

Caused by: Process RIBAP:mmseqs2tsv terminated with an error exit status (1)

Command executed:

mkdir tsv

mmseq2tsv.py mmseq2_result.csv strain_ids.txt . 8 #tsv

Command exit status: 1

Command output: (empty)

Command error:

Traceback (most recent call last):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 94, in <module>
    for idx, item in enumerate(chunks(blastTable, chunksize)):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 21, in chunks
    for i in range(0, len(data), size):
ValueError: range() arg 3 must not be zero

klamkiew commented 9 months ago

Hi there and thanks a lot for the report :)

It looks like there is something fishy going on in the chunk size calculation to evaluate the MMSeqs2 results efficiently. The fact that size equals zero according to the command error seems very odd. Can you have a look at the previous intermediate results (prokka, mmseqs2), to estimate the number of genes per genome?

We have a line of code that divides all MMSeqs2 results into chunks, and if I remember correctly, the default number of chunks is 8. The error could mean that the overall MMSeqs2 table has fewer than 8 entries (which seems very weird, given that you are using three genomes).
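To make the suspected failure mode concrete, here is a minimal Python sketch. The `chunks` generator mirrors the loop header shown in the traceback, but how `chunksize` is actually derived in `mmseq2tsv.py` is not shown in this thread, so the floor division in `naive_chunk_size` and the `safe_chunk_size` guard are assumptions for illustration only, not the pipeline's code.

```python
def chunks(data, size):
    """Yield consecutive slices of `data` with at most `size` items each."""
    # Same loop header as in the traceback: a step of 0 raises ValueError.
    for i in range(0, len(data), size):
        yield data[i:i + size]


def naive_chunk_size(n_items, n_chunks):
    # ASSUMPTION: floor division, which yields 0 when n_items < n_chunks.
    return n_items // n_chunks


def safe_chunk_size(n_items, n_chunks):
    # Hypothetical guard: ceiling division, never smaller than 1.
    return max(1, -(-n_items // n_chunks))


table = ["row_1", "row_2", "row_3"]  # toy table with fewer entries than chunks

try:
    list(chunks(table, naive_chunk_size(len(table), 8)))
except ValueError as err:
    print("naive:", err)

print("safe:", list(chunks(table, safe_chunk_size(len(table), 8))))
```

If the real calculation looks anything like `naive_chunk_size`, this would also explain why --chunks 1 avoids the error: dividing by 1 can only give 0 for an empty table.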

As a workaround, you could try setting the --chunks parameter to 1. This is, of course, not a satisfying solution: it will slow down the process and might not even work on your machine, given that it loads everything into memory as one chunk.

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 1

So: if this command works for the mmseqs2tsv process, please have a look into the prokka and mmseqs2 results, if possible, and double-check that these seem correct and valid.

If the same error occurs, even with setting --chunks 1, we'd have to dig deeper ;)

GabrieleRigano99 commented 9 months ago

Hi @klamkiew ! Unfortunately it keeps giving me the same error even with the --chuncks 1 flag

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chuncks 1

I checked the Prokka and MMSeqs results and I can't find any problem in those. I got 5616, 5123 and 4813 genes in my genomes, respectively.

[7f/6e805a] process > RIBAP:rename (3) [100%] 3 of 3 ✔
[9b/e0e319] process > RIBAP:prokka (3) [100%] 3 of 3 ✔
[df/a2d352] process > RIBAP:strain_ids [100%] 1 of 1 ✔
[69/2e66aa] process > RIBAP:roary (3) [100%] 2 of 2
[68/7d6694] process > RIBAP:mmseqs2 [100%] 1 of 1 ✔
[20/b42de8] process > RIBAP:mmseqs2tsv [100%] 1 of 1, failed: 1 ✘

The process stops like this.

hoelzer commented 9 months ago

Hey @GabrieleRigano99, was this your command?

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chuncks 1

If so, --chuncks should be --chunks.

Can you provide the three genome FASTAs, for example as a zip archive here, so we can try them out?
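The run with the misspelled flag failed with the same error and no complaint about the option, which suggests the unknown name was silently ignored and the default of 8 chunks stayed in effect. A small Python sketch of how such typos could be caught before launching a run follows; `KNOWN_OPTIONS` and `suggest_option` are hypothetical names, and the list only contains the two options mentioned in this thread, not the pipeline's full parameter set.

```python
import difflib

# Only the two options that appear in this thread; the real pipeline has more.
KNOWN_OPTIONS = ["fasta", "chunks"]


def suggest_option(name, known=KNOWN_OPTIONS):
    """Return the closest known option for a misspelled name, or None."""
    if name in known:
        return None  # spelled correctly, nothing to suggest
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest_option("chuncks"))
```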

Thanks!

GabrieleRigano99 commented 9 months ago

Oh, my bad! I accidentally typed --chuncks instead of --chunks. It worked with this command, but it took 3h 17m:

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 1

Thanks for the support!

hoelzer commented 9 months ago

Alright, so @klamkiew was on the right track: the calculations themselves work, but something is off with the chunking of the mmseqs2 results. We only do the chunking to reduce the runtime.

When you look at the results now, do the prokka and mmseqs2 results seem OK? Do you get as many predicted genes per genome as you would expect? Do you see anything odd?

GabrieleRigano99 commented 9 months ago

I don't notice anything odd; every step looks good to me.

hoelzer commented 9 months ago

Ok, thanks for checking. Can you share the 3x FASTA files, or are they confidential? I could also provide a secure exchange server if that works for you. Otherwise, just zip them into one archive and upload them here. Then we can do some troubleshooting.

GabrieleRigano99 commented 9 months ago

I'm sorry, I can't share these data unfortunately. Thank you for your work and for helping me out!

hoelzer commented 9 months ago

Ok, no problem, that's understandable when the data is confidential. Unfortunately, that makes it difficult for us to debug. Maybe one thing to try: does it work with three other input genomes, for example some random genomes from NCBI, and the default --chunks 8? Maybe something is generally not working for small input sizes.

GabrieleRigano99 commented 8 months ago

Hi @hoelzer, sorry for the late reply, I was working on other projects. I tried to add 4 genomes to the previous 3 (different species, but all cyanobacteria) with --chunks 8. Unfortunately it died with this error:

nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 8

[b2/e7bfb7] process > RIBAP:rename (2) [100%] 7 of 7 ✔
[c6/626d00] process > RIBAP:prokka (7) [100%] 7 of 7 ✔
[6d/6d9494] process > RIBAP:strain_ids [100%] 1 of 1 ✔
[94/f9f534] process > RIBAP:roary (3) [100%] 3 of 3
[df/5f65fc] process > RIBAP:mmseqs2 [100%] 1 of 1 ✔
[12/9cf5c1] process > RIBAP:mmseqs2tsv [100%] 1 of 1 ✔
[c1/70b024] process > RIBAP:ilp_refinement (11) [ 14%] 1 of 7, failed: 1
[- ] process > RIBAP:combine_roary_ilp [ 0%] 0 of 1
[- ] process > RIBAP:prepare_msa -
[- ] process > RIBAP:mafft -
[- ] process > RIBAP:fasttree -
[- ] process > RIBAP:nw_display -
[- ] process > RIBAP:generate_html -
[- ] process > RIBAP:generate_upsetr_input -
[- ] process > RIBAP:upsetr -

ERROR ~ Error executing process > 'RIBAP:ilp_refinement (6)'

Caused by: Process RIBAP:ilp_refinement (6) terminated with an error exit status (1)

Command executed:

derive_ilp_solutions.py --tmlim 240 --max --indel mmseqs_compressed_chunk4.pkl

Command exit status: 1

Command output: (empty)

Command error:

Traceback (most recent call last):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/derive_ilp_solutions.py", line 151, in <module>
    main()
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/derive_ilp_solutions.py", line 69, in main
    blastTable = read_blast_table(pickled_data)
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/derive_ilp_solutions.py", line 100, in read_blast_table
    blastTable = pickle.load(inputStream)
EOFError: Ran out of input

Work dir: /home/gab/Chroococcidiopsis_project/new_space_pangenome/work/7c/b0fc30d81c7e2c632ccd01b4db7a12

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details

Could you help me out please?
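For context on the last traceback: `pickle.load` raises `EOFError: Ran out of input` when the stream it reads from is empty (or cut off). Whether `mmseqs_compressed_chunk4.pkl` was really a zero-byte file is an assumption that would need checking in the work directory. The Python sketch below only reproduces the error on an empty file and shows one possible guard; `load_pickle_or_none` is a hypothetical helper, not part of RIBAP.

```python
import os
import pickle
import tempfile


def load_pickle_or_none(path):
    """Return the unpickled object, or None if the file is empty."""
    if os.path.getsize(path) == 0:
        return None  # nothing was written to this chunk
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty_chunk.pkl")
    open(empty, "wb").close()  # create a zero-byte file

    try:
        with open(empty, "rb") as handle:
            pickle.load(handle)
    except EOFError as err:
        print("plain load:", err)

    print("guarded load:", load_pickle_or_none(empty))
```

A quick check on the real run would be to list the sizes of the chunk files that mmseqs2tsv wrote and see whether any of them is 0 bytes.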