Open GabrieleRigano99 opened 11 months ago
Hey @GabrieleRigano99
Thx for your interest in RIBAP!
Can you please try the following command:
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker
Please note the ' instead of ". The reasoning behind that is that " will be directly expanded in your terminal, so your input command will look like this when using ":
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta genome1.fasta genome2.fasta genome3.fasta -profile local,docker
and probably that's causing the issue with mmseqs2.
Hi @hoelzer, thank you for the reply! Unfortunately it keeps giving me the same error...
my command: nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker
ERROR ~ Error executing process > 'RIBAP:mmseqs2tsv'
Caused by:
Process RIBAP:mmseqs2tsv terminated with an error exit status (1)
Command executed:
mmseq2tsv.py mmseq2_result.csv strain_ids.txt . 8 #tsv
Command exit status: 1
Command output: (empty)
Command error:
Traceback (most recent call last):
File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 94, in
Hi there and thanks a lot for the report :)
It looks like there is something fishy going on in the chunk size calculation to evaluate the MMSeqs2 results efficiently. The fact that size equals zero according to the command error seems very odd.
Can you have a look at the previous intermediate results (prokka, mmseqs2), to estimate the number of genes per genome?
We have a line of code that divides all MMSeqs2 results into chunks, and if I remember correctly, the default number of chunks is 8.
There could be something going on with the overall MMSeqs2 table being smaller than 8 (which seems very weird, given that you are using three genomes).
As a workaround, you could try to set the --chunks parameter to 1, but this is, of course, not a satisfying solution: it will slow down the process and might not even work on your machine, given that it will load everything into memory as one chunk.
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 1
So: if this command works for the mmseqs2tsv process, please have a look into the prokka and mmseqs2 results, if possible, and double-check that these seem correct and valid.
If the same error occurs even with --chunks 1, we'd have to dig deeper ;)
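The suspected failure mode can be sketched in a few lines of Python. This is only an illustration, not the actual code of mmseq2tsv.py: the chunks() generator follows the loop shown in the traceback, while chunk_size() and safe_chunk_size() are hypothetical helpers, and the assumption that the size comes from a floor division of some item count by the number of chunks is a guess.

```python
def chunk_size(n_items, n_chunks):
    """Hypothetical size calculation: floor division yields 0 when n_items < n_chunks."""
    return n_items // n_chunks


def safe_chunk_size(n_items, n_chunks):
    """Hypothetical guard: round up instead of down and never return less than 1."""
    return max(1, -(-n_items // n_chunks))


def chunks(data, size):
    """Yield consecutive slices of `data` with at most `size` elements each."""
    for i in range(0, len(data), size):  # raises ValueError if size == 0
        yield data[i:i + size]


table = ["row1", "row2", "row3"]  # stand-in for a table with fewer entries than chunks

try:
    list(chunks(table, chunk_size(len(table), 8)))
except ValueError as err:
    print("unguarded:", err)

print("guarded:", list(chunks(table, safe_chunk_size(len(table), 8))))
```

With a guard like this, a small input would simply produce fewer chunks than requested instead of failing. Whether that matches what the pipeline needs downstream is not something this sketch can answer.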
Hi @klamkiew ! Unfortunately it keeps giving me the same error even with the --chuncks 1 flag
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chuncks 1
I checked the Prokka and MMSeqs2 results and I can't find any problem in those. I got 5616, 5123 and 4813 genes in my genomes, respectively.
[7f/6e805a] process > RIBAP:rename (3) [100%] 3 of 3 ✔
[9b/e0e319] process > RIBAP:prokka (3) [100%] 3 of 3 ✔
[df/a2d352] process > RIBAP:strain_ids [100%] 1 of 1 ✔
[69/2e66aa] process > RIBAP:roary (3) [100%] 2 of 2
[68/7d6694] process > RIBAP:mmseqs2 [100%] 1 of 1 ✔
[20/b42de8] process > RIBAP:mmseqs2tsv [100%] 1 of 1, failed: 1 ✘
The process stops like this
Hey @GabrieleRigano99, was this your command?
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chuncks 1
Because then --chuncks should be --chunks.
Can you provide the three genome FASTAs? For example, as a zip archive here, so we can try them out.
Thanks!
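In the run above, the misspelled option did not stop the pipeline, so the typo was easy to miss. As a side note, here is a minimal Python sketch of how a wrapper script could flag such a typo before launching anything. It is not part of RIBAP or Nextflow; KNOWN_PARAMS and suggest_param() are hypothetical names, and the list only contains the two options mentioned in this thread.

```python
import difflib

# Illustrative list only: the two pipeline options that appear in this thread.
KNOWN_PARAMS = ["fasta", "chunks"]


def suggest_param(name, known=KNOWN_PARAMS):
    """Return the closest known option for an unrecognised name, or None."""
    if name in known:
        return None  # spelled correctly, nothing to suggest
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    return matches[0] if matches else None


hint = suggest_param("chuncks")
if hint is not None:
    print(f"Unknown option --chuncks, did you mean --{hint}?")
```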
Oh, my bad! I accidentally typed --chuncks instead of --chunks. It worked with this command, but it took 3h 17m:
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 1
Thanks for the support!
Alright, so @klamkiew was on the right path: the calculations themselves work, but there is something off with the chunking of the mmseqs2 results. And we actually do the chunking to reduce the runtime.
When you look at the results now, do the prokka and mmseqs2 results seem ok? Do you get as many genes predicted as you would expect per genome? Do you see something odd?
I can't notice anything odd, it looks good to me in every step
Ok, thanks for checking. Can you share the 3x FASTA files or are these confidential? I could also provide a secure exchange server if this is fine for you. Otherwise, just zip them in one archive and upload them here. Then we can do some troubleshooting.
I'm sorry, I can't share these data unfortunately. Thank you for your work and for helping me out!
Ok, no problem, that's understandable when the data is confidential. Unfortunately, it's then difficult for us to debug. Maybe: does it work when you use three other input genomes? Just some random genomes from NCBI or so, with the default --chunks 8? Maybe something is generally not working for small input sizes.
Hi @hoelzer, sorry for the late reply, I was working on other projects. I tried to add 4 genomes to the previous 3 (different species, but all cyanobacteria) with --chunks 8. Unfortunately it died with this error:
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta '*.fasta' -profile local,docker --chunks 8
[b2/e7bfb7] process > RIBAP:rename (2) [100%] 7 of 7 ✔
[c6/626d00] process > RIBAP:prokka (7) [100%] 7 of 7 ✔
[6d/6d9494] process > RIBAP:strain_ids [100%] 1 of 1 ✔
[94/f9f534] process > RIBAP:roary (3) [100%] 3 of 3
[df/5f65fc] process > RIBAP:mmseqs2 [100%] 1 of 1 ✔
[12/9cf5c1] process > RIBAP:mmseqs2tsv [100%] 1 of 1 ✔
[c1/70b024] process > RIBAP:ilp_refinement (11) [ 14%] 1 of 7, failed: 1
[- ] process > RIBAP:combine_roary_ilp [ 0%] 0 of 1
[- ] process > RIBAP:prepare_msa -
[- ] process > RIBAP:mafft -
[- ] process > RIBAP:fasttree -
[- ] process > RIBAP:nw_display -
[- ] process > RIBAP:generate_html -
[- ] process > RIBAP:generate_upsetr_input -
[- ] process > RIBAP:upsetr -
ERROR ~ Error executing process > 'RIBAP:ilp_refinement (6)'
Caused by:
Process RIBAP:ilp_refinement (6) terminated with an error exit status (1)
Command executed:
derive_ilp_solutions.py --tmlim 240 --max --indel mmseqs_compressed_chunk4.pkl
Command exit status: 1
Command output: (empty)
Command error:
Traceback (most recent call last):
File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/derive_ilp_solutions.py", line 151, in
Work dir: /home/gab/Chroococcidiopsis_project/new_space_pangenome/work/7c/b0fc30d81c7e2c632ccd01b4db7a12
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
-- Check '.nextflow.log' file for details
Could you help me out please?
Hi, after running the command:
nextflow run hoelzer-lab/ribap -r 1.0.2 --fasta "*.fasta" -profile local,docker
(I have 3 genomes in my directory) I get this error:
ERROR ~ Error executing process > 'RIBAP:mmseqs2tsv'
Caused by:
Process RIBAP:mmseqs2tsv terminated with an error exit status (1)
Command executed:
mkdir tsv
mmseq2tsv.py mmseq2_result.csv strain_ids.txt . 8 #tsv
Command exit status: 1
Command output: (empty)
Command error:
Traceback (most recent call last):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 94, in
    for idx, item in enumerate(chunks(blastTable, chunksize)):
  File "/home/gab/.nextflow/assets/hoelzer-lab/ribap/bin/mmseq2tsv.py", line 21, in chunks
    for i in range(0, len(data), size):
ValueError: range() arg 3 must not be zero
Could you help me out please?