hoelzer-lab / ribap

A comprehensive bacterial core gene-set annotation pipeline based on Roary and pairwise ILPs
GNU General Public License v3.0

Missing output file(s) `*_RENAMED.fasta` expected by process `RIBAP:rename` #70

Closed shlomobl closed 1 week ago

shlomobl commented 1 week ago

Hi, thank you for this pipeline! This is exactly what I've been looking for to study some Staphylococcus spp. (Roary-like, but adapted to the genus level)! However, I am having some trouble getting it to run, and I would really appreciate some help with the following:

shlomo@shlomo-HP-Z840:/media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL$ nextflow run ribap -r 1.1.1 --fasta './Pilon_scaffolds_200/*.fasta' --gcode 11 --reference aureus_RF122.gb --tree --core_perc 99 --output '/media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL/RIBAP' --keepILPs -w /media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL/RIBAP_work -profile local,conda

 N E X T F L O W   ~  version 24.10.1

Launching `https://github.com/hoelzer-lab/ribap` [sad_liskov] DSL2 - revision: a61ffa369a [1.1.1]

Profile: local,conda

Current User: shlomo
Nextflow-version: 24.10.1
Starting time: 18-11-2024 12:21 UTC
Workdir location (intermediate files):
  /media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL/RIBAP_work
Output dir name:
  /media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL/RIBAP

Conda cache directory:
  conda
WARNING: ILPs will be stored which can take a lot of disk space!
[-        ] process > RIBAP:rename                -
executor >  local (6)
executor >  local (18)
executor >  local (26)
executor >  local (32)
executor >  local (32)
executor >  local (42)
[69/de6045] process > RIBAP:rename (CEH52232)                  [  0%] 0 of 223
[-        ] process > RIBAP:prokka                             -
[-        ] process > RIBAP:strain_ids                         -
executor >  local (44)
[96/f421c6] process > RIBAP:rename (CEA48682)                  [  6%] 15 of 223, failed: 15
[-        ] process > RIBAP:prokka                             -
[-        ] process > RIBAP:strain_ids                         -
[-        ] process > RIBAP:roary                              -
[-        ] process > RIBAP:mmseqs2                            -
[-        ] process > RIBAP:mmseqs2tsv                         -
[-        ] process > RIBAP:ilp_refinement                     -
[-        ] process > RIBAP:combine_roary_ilp                  -
[-        ] process > RIBAP:prepare_msa                        -
[-        ] process > RIBAP:mafft                              -
[-        ] process > RIBAP:fasttree                           -
[-        ] process > RIBAP:nw_display                         -
[-        ] process > RIBAP:generate_html                      -
[-        ] process > RIBAP:generate_upsetr_input              -
[-        ] process > RIBAP:upsetr                             -
[-        ] process > RIBAP:filter_alignment                   -
[-        ] process > RIBAP:nexus_core                         -
[-        ] process > RIBAP:iqtree                             -
ERROR ~ Error executing process > 'RIBAP:rename (CAF75364)'

Caused by:
  Missing output file(s) `*_RENAMED.fasta` expected by process `RIBAP:rename (CAF75364)`

Command executed:

  rename.sh CAF75364.fasta

Command exit status:
  0

Command output:
  (empty)

Command error:
  java -ea -Xmx1g -cp /home/shlomo/Programs/bbmap/current/ jgi.RenameReads CAF75364.fasta
  Executing jgi.RenameReads [CAF75364.fasta]

  Time: 0.259 seconds.

Work dir:
  /media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL/RIBAP_work/07/c60c5840f466ab6661be2934ac090e

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
WARN: Killing running tasks (29)

1- It looks like it halts at different points. In the previous attempt it stopped at the very first FASTA file; now it looks like it moved on to the second one?
2- I checked the 'rename.sh' script, and it works. But the problem seems to be somewhere at that point...
3- I cannot find the 'nextflow.log' file anywhere!
4- I tried version 1.1.0, with the same result.

Thanks!

shlomobl commented 1 week ago

shlomo@shlomo-HP-Z840:/media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL$ rename.sh --version
java -ea -Xmx1g -cp /home/shlomo/Programs/bbmap/current/ jgi.RenameReads --version
BBMap version 38.97

klamkiew commented 1 week ago

Hi @shlomobl,

thanks a lot for using RIBAP. It looks to me like your BBMap installation is interfering, because it also ships a rename.sh file (see: https://github.com/BioInfoTools/BBMap/blob/master/sh/rename.sh)

Can you check your $PATH variable after switching to the corresponding RIBAP conda environment and see in which order the RIBAP path and the BBMap path appear? It looks like your shell executes the "wrong" rename.sh because of the ordering in your $PATH variable :)
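
A minimal sketch of such a check, assuming a Bourne-style shell such as bash. The helper name `list_path_hits` is made up for illustration and is not part of RIBAP or BBMap. It prints every executable with the given name found on `$PATH`, in lookup order, so the first line is the one the shell would run:

```shell
# Hypothetical helper (not part of RIBAP or BBMap):
# list every executable called "$1" on $PATH, in lookup order.
list_path_hits() {
    local name="$1" dir
    local IFS=':'                 # split $PATH on colons, only inside this function
    for dir in $PATH; do
        # An empty $PATH entry means the current directory.
        [ -n "$dir" ] || dir="."
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    return 0
}

# The first line printed is the rename.sh that would actually be executed.
list_path_hits rename.sh
```

If the first hit is inside the BBMap directory (here presumably `/home/shlomo/Programs/bbmap/`, judging from the error output above), the BBMap script is shadowing the one the pipeline expects. In bash, the built-in `type -a rename.sh` gives similar information.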

P.S.: You wrote nextflow.log instead of .nextflow.log: note the dot before the filename, which makes it a "hidden" file that a plain ls does not show. An ls -a should do the trick ;)
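
As a small illustration of the hidden-file point, the sketch below lists Nextflow log files in a directory. Nextflow writes `.nextflow.log` to the directory the run was launched from and keeps logs of earlier runs as `.nextflow.log.1`, `.nextflow.log.2`, and so on. The function name `find_nextflow_logs` is an assumption made for this example:

```shell
# Hypothetical helper: print Nextflow log files in a directory.
# They start with a dot, so a plain `ls` hides them.
find_nextflow_logs() {
    local dir="${1:-.}"           # default to the current directory
    find "$dir" -maxdepth 1 -type f -name '.nextflow.log*' | sort
}

# Run this from the directory where `nextflow run` was started.
find_nextflow_logs .
```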

hoelzer commented 1 week ago

Yes, +1 @klamkiew

Besides, I also see that you want to run the pipeline on 223 genomes. That's a lot for the current implementation of the pipeline and especially running that many genomes on a "local" machine (and not a cluster with higher parallelization) will take a loooong time (and a lot of disk space). We discussed this current limitation of RIBAP in the paper.

Fortunately, Staphylococcus genomes are not that big but still, I am afraid the pipeline might run a long time. So I suggest, when the problem with BBMap is fixed, that you run on a cluster (if available) or first try a smaller subset of your genomes.
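
One way to try a smaller subset first is to copy a handful of genomes into a separate directory and point `--fasta` at it. The following is only a sketch: the helper `make_subset` and the directory name `subset_20` are made up for illustration, and taking the first N files in glob order is an arbitrary choice, not a representative sample.

```shell
# Hypothetical helper: copy the first N *.fasta files (in glob order)
# from SRC into DEST and print how many were copied.
make_subset() {
    local src="$1" dest="$2" n="$3" count=0 f
    mkdir -p "$dest"
    for f in "$src"/*.fasta; do
        [ -e "$f" ] || continue            # the glob matched nothing
        [ "$count" -lt "$n" ] || break     # stop once N files are copied
        cp "$f" "$dest/"
        count=$((count + 1))
    done
    echo "$count"
}

# Example (paths taken from the command in the first post):
#   make_subset ./Pilon_scaffolds_200 ./subset_20 20
# then rerun the same nextflow command with:
#   --fasta './subset_20/*.fasta'
```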

We're working on a faster version of RIBAP in the context of a master thesis but this will take more time and evaluation.

shlomobl commented 1 week ago

Hi, thanks to both of you for the quick reply! In the meantime I turned to Docker and so far it is working, at least it got past the 'rename.sh' issue :-) But I will look into that, thanks! I have 203 genomes + 20 references, ~2.5-3 Mb each. From the paper I understand this may be a bit of a large dataset for RIBAP, but I wanted to give it a try. Time is not a problem. I do hope that there will be no other issues, like with memory etc. I have 32 cores and 125 GB memory available, and fingers crossed. If I do need to run less genomes a time - how would you suggest to do that?

hoelzer commented 1 week ago

Perfect, switching to Docker is even better bc more stable.

The RAM should be fine but keep an eye on your disk space as well. Especially with the --keepILPs option.
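
A rough sketch for keeping an eye on disk usage while the pipeline runs, using only `du` and `df`. The function name `report_disk` is an assumption for this example. It prints how much space a directory occupies and how much is still free on the filesystem that holds it:

```shell
# Hypothetical helper: report the size of a directory and the free space
# on the filesystem it lives on.
report_disk() {
    local dir="$1"
    printf 'used by %s: %s\n' "$dir" "$(du -sh "$dir" | cut -f1)"
    # df -P keeps each filesystem on one line; column 4 is "Available".
    printf 'free on its filesystem: %s\n' "$(df -Ph "$dir" | awk 'NR == 2 { print $4 }')"
}

# Example (work directory taken from the -w option in the first post):
#   report_disk /media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL/RIBAP_work
```

Running this every now and then (for example via `watch`) on both the work directory and the output directory should show early enough whether the stored ILPs are filling the disk.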

When the pipeline stops at some point, try -resume.

> If I do need to run less genomes a time - how would you suggest to do that?

Maybe you can reduce your genomes by selecting only one representative. It's not ideal, but you could calculate ANI or https://github.com/hoelzer/pocp and pick only one representative genome per cluster of highly similar genomes. But of course, this will reduce your overall resolution.
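
The representative selection could be sketched as a simple greedy pass over a pairwise similarity table. Everything here is an assumption for illustration: the function name `pick_representatives`, the input layout (a genome list plus a tab-separated table with columns genome A, genome B, similarity in percent), and the threshold. A genome becomes a representative unless it is at least as similar as the threshold to a representative that was already chosen, so the order of the genome list decides which member of a cluster is kept.

```shell
# Hypothetical helper: greedy choice of one representative per group of
# highly similar genomes.
#   $1: file with one genome name per line (earlier lines win)
#   $2: TSV with columns genome_a, genome_b, similarity in percent
#   $3: similarity threshold, e.g. 99
pick_representatives() {
    awk -F '\t' -v thr="$3" '
        FILENAME == ARGV[1] {          # first input: the pairwise table
            if ($3 + 0 >= thr + 0) {
                close_pair[$1, $2] = 1
                close_pair[$2, $1] = 1
            }
            next
        }
        $1 == "" { next }              # skip blank lines in the genome list
        {                              # second input: the genome list
            covered = 0
            for (i = 1; i <= nrep; i++)
                if (($1, rep[i]) in close_pair) { covered = 1; break }
            if (!covered) { rep[++nrep] = $1; print $1 }
        }
    ' "$2" "$1"
}

# Example:
#   pick_representatives genomes.txt pairwise.tsv 99 > representatives.txt
```

This is a deliberately crude clustering. A proper dereplication tool would likely give better-defined clusters, and the right threshold depends on how much resolution can be given up.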

Fingers crossed, it runs through with that many genomes in a not-too-crazy amount of time...

I will close this issue for now bc it was solved by the switch to Docker

Thanks again for using the pipeline!