shlomo@shlomo-HP-Z840:/media/shlomo/DATADRIVE3/CoNS/CoNS_FINAL$ rename.sh --version
java -ea -Xmx1g -cp /home/shlomo/Programs/bbmap/current/ jgi.RenameReads --version
BBMap version 38.97
Hi @shlomobl,
thanks a lot for using RIBAP! To me it looks like your installation of BBMap is chiming in, as it also ships a rename.sh script (see: https://github.com/BioInfoTools/BBMap/blob/master/sh/rename.sh).
Can you check your $PATH variable after activating the RIBAP conda environment and see in which order the RIBAP path and the BBMap path appear? It looks like your shell executes the "wrong" rename.sh due to the ordering in your $PATH variable :)
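One quick way to check the lookup order is to list every rename.sh your shell can see and print $PATH entry by entry (the conda path in the comment below is an example, not your actual layout):

```shell
# List every rename.sh on the PATH, in lookup order; the first hit is executed.
which -a rename.sh

# Print the PATH entries one per line to see which directory comes first.
echo "$PATH" | tr ':' '\n'

# If the BBMap directory is listed before the RIBAP env, prepend the env's
# bin/ directory (example path -- adjust to your conda installation):
# export PATH="$HOME/miniconda3/envs/ribap/bin:$PATH"
```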
P.S.: Since you wrote nextflow.log instead of .nextflow.log: notice the dot before the filename, which marks it as a "hidden" file that a plain ls won't show. An ls -a should do the trick ;)
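For example, in the directory where you launched the pipeline:

```shell
# The leading dot hides the file from a plain `ls`; -a lists hidden entries too.
ls -a | grep nextflow

# Or target the log (and any rotated copies) directly:
ls -la .nextflow.log*
```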
Yes, +1 @klamkiew
Besides, I also see that you want to run the pipeline on 223 genomes. That's a lot for the current implementation of the pipeline and especially running that many genomes on a "local" machine (and not a cluster with higher parallelization) will take a loooong time (and a lot of disk space). We discussed this current limitation of RIBAP in the paper.
Fortunately, Staphylococcus genomes are not that big, but still, I am afraid the pipeline might run for a long time. So I suggest, once the problem with BBMap is fixed, that you run it on a cluster (if available) or first try a smaller subset of your genomes.
We're working on a faster version of RIBAP in the context of a master thesis but this will take more time and evaluation.
Hi, Thanks both for the quick reply! In the meantime I turned to Docker and so far it is working; at least it got past the 'rename.sh' issue :-) But I will look into that, thanks! I got 203 genomes + 20 references, ~2.5-3 Mb each. From the paper I understand this may be a bit of a large dataset for RIBAP, but I wanted to give it a try. Time is not a problem. I do hope that there will be no other issues with memory etc. I have 32 cores and 125 GB of memory available, so fingers crossed. If I do need to run fewer genomes at a time - how would you suggest to do that?
Perfect, switching to Docker is even better because it's more stable.
The RAM should be fine, but keep an eye on your disk space as well, especially with the --keepILPs option.
When the pipeline stops at some point, try -resume.
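To illustrate the -resume tip: re-run the exact same command as before with -resume appended (the project path and --fasta flag below are assumptions based on the usual RIBAP invocation; keep your own command line unchanged, the important part is the single-dash Nextflow option -resume):

```shell
# Nextflow reuses cached results from the work/ directory instead of
# recomputing tasks that already finished. (Example invocation assumed;
# substitute your original flags.)
nextflow run hoelzer-lab/ribap --fasta 'genomes/*.fasta' -resume
```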
> If I do need to run fewer genomes at a time - how would you suggest to do that?
Maybe you can reduce your set of genomes by selecting only one representative per cluster. It's not ideal, but you could calculate ANI or POCP (https://github.com/hoelzer/pocp) and pick only one representative genome per cluster of highly similar genomes. But of course, this will reduce your overall resolution.
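A minimal sketch of that dereplication idea, assuming an all-vs-all ANI table in fastANI's default layout (query, reference, ANI in the first three tab-separated columns); the 99.5% cutoff, the file names, and the greedy keep-first strategy are all illustrative choices, not part of RIBAP:

```shell
THRESHOLD=99.5   # assumed cutoff: pairs at or above this ANI count as "the same"

# Greedy pass: for every near-identical pair, keep the first genome and drop
# the second; then print every genome that was never dropped.
awk -v t="$THRESHOLD" '
    $1 != $2 && $3 >= t { if (!($1 in drop)) drop[$2] = 1 }
    { all[$1]; all[$2] }
    END { for (g in all) if (!(g in drop)) print g }
' fastani_all_vs_all.tsv > representatives.txt
```

Greedy picking is order-dependent and only an approximation of proper clustering, but it is usually enough to thin out a set of 200+ closely related genomes.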
Fingers crossed, it runs through with that many genomes in a not-too-crazy amount of time...
I will close this issue for now because it was solved by the switch to Docker.
Thanks again for using the pipeline!
Hi, Thank you for this pipeline! This is exactly what I've been looking for to study some Staphylococcus spp. (Roary-like, but adapted to the genus level)! Yet, I am having some trouble making it run, and I would really appreciate some help with the following:
1. It looks like it halts at different points. Before this trial, it stopped at the very first fasta file; now it seems to have moved on to the second one?
2. I checked the 'rename.sh' script and it works on its own. But the problem seems to be somewhere around that step...
3. I cannot find the 'nextflow.log' file anywhere!
4. I tried version 1.1.0 - same result.
Thanks!