mherold1 closed this issue 2 years ago
Thank you very much for your effort, @mherold1.
We addressed some of your problems in version 1.0.1. We will systematically go through your issues and try to replicate the problems. A response will come as soon as possible.
Thanks. The only unsolved problem currently is with the gtdbtk step in module 2.
Great.
Version 1.0.1 added the new database version from GTDB-tk. If you delete your older gtdb-tk database and run the database-setup.sh script again, it should configure the new database.
You should have the gtdbtk/release207_v2/ in your system.
Let me know how it goes.
Thank you
Hi @JotaKas. I've been using MuDoGeR version 1.0.1, and it has been very practical so far. However, as mentioned above by @mherold1, I'm encountering issues with the GTDB database installation. The folder for this database appears to be empty, whereas the other databases were installed without any problems. I installed MuDoGeR using Miniconda. Could you please advise on how to resolve this issue?
Hey @LaizaFaria,
I guess the quickest solution for you is simply to follow the instructions from the GTDB developers. The only MuDoGeR requirement is to have "gtdbtk/release207_v2/" (and the files associated with the release you are downloading).
Therefore, for release 214 you can run:
cd /path/to/your/database/folder
mkdir gtdbtk
cd ./gtdbtk
wget https://data.gtdb.ecogenomic.org/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz
tar xvzf gtdbtk_r214_data.tar.gz
Then make sure the folder inside the gtdbtk folder is named something like release###/
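The layout check described above could be sketched roughly as follows. This is only an illustration: the function name `find_gtdbtk_release` and the example path are hypothetical, and it assumes the release folder sits directly inside the gtdbtk folder. `GTDBTK_DATA_PATH` is the environment variable GTDB-Tk reads to locate its reference data.

```shell
#!/usr/bin/env bash
# Print the first release###/ folder found directly inside a gtdbtk directory.
# Returns non-zero if none is found (for example, when the folder is empty).
find_gtdbtk_release() {
    local gtdbtk_dir="$1"
    local candidate
    for candidate in "$gtdbtk_dir"/release*/; do
        # If the glob matches nothing, the literal pattern fails this test.
        if [ -d "$candidate" ]; then
            printf '%s\n' "${candidate%/}"
            return 0
        fi
    done
    echo "No release###/ folder found in $gtdbtk_dir" >&2
    return 1
}

# Example usage (the path is a placeholder, not a MuDoGeR default):
# release_dir="$(find_gtdbtk_release /path/to/your/database/folder/gtdbtk)" \
#     && export GTDBTK_DATA_PATH="$release_dir"
```

If the function reports that no release folder exists, the archive was probably not downloaded or extracted completely.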
Thank you for your response! I noticed a more recent version of GTDB-Tk available, version 220. Would there be any issues with using this newer version?
Hi, thanks for making this available; it seems like a very useful pipeline. Over the last few days I tested it a bit, mainly for running viral classification, and wanted to share some issues I encountered. Since you released a new version earlier today, I'd like to mention that all of this is related to v1.0. Once the running tests have finished I will try to update. On that note, what would be the best way to update? Pull the repository and rerun the installation script?
installation
some tools did not install correctly and had to be fixed individually
khmer didn't install
java missing in vcontact step
installed openjdk on system
maxbin2 dependencies
prokka dependencies
solved by updating the prokka_env conda environment:
conda update --all
databases
GTDBTK_DATA_PATH not set
Virsorter setup
checkm
add trailing slash to bin/databases.sh DATABASE_LOCATION
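The trailing-slash fix can be made robust with a small helper. This is a sketch only: `ensure_trailing_slash` is a hypothetical name, the path is a placeholder, and the commented `GTDBTK_DATA_PATH` line assumes the gtdbtk/release207_v2/ layout mentioned earlier in this thread rather than anything confirmed from the MuDoGeR scripts.

```shell
# Append a trailing slash to a path unless it already ends with one.
ensure_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Example (placeholder path): normalize the value before it is used.
# DATABASE_LOCATION="$(ensure_trailing_slash "/path/to/your/database/folder")"
# Assumed layout, adjust to your installation:
# export GTDBTK_DATA_PATH="${DATABASE_LOCATION}gtdbtk/release207_v2"
```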
running
module 1 preprocessing
metawrap naming convention... leaving out -m parameter -> stuck at mudoger preprocess
problems with gzipped read files?
module 2 - prokaryotes
not enough ram for pplacer?
checkm still runs, but is very slow, should I have specified -m in the command?
problem with pplacer during GTDBtk step
this directory is in release207/split/high/pplacer, also other files missing
database version r207 should fit to gtdbtk version 2.1.1
gtdbtk test runs through successfully
module 3 viruses
Here I tested the individual module with existing assemblies, so without running module1 and 2 prior.
vibrant final output file empty
when running the viruses module command separately:
I had to adapt the script:
/mnt/RAID5/tools/miniconda3/envs/mudoger_env/bin/mudoger-module-3-1_viral-investigation.sh
from:to this:
alternatively and probably better, I should have renamed the input assembly file to final_assembly.fa :)
misc
On large(r) assemblies virfinder and virsorter are unfortunately really slow (I tested 5.5M contigs: virsorter took ~24h, virfinder had processed 2.5M after 4 days when I stopped it). Would it be good to include a filtering step before viral classification, e.g. discarding short contigs (as Vibrant does) or those already assigned to bacteria? Or would this be included in the previous modules 1 and/or 2?
in one of the later stages of the viruses module I get this error repeatedly (probably for every contig):
cat: /mnt/mudoger_workspace/2022/TESTS/test-5/SRR3138838/viruses/taxonomy/vcontact-output/genome_by_genome_overview.csv: No such file or directory
this seems like an old path still included somewhere or it is related to all steps requiring output from module2 failing
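On the question above about discarding short contigs before viral classification, a pre-filter could look roughly like this. It is a sketch and not part of MuDoGeR: `filter_contigs_by_length` is a hypothetical helper, and the length threshold and file names in the usage line are placeholders to be chosen per dataset.

```shell
# Keep only FASTA records whose sequence is at least min_len bases long.
# Handles multi-line sequences; writes the filtered FASTA to stdout,
# with each kept sequence joined onto a single line.
filter_contigs_by_length() {
    local fasta="$1"
    local min_len="$2"
    awk -v min="$min_len" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$fasta"
}

# Example usage (file names and threshold are placeholders):
# filter_contigs_by_length assembly.fa 1000 > final_assembly.fa
```

Running the viral tools on the filtered file should reduce runtime roughly in proportion to the number of contigs removed, at the cost of missing viral fragments shorter than the threshold.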