metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule #731

Open bharat1912 opened 2 months ago

bharat1912 commented 2 months ago

- [x] I checked and didn't find a related issue, e.g. while typing the title
- [x] I got an error in the following rule(s):

Error in rule classify
    jobid: 109
    input: genomes/taxonomy/gtdb/align, genomes/genomes
    output: genomes/taxonomy/gtdb/classify
    log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log (check log file(s) for error details)
    conda-env: /media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6
    shell:
        export GTDBTK_DATA_PATH="/media/bharat/volume1/atlas_db/GTDB_V08_R214" ; gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --mash_db /media/bharat/volume1/atlas_db/GTDB_V08_R214/mash_db --out_dir genomes/taxonomy/gtdb --tmpdir /tmp --extension fasta --cpus 8 &> logs/taxonomy/gtdbtk/classify.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Logfile logs/taxonomy/gtdbtk/classify.txt:

- [x] I checked the log files indicated in the error message (and the cluster logs if submitted to a cluster)

Here is the relevant log output:

[2024-07-20 13:19:31] INFO: GTDB-Tk v2.3.2
[2024-07-20 13:19:31] INFO: gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --mash_db /media/bharat/volume1/atlas_db/GTDB_V08_R214/mash_db --out_dir genomes/taxonomy/gtdb --tmpdir /tmp --extension fasta --cpus 8
[2024-07-20 13:19:31] INFO: Using GTDB-Tk reference data version r214: /media/bharat/volume1/atlas_db/GTDB_V08_R214
[2024-07-20 13:19:33] INFO: Loading reference genomes.
[2024-07-20 13:19:34] INFO: Using Mash version 2.3
[2024-07-20 13:19:34] INFO: Creating Mash sketch file: genomes/taxonomy/gtdb/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2024-07-20 13:19:37] INFO: Completed 48 genomes in 3.42 seconds (14.05 genomes/second).
[2024-07-20 13:19:37] INFO: Loading data from existing Mash sketch file: /media/bharat/volume1/atlas_db/GTDB_V08_R214/mash_db.msh
[2024-07-20 13:20:06] INFO: Calculating Mash distances.
[2024-07-20 13:20:50] INFO: Calculating ANI with FastANI v1.32.
[2024-07-20 13:21:59] INFO: Completed 382 comparisons in 1.16 minutes (330.33 comparisons/minute).
[2024-07-20 13:21:59] INFO: 1 genome(s) have been classified using the ANI pre-screening step.
[2024-07-20 13:21:59] TASK: Placing 2 archaeal genomes into reference tree with pplacer using 8 CPUs (be patient).
[2024-07-20 13:22:00] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2024-07-20 13:27:49] INFO: Calculating RED values based on reference tree.
[2024-07-20 13:27:50] TASK: Traversing tree to determine classification method.
[2024-07-20 13:27:50] INFO: Completed 2 genomes in 0.00 seconds (6,091.94 genomes/second).
[2024-07-20 13:27:50] TASK: Calculating average nucleotide identity using FastANI (v1.32).
[2024-07-20 13:27:52] INFO: Completed 32 comparisons in 1.53 seconds (20.90 comparisons/second).
[2024-07-20 13:27:52] INFO: 0 genome(s) have been classified using FastANI and pplacer.
[2024-07-20 13:27:52] TASK: Placing 45 bacterial genomes into backbone reference tree with pplacer using 8 CPUs (be patient).
[2024-07-20 13:27:52] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2024-07-20 13:31:11] INFO: Calculating RED values based on reference tree.
[2024-07-20 13:31:12] INFO: 45 out of 45 have an class assignments. Those genomes will be reclassified.
[2024-07-20 13:31:12] TASK: Placing 14 bacterial genomes into class-level reference tree 3 (1/6) with pplacer using 8 CPUs (be patient).
[2024-07-20 13:35:15] ERROR: Controlled exit resulting from an unrecoverable error or warning.

================================================
EXCEPTION: PplacerException
MESSAGE: An error was encountered while running pplacer

Traceback (most recent call last):
  File "/media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6/lib/python3.8/site-packages/gtdbtk/main.py", line 102, in main
    gt_parser.parse_options(args)
  File "/media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6/lib/python3.8/site-packages/gtdbtk/main.py", line 1204, in parse_options
    self.classify(options)
  File "/media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6/lib/python3.8/site-packages/gtdbtk/main.py", line 587, in classify
    reports = classify.run(genomes=genomes,
  File "/media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6/lib/python3.8/site-packages/gtdbtk/classify.py", line 608, in run
    low_classify_tree, submsa_file_path = self._place_in_low_tree(
  File "/media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6/lib/python3.8/site-packages/gtdbtk/classify.py", line 787, in _place_in_low_tree
    low_classify_tree = self.place_genomes(submsa_file_path,
  File "/media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6/lib/python3.8/site-packages/gtdbtk/classify.py", line 270, in place_genomes
    pplacer.run(self.pplacer_cpus, 'wag', pplacer_ref_pkg, pplacer_json_out,
  File "/media/bharat/volume1/atlas_db/condaenvs/749fbbbc10fcea13f8569630b38991e6/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 92, in run
    raise PplacerException(
gtdbtk.exceptions.PplacerException: An error was encountered while running pplacer.

Atlas version atlas, version 2.18.1+10.gae781611

SilasK commented 1 month ago

Pplacer is a complicated tool and needs a lot of memory per thread. Make sure you have the memory defined in large_mem.

You can also try adding the flag --set-threads classify=2 to the atlas call to decrease the number of threads for this rule specifically.
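The two suggestions above can be checked with a small script. This is only a sketch under assumptions: it assumes a Linux machine (it parses `/proc/meminfo`-style text), it assumes `large_mem` in the atlas config is given in GB, and the function names are hypothetical helpers, not part of atlas or snakemake.

```python
import re
from typing import List


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the kernel memory summary (Linux only)."""
    with open(path) as handle:
        return handle.read()


def total_ram_gb(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports KiB; 1024 * 1024 KiB make one GiB.
    return int(match.group(1)) / 1024 ** 2


def has_enough_memory(meminfo_text: str, large_mem_gb: float) -> bool:
    """True if the machine has at least as much RAM as large_mem asks for.

    Assumption: large_mem_gb is the large_mem value copied from the atlas
    config, interpreted as GB.
    """
    return total_ram_gb(meminfo_text) >= large_mem_gb


def set_threads_args(rule: str, threads: int) -> List[str]:
    """Build the extra arguments that limit the threads of one rule."""
    return ["--set-threads", f"{rule}={threads}"]
```

For example, `has_enough_memory(read_meminfo(), large_mem_gb)` tells you whether the configured value fits the machine, and `set_threads_args("classify", 2)` yields the `--set-threads classify=2` arguments to append to the atlas call.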

bharat1912 commented 1 month ago

Thanks. I also noticed that conda is using the Python libraries in .local ahead of the package versions specified in the gtdb yaml file. I have removed the Python package from .local and am rerunning with your advice.
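One way to see whether packages from the per-user site directory (typically under `~/.local`) end up on the import path of a conda environment is sketched below. It uses only the standard `site` and `sys` modules; the helper names are hypothetical, and whether this shadowing is what caused the pplacer failure here is not confirmed in this thread. Run it with the Python interpreter of the environment in question.

```python
import os
import site
import sys
from pathlib import Path


def is_under(path, parent) -> bool:
    """True if path is located inside (or equal to) parent."""
    try:
        Path(path).resolve().relative_to(Path(parent).resolve())
        return True
    except ValueError:
        return False


def user_site_entries(search_path, user_site):
    """Return the entries of search_path that live inside the user site dir."""
    return [entry for entry in search_path if entry and is_under(entry, user_site)]


def report() -> None:
    """Print whether the user site directory is active for this interpreter."""
    user_site = site.getusersitepackages()
    print("user site directory:", user_site)
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("PYTHONNOUSERSITE:", os.environ.get("PYTHONNOUSERSITE"))
    for entry in user_site_entries(sys.path, user_site):
        print("on sys.path from user site:", entry)
```

If `report()` lists entries from the user site directory, setting the standard CPython environment variable `PYTHONNOUSERSITE=1` before the run keeps the interpreter from adding that directory, which is a less invasive alternative to deleting packages from `.local`.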

Will keep you posted

bharat1912 commented 1 month ago

I am using a computer with 256 M RAM and can use gtdb for taxonomic analysis when the program is installed independently, but not when it is co-installed as part of the atlas workflow. The database installed with atlas is named GTDB_V09_R200 (though the version is listed as r220 on the gtdb website). Is the version number correct?

SilasK commented 1 month ago

I just corrected this wrong naming. Yes, it's version R220.

SilasK commented 1 month ago

Are you working with the master branch? Can you pull the latest commits and then move the database folder? I mean, rename the gtdb folder.
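The rename can be done with a file manager or `mv`; the following is a minimal sketch of the same step in Python, which refuses to overwrite an existing folder. The function name is hypothetical, and the new folder name is deliberately left as a parameter because it is not stated in this thread; it should be whatever name the updated code expects.

```python
import tempfile
from pathlib import Path


def rename_database_dir(db_root, old_name, new_name) -> Path:
    """Rename db_root/old_name to db_root/new_name and return the new path."""
    old_path = Path(db_root) / old_name
    new_path = Path(db_root) / new_name
    if not old_path.is_dir():
        raise FileNotFoundError(f"{old_path} does not exist")
    if new_path.exists():
        # Never clobber a database folder that is already in place.
        raise FileExistsError(f"{new_path} already exists")
    old_path.rename(new_path)
    return new_path


def demo_in_temp_dir() -> bool:
    """Exercise the rename on throwaway folders; returns True on success."""
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "old_db").mkdir()
        new_path = rename_database_dir(root, "old_db", "new_db")
        return new_path.is_dir() and not (Path(root) / "old_db").exists()


# Real use would look like this, with the third argument filled in by hand:
# rename_database_dir("/media/bharat/volume1/atlas_db", "GTDB_V09_R200", new_name)
```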