metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule identify #538

Closed: skeffington closed this issue 2 years ago

skeffington commented 2 years ago

Hello,

I'm having a problem getting the rule 'identify' to run.

Error in rule identify:
    jobid: 0
    output: genomes/taxonomy/gtdb/identify
    log: logs/taxonomy/gtdbtk/identify.txt, genomes/taxonomy/gtdb/gtdbtk.log (check log file(s) for error message)
    conda-env: /lustre/projects/Research_Project-T115468/atlas_databases/conda_envs/1e49aab428742b0674475556e0f9f646
    shell:
        GTDBTK_DATA_PATH=/lustre/projects/Research_Project-T115468/atlas_databases/GTDB_V06 ; gtdbtk identify --genome_dir genomes/genomes --out_dir genomes/taxonomy/gtdb --extension fasta --cpus 8 &> logs/taxonomy/gtdbtk/identify.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.

The cluster log is:

================================================================================
                                     ERROR
________________________________________________________________________________

           The GTDB-Tk reference data does not exist or is corrupted.
GTDBTK_DATA_PATH=/lustre/projects/Research_Project-T115468/atlas_databases/GTDB_V06

   Please compare the checksum to those provided in the download repository.
          https://github.com/Ecogenomics/GTDBTk#gtdb-tk-reference-data
================================================================================
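The error asks for a checksum comparison. Below is a minimal bash sketch of that check. It assumes the download page publishes an MD5 checksum (swap `md5sum` for `sha256sum` if it lists SHA-256 instead). `verify_archive` is a hypothetical helper name, and the expected value has to be copied from the download page.

```shell
#!/usr/bin/env bash
# Sketch: compare an archive's checksum with the value published for it.
# ASSUMPTIONS: the published checksum is MD5; verify_archive is a hypothetical
# helper, not part of ATLAS or GTDB-Tk.
set -euo pipefail

verify_archive() {
    local archive="$1" expected="$2" actual
    # md5sum prints "<hash>  <file>"; keep only the hash field.
    actual=$(md5sum "$archive" | awk '{print $1}')
    if [[ "$actual" == "$expected" ]]; then
        echo "OK: $archive"
    else
        echo "MISMATCH: $archive (got $actual, expected $expected)" >&2
        return 1
    fi
}
```

For example: `verify_archive gtdbtk_r202_data.tar.gz "<checksum from the download page>"`. A match only shows that the archive itself is intact. As this thread shows, the unpacked files can still be in the wrong place.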

The directory /lustre/projects/Research_Project-T115468/atlas_databases/GTDB_V06 contains:

downloaded_success  downloadgtdb.sh  gtdbtk_r202_data.tar.gz  release202 

I had the problem with the GTDB-Tk certificate, so I downloaded the database manually (in a SLURM job), unpacked the archive, and touched 'downloaded_success'. I'm not sure whether something is just not in the right place or whether the data are really corrupted or incomplete.
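For a manual download like this, one way to avoid ending up with a nested `release202/` directory is to strip the archive's top-level directory while unpacking. The bash sketch below assumes GNU tar (for `--strip-components`) and an archive with a single top-level directory. `unpack_gtdb` is a hypothetical helper. The `downloaded_success` marker is created by hand here, as in this thread.

```shell
#!/usr/bin/env bash
# Sketch: unpack the GTDB-Tk archive so its files land directly in the
# database directory rather than in a nested release202/ subdirectory.
# ASSUMPTIONS: GNU tar; the archive has one top-level directory;
# unpack_gtdb is a hypothetical helper.
set -euo pipefail

unpack_gtdb() {
    local archive="$1" dest="$2"
    mkdir -p "$dest"
    # Drop the leading "release202/" component from every member path.
    tar -xzf "$archive" -C "$dest" --strip-components=1
    # Marker file, created by hand in this thread to signal a finished download.
    touch "$dest/downloaded_success"
}
```

For example: `unpack_gtdb gtdbtk_r202_data.tar.gz /lustre/projects/Research_Project-T115468/atlas_databases/GTDB_V06`.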

The tail of the SLURM job's stdout shows:

release202/markers/pfam/individual_hmms/PF13685.7.hmm
release202/markers/pfam/Pfam-A.hmm
release202/markers/pfam/Pfam-A.hmm.h3i
release202/markers/pfam/Pfam-A.hmm.h3m
release202/markers/pfam/Pfam-A.hmm.h3f
release202/markers/pfam/Pfam-A.hmm.h3p
release202/markers/pfam/generatePfamdat.py
release202/markers/pfam/Pfam-A.hmm.dat
release202/msa/
release202/msa/gtdb_r202_ar122.faa
release202/msa/gtdb_r202_bac120.faa
release202/radii/
release202/radii/gtdb_radii.tsv
release202/mrca_red/
release202/mrca_red/gtdbtk_r202_ar122.tsv
release202/mrca_red/gtdbtk_r202_bac120.tsv
release202/metadata/
release202/metadata/metadata.txt
release202/pplacer/
release202/pplacer/gtdb_r202_bac120.refpkg/
release202/pplacer/gtdb_r202_bac120.refpkg/CONTENTS.json
release202/pplacer/gtdb_r202_bac120.refpkg/phylo_model0ghuj7aq.json
release202/pplacer/gtdb_r202_bac120.refpkg/bac120_msa_reps_r202.faa
release202/pplacer/gtdb_r202_bac120.refpkg/bac120_r202_unroot.pplacer.tree
release202/pplacer/gtdb_r202_bac120.refpkg/fitting_stats.log
release202/pplacer/gtdb_r202_ar122.refpkg/
release202/pplacer/gtdb_r202_ar122.refpkg/CONTENTS.json
release202/pplacer/gtdb_r202_ar122.refpkg/phylo_modelbf90ubog.json
release202/pplacer/gtdb_r202_ar122.refpkg/ar122_msa_reps_r202.faa
release202/pplacer/gtdb_r202_ar122.refpkg/ar122_r202_unroot.pplacer.tree
release202/pplacer/gtdb_r202_ar122.refpkg/fitting_stats.log
release202/manifest.tsv

Any ideas would be appreciated! Thanks, Alastair

Atlas version 2.9.1

jmtsuji commented 2 years ago

@skeffington It appears that your download of the GTDB did not finish properly. I am also using ATLAS 2.9.1**, and my GTDB_V06 directory contains the following:

downloaded_success  fastani  manifest.tsv  markers  masks  metadata  mrca_red  msa  pplacer  radii  taxonomy

I am able to run the GTDB-Tk steps of ATLAS successfully, so this seems to be an acceptable configuration.
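A quick way to compare a database directory against this working layout is a small bash check. In the sketch below, the expected names are copied from the directory listing above (GTDB r202, as used with ATLAS 2.9.1), so other releases may differ. The function also flags a leftover nested `release*/` directory. `check_gtdb_layout` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: check that a GTDB-Tk data directory has the top-level layout of a
# known-working r202 setup. ASSUMPTIONS: names taken from one working
# directory in this thread; check_gtdb_layout is a hypothetical helper.
set -euo pipefail

check_gtdb_layout() {
    local dir="$1" missing=0 entry d
    for entry in fastani manifest.tsv markers masks metadata mrca_red msa pplacer radii taxonomy; do
        if [[ ! -e "$dir/$entry" ]]; then
            echo "missing: $dir/$entry" >&2
            missing=1
        fi
    done
    # A nested release*/ directory usually means the archive was unpacked one level too deep.
    for d in "$dir"/release*/; do
        if [[ -d "$d" ]]; then
            echo "nested release directory: $d (move its contents up a level)" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run against the directory from the original report, this would list every expected entry as missing and point at `release202/`.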

I wonder if you just need to move the contents of the release202 directory into the main directory?
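Moving the contents up a level can be scripted. The bash sketch below assumes that nothing in `release202/` has the same name as an existing entry in the parent directory. `flatten_release` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: move everything out of a nested release202/ directory into its parent.
# ASSUMPTIONS: no name clashes between release202/ and the parent;
# flatten_release is a hypothetical helper.
set -euo pipefail

flatten_release() {
    local dir="$1" nested="$1/release202"
    [[ -d "$nested" ]] || { echo "no $nested to flatten" >&2; return 1; }
    # dotglob so hidden files are moved too.
    shopt -s dotglob
    mv -- "$nested"/* "$dir"/
    shopt -u dotglob
    # rmdir only succeeds if release202/ is now empty.
    rmdir -- "$nested"
}
```

For example: `flatten_release /lustre/projects/Research_Project-T115468/atlas_databases/GTDB_V06`.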

**Note: I downloaded the GTDB a while ago (I think as part of ATLAS 2.8.2), not using ATLAS 2.9.1. However, the GTDB did not change between ATLAS versions, so I just kept the same GTDB folder when I updated ATLAS.

skeffington commented 2 years ago

Thanks @jmtsuji, moving everything in release202 up a level did the trick! It ran through until it got caught by the pplacer memory issue (#402).

jmtsuji commented 2 years ago

@skeffington Glad to hear you got the database working!

If you are running out of memory and don't have more memory available on your system, one option is to use the latest version of GTDB-Tk, which can run on <40 GB of memory (instead of ~250 GB). I believe it was just implemented in ATLAS 2.10.0. It requires the latest GTDB release (release 207).
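Before choosing between the two GTDB-Tk versions, it can help to check how much RAM a node has. The bash sketch below assumes the Linux `/proc/meminfo` format (`MemTotal: <n> kB`). The thresholds are the approximate figures from the comment above. `has_enough_memory` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: check whether a machine has at least N GB of RAM.
# ASSUMPTIONS: Linux /proc/meminfo format; has_enough_memory is a
# hypothetical helper; thresholds (~250 GB vs <40 GB) come from this thread.
set -euo pipefail

has_enough_memory() {
    local need_gb="$1" meminfo="${2:-/proc/meminfo}" total_kb
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    # Compare in kB using integer arithmetic (1 GB taken as 1024*1024 kB).
    (( total_kb >= need_gb * 1024 * 1024 ))
}
```

For example: `has_enough_memory 250 || echo "not enough RAM for the older pplacer step"`. On a SLURM cluster, the limit that matters is the job's memory allocation, not the node's total, so treat this only as a rough pre-check.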

jmtsuji commented 2 years ago

@skeffington Sorry, just saw your other issue #540 - replied to you there.

SilasK commented 2 years ago

Hey folks, I made atlas v2.10 available, which should use GTDB R07-RS207.