metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

[rule classify] fails on example data #406

Closed zztin closed 3 years ago

zztin commented 3 years ago

Hi,

I followed the steps in the documentation to initialize a database, create a config file, and start running atlas. Most jobs passed, but rule classify failed. I tried multiple times. The Slurm resources I requested:

srun --job-name "smk-atlas" --cpus-per-task 16 --gres=tmpspace:500G --mem-per-cpu 6000 --time 24:00:00 --pty bash

Can you help me check what possibly went wrong?

snakemake main command:

atlas run all --cores 8

[2021-06-25 00:22 INFO] Executing: snakemake --snakefile /hpc/compgen/users/lchen/mambaforge/envs/atlas_env/lib/python3.7/site-packages/atlas/Snakefile --directory /hpc/compgen/projects/proj_pathogen_cfDNA/atlas  --rerun-incomplete --configfile '/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/config.yaml' --nolock   --use-conda --conda-prefix /hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs   --scheduler greedy  all  --cores 8
Building DAG of jobs...
Updating job build_db_genomes.
Updating job combine_bined_coverages_MAGs.
Updating job combine_coverages_MAGs.
Updating job run_all_checkm_lineage_wf.
Updating job identify.
Updating job classify.
Updating job all_prodigal.
Updating job genomes.
Updating job gene2genome.
Updating job all_gtdb_trees.
Updating job classify.
Updating job combine_egg_nogg_annotations.
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job stats:
job         count    min threads    max threads
--------  -------  -------------  -------------
all             1              1              1
classify        1              8              8
genomes         1              1              1
total           3              1              8

[Fri Jun 25 00:22:50 2021]
rule classify:
    input: genomes/taxonomy/gtdb/align, genomes/genomes
    output: genomes/taxonomy/gtdb/classify
    log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log
    jobid: 130
    threads: 8
    resources: tmpdir=/scratch/7943009, mem=100, time=24

Activating conda environment: /hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566
[Fri Jun 25 00:43:04 2021]
Error in rule classify:
    jobid: 130
    output: genomes/taxonomy/gtdb/classify
    log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log (check log file(s) for error message)
    conda-env: /hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566
    shell:
        GTDBTK_DATA_PATH=/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/GTDB_V06 ; gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --out_dir genomes/taxonomy/gtdb --extension fasta --cpus 8 &> logs/taxonomy/gtdbtk/classify.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job classify since they might be corrupted:
genomes/taxonomy/gtdb/classify
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /hpc/compgen/projects/proj_pathogen_cfDNA/atlas/.snakemake/log/2021-06-25T002240.140877.snakemake.log
[2021-06-25 00:43 CRITICAL] Command 'snakemake --snakefile /hpc/compgen/users/lchen/mambaforge/envs/atlas_env/lib/python3.7/site-packages/atlas/Snakefile --directory /hpc/compgen/projects/proj_pathogen_cfDNA/atlas  --rerun-incomplete --configfile '/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/config.yaml' --nolock   --use-conda --conda-prefix /hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs   --scheduler greedy  all  --cores 8 ' returned non-zero exit status 1.

logs/taxonomy/gtdbtk/classify.txt

Traceback (most recent call last):
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 124, in _worker
    raise PplacerException('An error was encountered while '
gtdbtk.exceptions.PplacerException: An error was encountered while running pplacer, check the log file: genomes/taxonomy/gtdb/classify/intermediate_results/pplacer/pplacer.bac120.out
[2021-06-25 00:43:02] ERROR: Controlled exit resulting from an unrecoverable error or warning.

================================================================================
EXCEPTION: PplacerException
  MESSAGE: An error was encountered while running pplacer.

Traceback (most recent call last):
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/__main__.py", line 95, in main
    gt_parser.parse_options(args)
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/main.py", line 735, in parse_options
    self.classify(options)
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/main.py", line 440, in classify
    classify.run(genomes,
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/classify.py", line 444, in run
    classify_tree = self.place_genomes(user_msa_file,
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/classify.py", line 240, in place_genomes
    pplacer.run(self.pplacer_cpus, 'wag', pplacer_ref_pkg, pplacer_json_out,
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 92, in run
    raise PplacerException(
gtdbtk.exceptions.PplacerException: An error was encountered while running pplacer.
================================================================================

genomes/taxonomy/gtdb/gtdbtk.log

[2021-06-25 00:23:01] INFO: GTDB-Tk v1.5.0
[2021-06-25 00:23:01] INFO: gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --out_dir genomes/taxonomy/gtdb --extension fasta --cpus 8
[2021-06-25 00:23:01] INFO: Using GTDB-Tk reference data version r202: /hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/GTDB_V06
[2021-06-25 00:23:01] WARNING: pplacer requires ~204 GB of RAM to fully load the bacterial tree into memory. However, 131.81 GB was detected. This may affect pplacer performance, or fail if there is insufficient swap space.
[2021-06-25 00:23:01] TASK: Placing 3 bacterial genomes into reference tree with pplacer using 8 CPUs (be patient).
[2021-06-25 00:23:01] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2021-06-25 00:43:02] ERROR: Controlled exit resulting from an unrecoverable error or warning.

================================================================================
EXCEPTION: PplacerException
  MESSAGE: An error was encountered while running pplacer.
________________________________________________________________________________

Traceback (most recent call last):
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/__main__.py", line 95, in main
    gt_parser.parse_options(args)
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/main.py", line 735, in parse_options
    self.classify(options)
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/main.py", line 440, in classify
    classify.run(genomes,
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/classify.py", line 444, in run
    classify_tree = self.place_genomes(user_msa_file,
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/classify.py", line 240, in place_genomes
    pplacer.run(self.pplacer_cpus, 'wag', pplacer_ref_pkg, pplacer_json_out,
  File "/hpc/compgen/projects/proj_pathogen_cfDNA/atlas/databases/conda_envs/b63cf6a8393c12a56f10f74648452566/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 92, in run
    raise PplacerException(
gtdbtk.exceptions.PplacerException: An error was encountered while running pplacer.
================================================================================
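
The line in gtdbtk.log that matters most is easy to miss among the tracebacks: the WARNING about memory. As a minimal sketch (not part of atlas or GTDB-Tk), the snippet below pulls WARNING and ERROR lines out of log text in the `[timestamp] LEVEL: message` layout shown above. The function name `find_problems` and the regular expression are assumptions made for illustration; the sample lines are copied from the log in this post.

```python
import re

# Matches lines such as "[2021-06-25 00:23:01] WARNING: message".
# The pattern is an assumption based on the log excerpt in this issue.
LOG_LINE = re.compile(r"^\[(?P<time>[^\]]+)\] (?P<level>[A-Z]+): (?P<msg>.*)$")


def find_problems(log_text, levels=("WARNING", "ERROR")):
    """Return (time, level, message) tuples for log lines at the given levels."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            hits.append((match.group("time"), match.group("level"), match.group("msg")))
    return hits


# Three lines copied from genomes/taxonomy/gtdb/gtdbtk.log above.
sample = """[2021-06-25 00:23:01] INFO: GTDB-Tk v1.5.0
[2021-06-25 00:23:01] WARNING: pplacer requires ~204 GB of RAM to fully load the bacterial tree into memory. However, 131.81 GB was detected. This may affect pplacer performance, or fail if there is insufficient swap space.
[2021-06-25 00:43:02] ERROR: Controlled exit resulting from an unrecoverable error or warning."""

for time, level, msg in find_problems(sample):
    print(f"{level} at {time}: {msg}")
```

To use it on a real run, read the file first, for example `find_problems(open("genomes/taxonomy/gtdb/gtdbtk.log").read())`.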

SilasK commented 3 years ago

I will check this. pplacer usually needs an unreasonably high amount of memory. I suggest trying with largemem set above 150 (GB) and fewer than 8 threads.
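
As a rough check of the numbers in this thread, the sketch below compares the memory requested in the srun command (16 CPUs at 6000 MB each) with the ~204 GB that the GTDB-Tk warning says pplacer needs. The helper name `slurm_total_mem_gb` is hypothetical, and the decimal MB-to-GB conversion is an assumption made for simplicity.

```python
def slurm_total_mem_gb(cpus_per_task: int, mem_per_cpu_mb: int) -> float:
    """Total memory implied by --cpus-per-task and --mem-per-cpu, in GB.

    Assumes 1 GB = 1000 MB; this is a simplification for a rough estimate.
    """
    return cpus_per_task * mem_per_cpu_mb / 1000


# Values copied from the srun command and the gtdbtk.log warning in this issue.
requested_gb = slurm_total_mem_gb(cpus_per_task=16, mem_per_cpu_mb=6000)
pplacer_needs_gb = 204   # "pplacer requires ~204 GB of RAM"
detected_gb = 131.81     # "131.81 GB was detected"

print(f"requested via srun: {requested_gb:.0f} GB")
print(f"detected by GTDB-Tk: {detected_gb} GB")
print(f"shortfall of the srun request vs. pplacer: {pplacer_needs_gb - requested_gb:.0f} GB")
```

On these numbers, neither the srun request nor the memory GTDB-Tk detected reaches the ~204 GB from the warning, which is consistent with the advice to request more memory.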

SilasK commented 3 years ago

Continue discussion on #402