Closed: animesh closed this issue 3 years ago
You finished the binning. I think this is a good point to update to the latest atlas version.
Alternatively you could run:
conda activate /mnt/z/ayu/databases/conda_envs/f4069a5d
conda install -y numpy
conda deactivate
And try again
Maybe rename the hidden folder '.snakemake' to be safe.
Also, the log shows "Provided cores: 1" (use --cores to define parallelism).
Thanks for the suggestions @SilasK 👍🏽 I tried your alternative but it failed as follows. Should I just go ahead with the update? If so, what would be the easiest way without losing all the work done?
(base) animeshs@DMED7596:~$ conda activate /mnt/z/ayu/databases/conda_envs/f4069a5d
(/mnt/z/ayu/databases/conda_envs/f4069a5d) animeshs@DMED7596:~$ conda install -y numpy
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /mnt/z/ayu/databases/conda_envs/f4069a5d
added / updated specs:
- numpy
The following packages will be downloaded:
package | build
---------------------------|-----------------
certifi-2021.5.30 | py36h06a4308_0 139 KB
numpy-1.19.2 | py36h6163131_0 22 KB
numpy-base-1.19.2 | py36h75fe3a5_0 4.1 MB
------------------------------------------------------------
Total: 4.3 MB
The following NEW packages will be INSTALLED:
blas pkgs/main/linux-64::blas-1.0-openblas
numpy pkgs/main/linux-64::numpy-1.19.2-py36h6163131_0
numpy-base pkgs/main/linux-64::numpy-base-1.19.2-py36h75fe3a5_0
The following packages will be UPDATED:
ca-certificates conda-forge::ca-certificates-2020.12.~ --> pkgs/main::ca-certificates-2021.5.25-h06a4308_1
certifi conda-forge::certifi-2020.12.5-py36h5~ --> pkgs/main::certifi-2021.5.30-py36h06a4308_0
openssl conda-forge::openssl-1.1.1i-h7f98852_0 --> pkgs/main::openssl-1.1.1k-h27cfd23_0
Downloading and Extracting Packages
numpy-base-1.19.2 | 4.1 MB | ################################################################################################################################## | 100%
numpy-1.19.2 | 22 KB | ################################################################################################################################## | 100%
certifi-2021.5.30 | 139 KB | ################################################################################################################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
ERROR conda.core.link:_execute(698): An error occurred while installing package 'defaults::numpy-base-1.19.2-py36h75fe3a5_0'.
Rolling back transaction: done
[Errno 2] No such file or directory: '/mnt/z/ayu/databases/conda_envs/f4069a5d/lib/python3.6/site-packages/numpy/__pycache__'
In theory there should be no incompatibilities between different pipeline versions (until a major version update).
To be sure, I suggest running:
atlas run binning
to finish all the steps before dereplication, and then update.
Binning went fine, it seems, but the update via python setup.py install is probably not working; it still says:
(atlas) animeshs@DMED7596:~/ayu$ atlas --version
atlas, version 2.4.4
Assuming it went fine, re-invocation gave the following error. Any ideas what's up?
[2021-06-22 10:33 INFO] Executing: snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.6/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs all --cores 8
Building DAG of jobs...
Updating job 81 (combine_egg_nogg_annotations).
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 align
19 align_reads_to_MAGs
1 all
1 all_gtdb_trees
1 all_prodigal
19 bam_2_sam_MAGs
1 build_db_genomes
1 classify
1 combine_bined_coverages_MAGs
1 combine_coverages_MAGs
19 convert_sam_to_bam
1 first_dereplication
1 gene2genome
1 genomes
1 identify
19 pileup_MAGs
1 rename_genomes
1 run_all_checkm_lineage_wf
1 second_dereplication
91
[Tue Jun 22 10:34:49 2021]
rule first_dereplication:
input: genomes/all_bins, genomes/quality.csv
output: genomes/pre_dereplication/dereplicated_genomes
log: logs/genomes/pre_dereplication.log
jobid: 463
threads: 8
resources: mem=160, time=12
Job counts:
count jobs
1 first_dereplication
1
[Tue Jun 22 10:34:53 2021]
Error in rule first_dereplication:
jobid: 0
output: genomes/pre_dereplication/dereplicated_genomes
log: logs/genomes/pre_dereplication.log (check log file(s) for error message)
conda-env: /mnt/z/ayu/databases/conda_envs/f4069a5d
shell:
rm -rf genomes/pre_dereplication ; dRep dereplicate --genomes genomes/all_bins/*.fasta --genomeInfo genomes/quality.csv --length 5000 --completeness 50 --contamination 10 --SkipSecondary --P_ani 0.95 --completeness_weight 1 --contamination_weight 5 --strain_heterogeneity_weight 1 --N50_weight 0.5 --size_weight 0 --MASH_sketch 5000 --processors 8 genomes/pre_dereplication &> logs/genomes/pre_dereplication.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /mnt/z/ayu/.snakemake/log/2021-06-22T103345.266546.snakemake.log
[2021-06-22 10:34 CRITICAL] Command 'snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.6/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs all --cores 8 ' returned non-zero exit status 1.
Can you try to update to v2.6a2?
I tried cloning the repo and running setup, but I still get the same version, so I am not sure how to go about it.
I don't know. I have v2.6a2 on conda and on GitHub. In the newest version I have updated the dereplication, so this should circumvent/solve your problem.
Why not use mamba to install the new atlas version?
The situation (appended below) is the same even with mamba, so I guess the binary is not being replaced?
(atlas) animeshs@DMED7596:~/ayu/atlas$ mamba install metagenome-atlas
mamba (0.7.14) supported by @QuantStack
Looking for: ['metagenome-atlas']
pkgs/main/linux-64 Using cache
pkgs/main/noarch Using cache
pkgs/r/linux-64 Using cache
pkgs/r/noarch Using cache
Transaction
Prefix: /home/animeshs/miniconda3/envs/atlas
All requested packages already installed
(atlas) animeshs@DMED7596:~/ayu/atlas$ atlas --version
atlas, version 2.4.4
It is; you just need to tell mamba explicitly: mamba install metagenome-atlas=2.6a2
or maybe mamba update
@animesh Sometimes I find that you need to remove the entire conda env and then re-create it from scratch to get software to update to a higher version properly, e.g.,:
conda env remove -n atlas
conda create -y -n atlas -c conda-forge mamba python=3.7
conda activate atlas
mamba install -c bioconda -c conda-forge metagenome-atlas=2.6a2
Thanks @SilasK @jmtsuji, mamba install -c bioconda -c conda-forge metagenome-atlas=2.6a2
seems to have worked:
(atlas) animeshs@DMED7596:~/ayu/atlas$ atlas --version
atlas, version 2.6a2
but the run seems to be stuck at the following for an hour:
[2021-06-23 10:21 INFO] Executing: snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs --scheduler greedy all --cores 8
localrules directive specifies rules that are not present in the Snakefile:
verify_eggNOG_files
Building DAG of jobs...
Updating job combine_egg_nogg_annotations.
Creating conda environment /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/rules/../envs/checkm.yaml...
Downloading and installing remote packages.
Environment for ../../../home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/envs/checkm.yaml created (location: databases/conda_envs/36b789d6b4ba8a7de1acdd08ea16a9b3)
Creating conda environment /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/rules/../envs/report.yaml...
Downloading and installing remote packages.
Is this normal?
It's not uncommon for the report environment to take a long time to install, but I hoped it would be better with the new version. Do you have snakemake version >6.1?
Yes, it took a couple of hours and then progressed 👍🏽, although it crashed later, just 3% short of completion:
Refining topology: 25 rounds ME-NNIs, 2 rounds ME-SPRs, 13 rounds ML-NNIs
Total branch-length 16.700 after 5.99 sec, 1 of 78 splits
ML-NNI round 1: LogLk = -422511.503 NNIs 3 max delta 52.41 Time 15.31
Switched to using 20 rate categories (CAT approximation)20 of 20
Rate categories were divided by 1.058 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -400196.696 NNIs 0 max delta 0.00 Time 22.27
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 3: LogLk = -399877.303 NNIs 0 max delta 0.00 Time 29.33 (final)
Optimize all lengths: LogLk = -399875.379 Time 31.69
Total time: 37.59 seconds Unique: 80/80 Bad splits: 0/77
[Wed Jun 23 19:57:04 2021]
Finished job 1314.
134 of 139 steps (96%) done
[Wed Jun 23 19:57:04 2021]
localrule root_tree:
input: genomes/tree/gtdbtk.bac120.unrooted.nwk
output: genomes/tree/gtdbtk.bac120.nwk
log: logs/genomes/tree/root_tree_gtdbtk.bac120.log
jobid: 1313
wildcards: msa=gtdbtk.bac120
resources: tmpdir=/tmp, mem=160, time=12
Activating conda environment: /mnt/z/ayu/databases/conda_envs/0dac41a8
Activating conda environment: /mnt/z/ayu/databases/conda_envs/0dac41a8
Removing temporary output file genomes/tree/gtdbtk.bac120.unrooted.nwk.
[Wed Jun 23 19:57:17 2021]
Finished job 1313.
135 of 139 steps (97%) done
[Wed Jun 23 19:57:17 2021]
rule classify:
input: genomes/taxonomy/gtdb/align, genomes/genomes
output: genomes/taxonomy/gtdb/classify
log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log
jobid: 1201
threads: 8
resources: tmpdir=/tmp, mem=160, time=24
Activating conda environment: /mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de
[Thu Jun 24 00:21:26 2021]
Error in rule classify:
jobid: 1201
output: genomes/taxonomy/gtdb/classify
log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log (check log file(s) for error message)
conda-env: /mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de
shell:
GTDBTK_DATA_PATH=/mnt/z/ayu/databases/GTDB_V06 ; gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --out_dir genomes/taxonomy/gtdb --extension fasta --cpus 8 &> logs/taxonomy/gtdbtk/classify.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job classify since they might be corrupted:
genomes/taxonomy/gtdb/classify
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /mnt/z/ayu/.snakemake/log/2021-06-23T102101.523504.snakemake.log
[2021-06-24 00:21 CRITICAL] Command 'snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs --scheduler greedy all --cores 8 ' returned non-zero exit status 1.
Digging into logs/taxonomy/gtdbtk/classify.txt, the issue seems to be pplacer,
and it looks like there are multiple versions of it:
(atlas) animeshs@DMED7596:~/ayu$ find . -iname "pplacer"
./databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de/bin/pplacer
./databases/conda_envs/36b789d6b4ba8a7de1acdd08ea16a9b3/bin/pplacer
./databases/conda_envs/4290e12d/bin/pplacer
./databases/conda_envs/d83cddba/bin/pplacer
./databases/GTDB_V05/pplacer
./databases/GTDB_V06/pplacer
Could that be the issue? BTW the snakemake version is:
(atlas) animeshs@DMED7596:~/ayu$ snakemake --version
6.5.0
Is genomes/taxonomy/gtdb/classify/intermediate_results/pplacer/pplacer.bac120.out
available?
Do you really have the 160 GB available to pplacer? This tool often uses a lot of resources.
I have 128 GB real and allowed 256 GB as swap; can that be the issue? Also, I can't find pplacer.bac120.out in the pwd. Is it supposed to be somewhere else?
I have 128 GB real and allowed 256 GB as swap, can that be the issue?
Yes, I think this could be the issue. Limiting the large mem in the config file to what you actually have is probably best.
I reduced it to ~60 GB (config.zip) but it failed with the following message:
[2021-06-24 14:42 INFO] Executing: snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs --scheduler greedy all --cores 12
localrules directive specifies rules that are not present in the Snakefile:
verify_eggNOG_files
Building DAG of jobs...
Updating job build_db_genomes.
Updating job combine_bined_coverages_MAGs.
Updating job combine_coverages_MAGs.
Updating job run_all_checkm_lineage_wf.
Updating job identify.
Updating job classify.
Updating job all_prodigal.
Updating job genomes.
Updating job gene2genome.
Updating job all_gtdb_trees.
Updating job classify.
Updating job combine_egg_nogg_annotations.
Using shell: /bin/bash
Provided cores: 12
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job stats:
job count min threads max threads
-------------- ------- ------------- -------------
all 1 1 1
all_gtdb_trees 1 1 1
classify 1 12 12
genomes 1 1 1
total 4 1 12
[Thu Jun 24 14:42:30 2021]
rule classify:
input: genomes/taxonomy/gtdb/align, genomes/genomes
output: genomes/taxonomy/gtdb/classify
log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log
jobid: 1201
threads: 12
resources: tmpdir=/tmp, mem=60, time=24
Activating conda environment: /mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de
[Thu Jun 24 19:00:49 2021]
Error in rule classify:
jobid: 1201
output: genomes/taxonomy/gtdb/classify
log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log (check log file(s) for error message)
conda-env: /mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de
shell:
GTDBTK_DATA_PATH=/mnt/z/ayu/databases/GTDB_V06 ; gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --out_dir genomes/taxonomy/gtdb --extension fasta --cpus 12 &> logs/taxonomy/gtdbtk/classify.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job classify since they might be corrupted:
genomes/taxonomy/gtdb/classify
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /mnt/z/ayu/.snakemake/log/2021-06-24T144202.958779.snakemake.log
[2021-06-24 19:01 CRITICAL] Command 'snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs --scheduler greedy all --cores 12 ' returned non-zero exit status 1.
Looking at classify.txt, it says:
genomes/taxonomy/gtdb/classify/intermediate_results/pplacer/pplacer.bac120.json has no placements!
and gtdbtk.log complains about RAM... any ideas to get out of this catch-22-like situation?
I assume this is the error
WARNING: pplacer requires ~204 GB of RAM to fully load the bacterial tree into memory.
@animesh You need to set the memory to >210 GB:
large_mem: 250
and threads <=8.
Just check that everything in genomes/taxonomy/gtdb/align is OK, e.g. that there are genomes to be placed.
@zztin
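The check above can be made concrete: if the align step produced genomes to place, the user MSA should contain at least one sequence. A minimal sketch (the demo file below is a stand-in; in the real working directory point it at genomes/taxonomy/gtdb/align/gtdbtk.bac120.user_msa.fasta):

```shell
# Stand-in MSA so the snippet is self-contained; in practice use
# genomes/taxonomy/gtdb/align/gtdbtk.bac120.user_msa.fasta
mkdir -p /tmp/align_demo
printf '>bin1\nMKV-\n>bin2\nMKI-\n' > /tmp/align_demo/gtdbtk.bac120.user_msa.fasta
# Count genomes in the user MSA; zero means pplacer has nothing to place
n=$(grep -c '^>' /tmp/align_demo/gtdbtk.bac120.user_msa.fasta)
echo "$n genomes in user MSA"    # prints: 2 genomes in user MSA
```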
Where do I add the parameter large_mem: ?
On Fri, Jun 25, 2021 at 11:22 Silas Kieser @.***> wrote:
I assume this is the error
WARNING: pplacer requires ~204 GB of RAM to fully load the bacterial tree into memory.
@animesh https://github.com/animesh You need to set the memory >210gb
large_mem: 250
and threads <=8
Just check that in the genomes/taxonomy/gtdb/align everithing is ok. e.g. that there are genomes to be placed.
@zztin https://github.com/zztin
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/metagenome-atlas/atlas/issues/402#issuecomment-868362427, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH2QX2MTCRPBPFX5IMASTYLTURDEBANCNFSM4666VG2Q .
In the atlas config file in the working dir
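A minimal sketch of that edit, assuming the config uses the mem / large_mem / threads keys mentioned in this thread (the demo writes a stand-in config.yaml; edit the real /mnt/z/ayu/config.yaml instead):

```shell
# Stand-in config so the snippet is self-contained
rm -rf /tmp/atlas_cfg_demo && mkdir -p /tmp/atlas_cfg_demo
cd /tmp/atlas_cfg_demo
printf 'mem: 60\nlarge_mem: 60\nthreads: 12\n' > config.yaml
# Raise large_mem to 250 (GB) and cap threads at 8, as suggested above
sed -i 's/^large_mem: .*/large_mem: 250/' config.yaml
sed -i 's/^threads: .*/threads: 8/' config.yaml
cat config.yaml
```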
I have the following in the align folder; does it look fine?
(atlas) animeshs@DMED7596:~/ayu$ ls -ltrh genomes/taxonomy/gtdb/align/
total 226M
drwxrwxrwx 1 animeshs animeshs 4.0K Jun 23 19:53 intermediate_results
-rwxrwxrwx 1 animeshs animeshs 0 Jun 23 19:56 gtdbtk.bac120.filtered.tsv
-rwxrwxrwx 1 animeshs animeshs 226M Jun 23 19:56 gtdbtk.bac120.msa.fasta
-rwxrwxrwx 1 animeshs animeshs 395K Jun 23 19:56 gtdbtk.bac120.user_msa.fasta
And I tried with large_mem:
[2021-06-25 11:27 INFO] Executing: snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs --scheduler greedy all --cores 8
localrules directive specifies rules that are not present in the Snakefile:
verify_eggNOG_files
Building DAG of jobs...
Updating job build_db_genomes.
Updating job combine_bined_coverages_MAGs.
Updating job combine_coverages_MAGs.
Updating job run_all_checkm_lineage_wf.
Updating job identify.
Updating job classify.
Updating job all_prodigal.
Updating job genomes.
Updating job gene2genome.
Updating job all_gtdb_trees.
Updating job classify.
Updating job combine_egg_nogg_annotations.
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job stats:
job count min threads max threads
-------------- ------- ------------- -------------
all 1 1 1
all_gtdb_trees 1 1 1
classify 1 8 8
genomes 1 1 1
total 4 1 8
[Fri Jun 25 11:28:05 2021]
rule classify:
input: genomes/taxonomy/gtdb/align, genomes/genomes
output: genomes/taxonomy/gtdb/classify
log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log
jobid: 1201
threads: 8
resources: tmpdir=/tmp, mem=250, time=24
Activating conda environment: /mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de
[Fri Jun 25 16:13:01 2021]
Error in rule classify:
jobid: 1201
output: genomes/taxonomy/gtdb/classify
log: logs/taxonomy/gtdbtk/classify.txt, genomes/taxonomy/gtdb/gtdbtk.log (check log file(s) for error message)
conda-env: /mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de
shell:
GTDBTK_DATA_PATH=/mnt/z/ayu/databases/GTDB_V06 ; gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --out_dir genomes/taxonomy/gtdb --extension fasta --cpus 8 &> logs/taxonomy/gtdbtk/classify.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job classify since they might be corrupted:
genomes/taxonomy/gtdb/classify
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /mnt/z/ayu/.snakemake/log/2021-06-25T112738.602238.snakemake.log
[2021-06-25 16:13 CRITICAL] Command 'snakemake --snakefile /home/animeshs/miniconda3/envs/atlas/lib/python3.7/site-packages/atlas/Snakefile --directory /mnt/z/ayu --rerun-incomplete --configfile '/mnt/z/ayu/config.yaml' --nolock --use-conda --conda-prefix /mnt/z/ayu/databases/conda_envs --scheduler greedy all --cores 8 ' returned non-zero exit status 1.
but the error remains
[2021-06-25 11:28:12] INFO: GTDB-Tk v1.5.0
[2021-06-25 11:28:12] INFO: gtdbtk classify --genome_dir genomes/genomes --align_dir genomes/taxonomy/gtdb --out_dir genomes/taxonomy/gtdb --extension fasta --cpus 8
[2021-06-25 11:28:12] INFO: Using GTDB-Tk reference data version r202: /mnt/z/ayu/databases/GTDB_V06
[2021-06-25 11:28:14] WARNING: pplacer requires ~204 GB of RAM to fully load the bacterial tree into memory. However, 65.86 GB was detected. This may affect pplacer performance, or fail if there is insufficient swap space.
[2021-06-25 11:28:14] TASK: Placing 80 bacterial genomes into reference tree with pplacer using 8 CPUs (be patient).
[2021-06-25 11:28:14] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2021-06-25 16:12:53] ERROR: An error was encountered while running tog.
[2021-06-25 16:12:53] ERROR: Controlled exit resulting from an unrecoverable error or warning.
================================================================================
EXCEPTION: TogException
MESSAGE: b'Uncaught exception: Failure("genomes/taxonomy/gtdb/classify/intermediate_results/pplacer/pplacer.bac120.json has no placements!")\nFatal error: exception Failure("genomes/taxonomy/gtdb/classify/intermediate_results/pplacer/pplacer.bac120.json has no placements!")\n'
________________________________________________________________________________
Traceback (most recent call last):
File "/mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de/lib/python3.8/site-packages/gtdbtk/__main__.py", line 95, in main
gt_parser.parse_options(args)
File "/mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de/lib/python3.8/site-packages/gtdbtk/main.py", line 735, in parse_options
self.classify(options)
File "/mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de/lib/python3.8/site-packages/gtdbtk/main.py", line 440, in classify
classify.run(genomes,
File "/mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de/lib/python3.8/site-packages/gtdbtk/classify.py", line 444, in run
classify_tree = self.place_genomes(user_msa_file,
File "/mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de/lib/python3.8/site-packages/gtdbtk/classify.py", line 261, in place_genomes
pplacer.tog(pplacer_json_out, tree_file)
File "/mnt/z/ayu/databases/conda_envs/2bbacb1a5eea0785a80f07e0a09d94de/lib/python3.8/site-packages/gtdbtk/external/pplacer.py", line 235, in tog
raise TogException(proc_err)
gtdbtk.exceptions.TogException: b'Uncaught exception: Failure("genomes/taxonomy/gtdb/classify/intermediate_results/pplacer/pplacer.bac120.json has no placements!")\nFatal error: exception Failure("genomes/taxonomy/gtdb/classify/intermediate_results/pplacer/pplacer.bac120.json has no placements!")\n'
================================================================================
genomes/taxonomy/gtdb/gtdbtk.log (END)
but the swap seems to be sufficient?
(atlas) animeshs@DMED7596:~/ayu$ free
total used free shared buff/cache available
Mem: 65863788 291716 65473196 32 98876 65066708
Swap: 201326592 5280 201321312
So I guess this is not an atlas issue per se; wondering if there is a way to make pplacer use this then?
Probably pplacer uses even more memory.
See also: https://ecogenomics.github.io/GTDBTk/faq.html. I think pplacer uses ~200 GB for loading the tree plus roughly 150 × threads of memory on top.
I managed to run the example data with 3 genomes, and pplacer used only 1 GB.
Do I understand correctly that you are trying to use a tool that needs >250 GB on a machine with 60 GB, with most of the resources in swap space?
Don't you have a cluster node with more memory? Do you really want to run the GTDB classification? You are almost there now, but just remember you could deactivate this annotation.
I would try decreasing the number of threads to one or two.
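As a sketch, the failing classify step could also be retried by hand with reduced pplacer parallelism; the --pplacer_cpus and --scratch_dir flags come from the GTDB-Tk FAQ linked above (verify them against gtdbtk classify --help for your version). Paths are the ones from the failing shell command in the log:

```shell
export GTDBTK_DATA_PATH=/mnt/z/ayu/databases/GTDB_V06
# --pplacer_cpus 1 limits pplacer parallelism (and its per-thread memory);
# --scratch_dir lets pplacer mmap the reference tree to disk instead of RAM
cmd="gtdbtk classify --genome_dir genomes/genomes \
--align_dir genomes/taxonomy/gtdb --out_dir genomes/taxonomy/gtdb \
--extension fasta --cpus 8 --pplacer_cpus 1 --scratch_dir /tmp/pplacer_scratch"
echo "$cmd"    # run inside the activated conda env from the log
```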
Note to myself: the --scratch_dir and --pplacer_cpus options are described there.
I tried with 350, but yes, mostly on swap, i.e. 216 GB, as I don't have any more physical RAM to go further :( This failed even with --cores=1, so I guess I need to move to a machine with at least 350 GB of physical RAM? Wondering what would be the way to move this analysis forward using an HPC?
Good news: Atlas is designed to run on an HPC! https://metagenome-atlas.readthedocs.io/en/latest/usage/getting_started.html#cluster-execution
Great, so can I just move this folder and invoke the "run all" command, and it should start from where it crashed locally, or are there more tricks to be aware of?
Have you already installed the cluster wrapper as described in the docs? Which HPC system do you have? Do you have different partition/queue names, e.g. one for big-memory jobs?
Yes, you can start from where you left off. If you need to copy, copy also the hidden folder .snakemake; it's not necessary, but probably better.
It should be quite easy, but HPC systems always have some surprises ready.
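The copy step above can be sketched as follows (paths are stand-ins; a real transfer to the HPC would use rsync or scp, and the atlas flags should be checked against atlas run --help):

```shell
# Stand-in working dir with the hidden .snakemake state
rm -rf /tmp/atlas_src /tmp/atlas_hpc
mkdir -p /tmp/atlas_src/.snakemake
touch /tmp/atlas_src/config.yaml /tmp/atlas_src/.snakemake/metadata
# cp -a (or rsync -a) preserves the hidden folder and timestamps
cp -a /tmp/atlas_src /tmp/atlas_hpc
ls -A /tmp/atlas_hpc    # shows .snakemake alongside config.yaml
# then, on the HPC: atlas run all --working-dir /tmp/atlas_hpc
```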
The invocation looks like the following:
and logs/genomes/pre_dereplication.log says the following:
however,
so I am not sure what the issue is here?