metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error with rule root_tree during gtdb taxonomic annotation #286

Closed: dscheah closed this issue 3 years ago

dscheah commented 4 years ago

Hello! I managed to get the full Atlas pipeline to work by restricting the memory resources, but I was curious whether I could use GTDB markers for taxonomic annotation with the same restricted amount of memory.

I amended the config file accordingly:

annotations:
  - gtdb_tree
  - gtdb_taxonomy
  - genes
#  - checkm_taxonomy
#  - checkm_tree

Eventually I received the following error messages:

Activating conda environment: /data/work/darrenc/Metagenomes/Enfield/20200217/databases/conda_envs/48b2c0be
Traceback (most recent call last):
  File "/data/work/darrenc/Metagenomes/Enfield/20200217/run/.snakemake/scripts/tmpamz3ffc0.root_tree.py", line 12, in <module>
    T.unroot()
  File "/data/work/darrenc/Metagenomes/Enfield/20200217/databases/conda_envs/48b2c0be/lib/python3.6/site-packages/ete3/coretype/tree.py", line 1325, in unroot
    raise TreeError("Cannot unroot a tree with only two leaves")
ete3.coretype.tree.TreeError: 'Cannot unroot a tree with only two leaves'
[Mon Mar  2 05:15:02 2020]
Error in rule root_tree:
    jobid: 609
    output: genomes/tree/gtdbtk.ar122.nwk
    log: logs/genomes/tree/root_tree_gtdbtk.ar122.log (check log file(s) for error message)
    conda-env: /data/work/darrenc/Metagenomes/Enfield/20200217/databases/conda_envs/48b2c0be

RuleException:
CalledProcessError in line 112 of /home/darrenc/atlas/atlas/rules/gtdbtk.smk:
Command 'source /home/darrenc/miniconda3/bin/activate '/data/work/darrenc/Metagenomes/Enfield/20200217/databases/conda_envs/48b2c0be'; set -euo pipefail;  python /data/work/darrenc/Metagenomes/Enfield/20200217/run/.snakemake/scripts/tmpamz3ffc0.root_tree.py' returned non-zero exit status 1.
  File "/home/darrenc/atlas/atlas/rules/gtdbtk.smk", line 112, in __rule_root_tree
  File "/home/darrenc/miniconda3/envs/atlasenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

I am wondering whether this is a straightforward fix or another memory issue. Otherwise, I am fine with using only the checkm markers.

Thank you very much!

SilasK commented 4 years ago

I fear not; see https://github.com/Ecogenomics/GTDBTk#hardware-requirements

If you want, you can send me a gzipped version of your genomes/genomes folder and I will run gtdb-tk on my server.

dscheah commented 4 years ago

Hi @SilasK, thanks for the offer. In your experience, is it beneficial to annotate with the GTDB database instead of checkm? My understanding is that checkm serves more as a quality assessor of metagenome-assembled genomes than as a taxonomic assignment toolkit like GTDB-Tk. With checkm I was able to run the pipeline within my limited memory, but I am wondering whether GTDB-Tk would typically give greater depth or accuracy in the assignments. I assume, though, that this depends on the microbial diversity of the samples in question...

Thanks!

SilasK commented 4 years ago

Yes, GTDB gives more up-to-date and more in-depth taxonomic annotation.

SilasK commented 4 years ago

Maybe this is another option: https://kbase.us/applist/apps/kb_gtdbtk/run_kb_gtdbtk/release?gclid=CjwKCAiAnfjyBRBxEiwA-EECLLgKajMbcUjWp8OlrgRu0TCOCTP3Eo6YC2Z3hW9SIAZZPmNCeo3sXxoCDpMQAvD_BwE

dscheah commented 4 years ago

I have heard about KBase; unfortunately, I am getting close to the business end of my PhD and don't have enough time to wrangle a new pipeline. If you are able to annotate my MAGs on your server using GTDB-Tk, I could push my files to a personal repository from which you could pull them onto your own server. Let me first consult with my supervisor and I will let you know. My samples are from quite a niche environment. A cursory glance at the annotated genomes already shows that the most abundant taxa are consistent with the taxonomic annotations I obtained from the MG-RAST pipeline, so they may not need to be annotated with GTDB-Tk. However, I would be curious to see whether there is a notable difference between GTDB-Tk and checkm.

Thanks very much for your help!

dscheah commented 4 years ago

Hi @SilasK, I'm wondering whether it would still be possible to run my genomes through gtdb-tk on your server. Would my previous suggestion work, i.e. pushing the files to a GitHub repository so you can pull them onto your server? Let me know what works best for you and which files (besides the genomes/genomes files) you would need.

Thank you very much!

SilasK commented 4 years ago

Yes, it should still work. I only need the fasta files in genomes/genomes.
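SilasK needs only the FASTA files in genomes/genomes, preferably gzipped. Below is a minimal sketch that bundles them with Python's standard `tarfile` module. It assumes the MAG files end in `.fasta`; change `suffix` if they do not. `archive_genomes` is a hypothetical helper and is not part of atlas.

```python
import tarfile
from pathlib import Path


def archive_genomes(genome_dir, archive_path, suffix=".fasta"):
    """Bundle the MAG FASTA files of genome_dir into a gzipped tarball.

    Only regular files ending in `suffix` are added (assumption: the MAGs
    are stored as *.fasta). Each file is stored under the folder's own
    name, so the archive unpacks to <folder>/<MAG>.fasta.
    Returns the sorted list of archived file names.
    """
    genome_dir = Path(genome_dir)
    fasta_files = sorted(
        p for p in genome_dir.iterdir() if p.is_file() and p.name.endswith(suffix)
    )
    if not fasta_files:
        raise FileNotFoundError(f"no *{suffix} files in {genome_dir}")
    # "w:gz" writes a gzip-compressed tar archive
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in fasta_files:
            tar.add(path, arcname=f"{genome_dir.name}/{path.name}")
    return [p.name for p in fasta_files]
```

Run from the atlas working directory, `archive_genomes("genomes/genomes", "genomes.tar.gz")` would produce a single file to share.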

dscheah commented 4 years ago

Thanks very much @SilasK. I've just added you as a collaborator to my repository with the MAG fasta files.

SilasK commented 4 years ago

It is running.

dscheah commented 4 years ago

@SilasK Thanks for the update!

MalbertRogers commented 4 years ago

Hello,

I'm getting a similar error and was wondering what the problem could be. According to this issue, it may be a problem with resources? I'm running the pipeline with 250 GB of memory and 4 CPUs. I assume the amount of memory should be fine, so should I increase the number of CPUs? What would you recommend?

Thanks in advance!

Activating conda environment: /hpc/dla_mm/mrogers/metagenomics/atlas_db/conda_envs/6faa2f20
Traceback (most recent call last):
  File "/hpc/dla_mm/mrogers/Gian/atlas_run_3M/.snakemake/scripts/tmp8v3ztke6.root_tree.py", line 12, in <module>
    T.unroot()
  File "/hpc/dla_mm/mrogers/metagenomics/atlas_db/conda_envs/6faa2f20/lib/python3.6/site-packages/ete3/coretype/tree.py", line 1325, in unroot
    raise TreeError("Cannot unroot a tree with only two leaves")
ete3.coretype.tree.TreeError: 'Cannot unroot a tree with only two leaves'
[Fri Jul 31 03:02:48 2020]
Error in rule root_tree:
    jobid: 105
    output: genomes/tree/gtdbtk.bac120.nwk
    log: logs/genomes/tree/root_tree_gtdbtk.bac120.log (check log file(s) for error message)
    conda-env: /hpc/dla_mm/mrogers/metagenomics/atlas_db/conda_envs/6faa2f20

SilasK commented 4 years ago

It's not a problem of memory :-)

I don't think you will get an interesting tree. If you want atlas to finish, try adding --keep-going to the command line.

    raise TreeError("Cannot unroot a tree with only two leaves")
ete3.coretype.tree.TreeError: 'Cannot unroot a tree with only two leaves'
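The traceback means that the tree passed to rule root_tree has only two leaves, presumably because only two genomes were placed in that domain. ete3's `Tree.unroot()` refuses to unroot such a tree. Below is a minimal sketch of a pre-check that could skip rooting for trees this small. `count_newick_leaves`, `can_reroot` and the three-leaf threshold are assumptions for illustration, not atlas code. The check counts leaves without ete3, using the fact that every internal node with k children contributes k - 1 commas to a Newick string.

```python
MIN_LEAVES_TO_ROOT = 3  # assumption: Tree.unroot() fails on two-leaf trees


def count_newick_leaves(newick):
    """Count the leaves of a single Newick tree string.

    Each internal node with k children contributes k - 1 commas, so
    leaves = commas + 1. Commas inside quoted labels ('...') or
    bracketed comments ([...]) are ignored.
    """
    commas = 0
    in_quote = False
    comment_depth = 0
    for ch in newick:
        if in_quote:
            if ch == "'":
                in_quote = False
            continue
        if comment_depth:
            if ch == "[":
                comment_depth += 1
            elif ch == "]":
                comment_depth -= 1
            continue
        if ch == "'":
            in_quote = True
        elif ch == "[":
            comment_depth = 1
        elif ch == ",":
            commas += 1
    return commas + 1


def can_reroot(newick):
    """True if the tree has enough leaves to be unrooted and re-rooted."""
    return count_newick_leaves(newick) >= MIN_LEAVES_TO_ROOT
```

For example, `can_reroot("(MAG1:0.1,MAG2:0.2);")` is False, so a patched rooting step could skip such a tree instead of crashing.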

Sofie8 commented 3 years ago

Hi,

I want to share another error I encountered today in rule root_tree:

[Thu Jan 14 07:10:19 2021]
localrule root_tree:
    input: genomes/tree/gtdbtk.ar122.unrooted.nwk
    output: genomes/tree/gtdbtk.ar122.nwk
    log: logs/genomes/tree/root_tree_gtdbtk.ar122.log
    jobid: 565
    wildcards: msa=gtdbtk.ar122
    resources: mem=150, time=5

Activating conda environment: /ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438
Traceback (most recent call last):
  File "/ddn1/vol1/site_scratch/leuven/314/vsc31426/NGS_oct/Resanat/.snakemake/scripts/tmpoi_avvlr.root_tree.py", line 13, in <module>
    T.set_outgroup(T.get_midpoint_outgroup())
  File "/ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438/lib/python3.6/site-packages/ete3/coretype/tree.py", line 1237, in set_outgroup
    outgroup = _translate_nodes(self, outgroup)
  File "/ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438/lib/python3.6/site-packages/ete3/coretype/tree.py", line 2482, in _translate_nodes
    raise TreeError("Invalid target node: "+str(n))
ete3.coretype.tree.TreeError: 'Invalid target node: None'
[Thu Jan 14 07:10:36 2021]
Error in rule root_tree:
    jobid: 565
    output: genomes/tree/gtdbtk.ar122.nwk
    log: logs/genomes/tree/root_tree_gtdbtk.ar122.log (check log file(s) for error message)
    conda-env: /ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438

I read here that one possible reason is: "This is due to the fact that some trees cannot be midpoint rooted. There is a fix for this: https://bitbucket.org/caseywdunn/biolite/commits/784edc6d03dd"

I don't know whether this is possible to fix, but for now I can continue: I commented out the gtdb taxonomy option in the config to run the rest of the pipeline.

Sofie
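Midpoint rooting places the root halfway along the longest leaf-to-leaf path. Below is a generic plain-Python sketch on a tree given as an edge list, which is an assumed representation. It is neither ete3's `get_midpoint_outgroup()` nor the biolite fix linked above. It makes one edge case explicit: the midpoint can fall exactly on an existing node. Suppose a search walks up from a leaf and stops only once the accumulated distance exceeds half the path length. In this case it can run past the root and return nothing, which is presumably one way to end up with `Invalid target node: None`. `build_adjacency`, `farthest_from` and `midpoint` are hypothetical helpers.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """edges: iterable of (node_a, node_b, branch_length) for an unrooted tree."""
    adj = defaultdict(dict)
    for a, b, length in edges:
        adj[a][b] = length
        adj[b][a] = length
    return adj


def farthest_from(adj, start):
    """Return (node, distance, parent_map) for the node farthest from start."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:  # in a tree every node is reached by exactly one path
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint(adj):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (u, v, offset): the midpoint lies on edge u-v at distance
    `offset` from u. If it falls exactly on a node, returns (node, node, 0.0).
    """
    leaves = [n for n in adj if len(adj[n]) == 1]
    a, _, _ = farthest_from(adj, leaves[0])      # one end of the longest path
    b, diameter, parent = farthest_from(adj, a)  # other end, parents point to a
    half = diameter / 2.0
    travelled = 0.0
    node = b
    while parent[node] is not None:  # walk from b back towards a
        up = parent[node]
        length = adj[node][up]
        if math.isclose(travelled + length, half):
            return up, up, 0.0  # tie: midpoint sits exactly on a node
        if travelled + length > half:
            return node, up, half - travelled
        travelled += length
        node = up
    return node, node, 0.0
```

With edges A-X 1, B-X 1 and C-X 4, `midpoint` returns `("X", "C", 1.5)`: the midpoint lies 1.5 along the X-C edge, 2.5 from both A and C. With edges A-X 2, B-X 2 and C-X 1, it returns `("X", "X", 0.0)`, which is the exact-node tie.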

SilasK commented 3 years ago

You can also just rename the input file to the output file name. Do you have any archaea?

But I should find a better solution.
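SilasK's workaround amounts to using the unrooted tree as the rooted output. Below is a minimal sketch that automates this fallback. `root_newick` is a hypothetical callable standing in for the ete3 rooting code in atlas's root_tree.py, and `root_or_copy` is likewise hypothetical and not part of atlas.

```python
import shutil
from pathlib import Path


def root_or_copy(unrooted_path, rooted_path, root_newick):
    """Write a rooted tree, or fall back to copying the unrooted one.

    root_newick: hypothetical callable taking a Newick string and returning
    a rooted Newick string. If it raises, the unrooted file is copied to the
    output path unchanged. Returns True if rooting succeeded, False if the
    fallback was used.
    """
    newick = Path(unrooted_path).read_text()
    try:
        rooted = root_newick(newick)
    except Exception as err:  # broad on purpose; e.g. ete3's TreeError
        print(f"rooting failed ({err}); copying unrooted tree instead")
        shutil.copyfile(unrooted_path, rooted_path)
        return False
    Path(rooted_path).write_text(rooted)
    return True
```

Copying instead of renaming keeps genomes/tree/gtdbtk.ar122.unrooted.nwk in place, so Snakemake presumably won't see a missing input and rebuild it on the next run. Catching every exception keeps the sketch simple; it could be narrowed to ete3's `TreeError`.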