dscheah closed this issue 3 years ago.
I'm afraid not; see the GTDB-Tk hardware requirements: https://github.com/Ecogenomics/GTDBTk#hardware-requirements
If you want, you can send me a gzipped version of your genomes/genomes folder and I will run GTDB-Tk on my server.
Hi @SilasK, thanks for the offer. In your experience, do you think it would be beneficial to annotate with the GTDB database instead of CheckM? As I understand it, CheckM serves more as a quality assessor of metagenome-assembled genomes than as a taxonomic assignment toolkit like GTDB-Tk. With CheckM I was able to run the pipeline within my limited memory resources, but I am wondering whether GTDB-Tk would typically give greater depth/accuracy in the assignments. However, I assume that this would depend on the microbial diversity of the samples in question...
Thanks!
Yes, GTDB gives more up-to-date and more in-depth taxonomic annotation.
I have heard about KBase; unfortunately I am getting close to the business end of my PhD and don't have enough time to wrangle a new pipeline. If you are able to annotate my MAGs on your server using GTDB-Tk, could I push my files to a personal repository from which you could pull them onto your server? Let me first consult with my supervisor and I will let you know. Since my samples are from quite a niche environment (a cursory glance at the annotated genomes already shows that the most abundant taxa are consistent with the taxonomic annotations I obtained from the MG-RAST pipeline), they may not need to be annotated with GTDB-Tk. However, I would be curious to see whether there is a notable difference between GTDB-Tk and CheckM.
Thanks very much for your help!
Hi @SilasK, I'm wondering whether it would still be possible to run my genomes through GTDB-Tk on your server. Would my previous suggestion work, i.e. pushing the files to a GitHub repository so you can pull them onto your server? Let me know what works best for you and which files (besides the genomes/genomes files) you would need.
Thank you very much!
Yes, it should still work. I only need the fasta files in genomes/genomes.
Thanks very much @SilasK. I've just added you as a collaborator to my repository with the MAG fasta files.
It is running.
@SilasK Thanks for the update!
Hello,
I'm getting a similar error and was wondering what the problem could be. According to this previous issue, it might be a resource problem? I'm running the pipeline with 250 GB of memory and 4 CPUs. I assume the amount of memory is sufficient, so should I increase the number of CPUs? What would you recommend?
Thanks in advance!
Activating conda environment: /hpc/dla_mm/mrogers/metagenomics/atlas_db/conda_envs/6faa2f20
Traceback (most recent call last):
  File "/hpc/dla_mm/mrogers/Gian/atlas_run_3M/.snakemake/scripts/tmp8v3ztke6.root_tree.py", line 12, in <module>
    T.unroot()
  File "/hpc/dla_mm/mrogers/metagenomics/atlas_db/conda_envs/6faa2f20/lib/python3.6/site-packages/ete3/coretype/tree.py", line 1325, in unroot
    raise TreeError("Cannot unroot a tree with only two leaves")
ete3.coretype.tree.TreeError: 'Cannot unroot a tree with only two leaves'
[Fri Jul 31 03:02:48 2020]
Error in rule root_tree:
jobid: 105
output: genomes/tree/gtdbtk.bac120.nwk
log: logs/genomes/tree/root_tree_gtdbtk.bac120.log (check log file(s) for error message)
conda-env: /hpc/dla_mm/mrogers/metagenomics/atlas_db/conda_envs/6faa2f20
It's not a memory problem :-)
The tree has only two leaves, so I don't think you would get an interesting tree anyway.
If you want Atlas to finish, try adding --keep-going to the command line.
Hi,
I want to share another error I encountered today in rule root_tree:
[Thu Jan 14 07:10:19 2021]
localrule root_tree:
input: genomes/tree/gtdbtk.ar122.unrooted.nwk
output: genomes/tree/gtdbtk.ar122.nwk
log: logs/genomes/tree/root_tree_gtdbtk.ar122.log
jobid: 565
wildcards: msa=gtdbtk.ar122
resources: mem=150, time=5
Activating conda environment: /ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438
Traceback (most recent call last):
  File "/ddn1/vol1/site_scratch/leuven/314/vsc31426/NGS_oct/Resanat/.snakemake/scripts/tmpoi_avvlr.root_tree.py", line 13, in <module>
    T.set_outgroup(T.get_midpoint_outgroup())
  File "/ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438/lib/python3.6/site-packages/ete3/coretype/tree.py", line 1237, in set_outgroup
    outgroup = _translate_nodes(self, outgroup)
  File "/ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438/lib/python3.6/site-packages/ete3/coretype/tree.py", line 2482, in _translate_nodes
    raise TreeError("Invalid target node: "+str(n))
ete3.coretype.tree.TreeError: 'Invalid target node: None'
[Thu Jan 14 07:10:36 2021]
Error in rule root_tree:
jobid: 565
output: genomes/tree/gtdbtk.ar122.nwk
log: logs/genomes/tree/root_tree_gtdbtk.ar122.log (check log file(s) for error message)
conda-env: /ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/conda_envs/7a8c6438
I read here that one possible reason is: "This is due to the fact that some trees cannot be midpoint rooted. There is a fix for this: https://bitbucket.org/caseywdunn/biolite/commits/784edc6d03dd"
I don't know whether this is possible to fix, but for now I am able to continue: I crossed out the gtdb taxonomy option in the config, and the rest of the pipeline runs.
Sofie
You can also just rename the input file to the output file name. Do you have archaea?
But I should find a better solution.
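The workaround of reusing the unrooted tree as the rule's output can be scripted. This is a minimal sketch, not part of Atlas. It assumes the file layout from the root_tree log above: input `genomes/tree/<msa>.unrooted.nwk` and output `genomes/tree/<msa>.nwk`. It also assumes it is run from the Atlas working directory. The helper name `copy_unrooted_tree` is hypothetical. The resulting file is still an unrooted tree, so downstream steps simply receive it unchanged.

```python
# Hypothetical helper (not part of Atlas): reuse the unrooted GTDB-Tk tree
# as the output of the root_tree rule when rooting fails.
import shutil
from pathlib import Path


def copy_unrooted_tree(msa: str, workdir: str = ".") -> Path:
    """Copy genomes/tree/<msa>.unrooted.nwk to genomes/tree/<msa>.nwk.

    The path layout is assumed from the Snakemake log of the root_tree rule.
    Copying (instead of renaming) leaves the input file in place.
    """
    tree_dir = Path(workdir) / "genomes" / "tree"
    src = tree_dir / f"{msa}.unrooted.nwk"
    dst = tree_dir / f"{msa}.nwk"
    if not src.exists():
        raise FileNotFoundError(f"Unrooted tree not found: {src}")
    # copyfile copies only the contents, so dst gets a fresh modification time
    shutil.copyfile(src, dst)
    return dst
```

For the archaeal tree in the log above this would be `copy_unrooted_tree("gtdbtk.ar122")`, after which Atlas can be restarted. Copying rather than moving keeps the input file. Presumably this stops Snakemake from wanting to regenerate the input, although that behaviour is an assumption here.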
Hello! I managed to get the full Atlas pipeline to work by restricting the memory resources, but I was curious whether I could use the GTDB markers for taxonomic annotation with the same memory restriction.
I amended the config file accordingly:
Eventually I received the following error messages:
I am wondering whether this is a straightforward fix or another memory issue. Otherwise, I am fine with using only the CheckM markers.
Thank you very much!