Open luhugerth opened 2 years ago
Sorry, but Struo2 currently supports only Bacteria & Archaea, given that the GTDB covers only those domains. I'm willing to add direct support for eukaryotic genomes, but it's not clear how best to integrate eukaryotic genes & taxonomy. See https://github.com/leylabmpi/Struo2/issues/7
https://github.com/nick-youngblut/gtdb_to_taxdump can potentially help create a hybrid taxdump file. If I have some time, I'll create a script for making a hybrid GTDB (archaea + bacteria) + NCBI (eukaryote) taxdump.
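As a rough illustration of the NCBI half of such a hybrid (this is not the script mentioned above, only a sketch): the NCBI `nodes.dmp` file stores one node per line with fields separated by `\t|\t`, the first two being the taxid and its parent taxid. The helper names `parse_nodes` and `select_subtrees` below are hypothetical. The sketch only picks out the nodes under chosen root taxids; it does not handle taxid collisions with a GTDB-derived taxdump, re-parenting of the selected roots, or the matching filter on `names.dmp`, all of which a real hybrid would need.

```python
from collections import defaultdict


def parse_nodes(lines):
    """Parse nodes.dmp lines into {tax_id: (parent_id, raw_line)}."""
    nodes = {}
    for line in lines:
        if not line.strip():
            continue
        # Fields are separated by "\t|\t"; the first two are taxid and parent taxid.
        fields = line.rstrip("\n").split("\t|\t")
        tax_id, parent_id = int(fields[0]), int(fields[1])
        nodes[tax_id] = (parent_id, line)
    return nodes


def select_subtrees(nodes, root_ids):
    """Return the set of taxids at or below any of the given root taxids."""
    children = defaultdict(list)
    for tax_id, (parent_id, _) in nodes.items():
        if tax_id != parent_id:  # the root node lists itself as its parent
            children[parent_id].append(tax_id)

    keep = set()
    stack = [r for r in root_ids if r in nodes]
    while stack:
        tax_id = stack.pop()
        if tax_id in keep:
            continue
        keep.add(tax_id)
        stack.extend(children[tax_id])
    return keep
```

With a real taxdump, the lines would come from something like `open("nodes.dmp")`, and the roots would be the NCBI taxids of the groups to add (to my knowledge, 4751 for Fungi and 10239 for Viruses, but please check these against the current taxdump).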
Hi! I was wondering if you have any updated advice for integrating fungal/viral taxonomy & genomes, now that Struo2 has switched over to the new taxdump pipeline.
Thanks, Zoey
@zoey-rw I believe that https://github.com/shenwei356/gtdb-taxdump is focused on the GTDB, which only includes bacteria and archaea, so it is still a challenge to integrate other taxa.
Is the taxdump essential for humann?
No, it shouldn't be needed.
Hi,
I want to create a Kraken2 DB with GTDB data, since it's so much more curated and reliable than NCBI. However, I do need to be able to detect all domains of life, so I want to include NCBI's fungal, viral and human genomes that you can normally get with `kraken2-build`. The structure of the output is a bit different with these two approaches, though; Struo2 creates a folder per genome with data within that folder, while kraken2/NCBI just dumps the genomes into a common folder. Will this be a problem for building the DB? Should I make some sort of loop to stash each genome into its folder? I'm also not sure how to deal with this hybrid taxonomy, but I suppose I could select the archaeal, viral and mammalian nodes from the NCBI taxdump and append these to GTDB's?
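For the folder question, the kind of loop I have in mind would look roughly like the sketch below. It assumes the NCBI genomes are plain FASTA files sitting in one directory and that the folder name can simply be the file name minus its extension; whether Struo2 actually needs this layout is exactly what I'm unsure about. `stash_genomes` and the suffix list are my own placeholders, not anything from Struo2 or Kraken2.

```python
from pathlib import Path
import shutil

# Assumed FASTA extensions; adjust to whatever the download actually produces.
FASTA_SUFFIXES = (".fna", ".fa", ".fasta", ".fna.gz", ".fa.gz", ".fasta.gz")


def stash_genomes(src_dir, dest_dir):
    """Move each FASTA in src_dir into dest_dir/<genome_name>/.

    Returns a dict mapping genome name to the new file path.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    moved = {}
    # sorted() lists the directory up front, so new folders are not re-scanned.
    for fasta in sorted(src.iterdir()):
        if not fasta.is_file():
            continue
        suffix = next((s for s in FASTA_SUFFIXES if fasta.name.endswith(s)), None)
        if suffix is None:
            continue  # skip anything that does not look like a FASTA file
        name = fasta.name[: -len(suffix)]
        genome_dir = dest / name
        genome_dir.mkdir(parents=True, exist_ok=True)
        target = genome_dir / fasta.name
        shutil.move(str(fasta), str(target))
        moved[name] = target
    return moved
```

The returned mapping could then be used to fill in the file-path column of a genome table, if that is how the extra genomes should be passed in.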
Thank you very much for your time and this very nice package!