leylabmpi / Struo2

Scalable creating/updating of metagenome profiling databases
MIT License

Combining fungi and viruses with GTDB #11

Open luhugerth opened 2 years ago

luhugerth commented 2 years ago

Hi,

I want to create a Kraken2 DB with GTDB data, since it's so much more curated and reliable than NCBI. However, I do need to be able to detect all domains of life, so I want to include NCBI's fungal, viral and human genomes that you can normally get with kraken2-build. The structure of the output is a bit different with these two approaches, though; Struo2 creates a folder per genome with data within that folder, while kraken2/NCBI just dumps the genomes into a common folder. Will this be a problem for building the DB? Should I make some sort of loop to stash each genome into its folder?
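For the "loop to stash each genome into its folder" idea, a minimal Python sketch could look like the following. It only rearranges files on disk. Whether this layout is enough for Struo2 to accept the genomes is not confirmed anywhere in this thread, and the names `stash_genomes`, `genome_name` and `FASTA_SUFFIXES` are hypothetical helpers, not part of Struo2 or Kraken2.

```python
import shutil
from pathlib import Path

# Assumption: genomes are FASTA files with one of these suffixes.
FASTA_SUFFIXES = (".fna", ".fa", ".fasta", ".fna.gz", ".fa.gz", ".fasta.gz")


def genome_name(path):
    """Strip a known FASTA suffix to get a per-genome folder name."""
    for suffix in FASTA_SUFFIXES:
        if path.name.endswith(suffix):
            return path.name[: -len(suffix)]
    return path.stem


def stash_genomes(flat_dir, out_dir):
    """Copy each FASTA in flat_dir into out_dir/<genome_name>/.

    Returns a dict mapping genome name -> path of the copied file.
    Non-FASTA files and subdirectories are skipped.
    """
    placed = {}
    for fasta in sorted(Path(flat_dir).iterdir()):
        if not fasta.is_file() or not fasta.name.endswith(FASTA_SUFFIXES):
            continue
        name = genome_name(fasta)
        target_dir = Path(out_dir) / name
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps the original file in place and preserves timestamps
        placed[name] = Path(shutil.copy2(fasta, target_dir / fasta.name))
    return placed
```

Copying rather than moving leaves the kraken2-build download intact, at the cost of extra disk space.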

I'm also not sure how to deal with this hybrid taxonomy, but I suppose I could select the archaeal, viral and mammalian nodes from the NCBI taxdump and append these to GTDB's?

Thank you very much for your time and this very nice package!

nick-youngblut commented 2 years ago

Sorry, but currently Struo2 only supports Bacteria & Archaea, given that the GTDB only supports those domains. I'm willing to include direct support for eukaryotic genomes, but it's not clear how best to integrate eukaryotic genes & taxonomy. See https://github.com/leylabmpi/Struo2/issues/7

nick-youngblut commented 2 years ago

https://github.com/nick-youngblut/gtdb_to_taxdump can potentially help with creating a hybrid taxdump file. If I have some time, I'll create a script for making a hybrid GTDB (archaea + bacteria) + NCBI (eukaryote) taxdump.
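As a rough illustration of what such a hybrid could involve, here is a minimal sketch that pulls selected subtrees out of an NCBI `nodes.dmp` and grafts them under the root of a GTDB-derived `nodes.dmp`. It is not the script mentioned above and does not use gtdb_to_taxdump. All function names are hypothetical, and it assumes that both files use the standard NCBI taxdump row layout (`tax_id`, parent `tax_id`, rank, ... separated by tab-pipe-tab) and that the GTDB-derived root has ID `1`. It refuses to merge when tax IDs collide rather than trying to renumber them.

```python
from collections import defaultdict, deque


def parse_dmp_line(line):
    """Split one taxdump row ('a<TAB>|<TAB>b<TAB>|') into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def format_dmp_line(fields):
    """Inverse of parse_dmp_line."""
    return "\t|\t".join(fields) + "\t|\n"


def read_nodes(lines):
    """Map tax_id -> list of nodes.dmp fields (tax_id, parent_id, rank, ...)."""
    nodes = {}
    for line in lines:
        if line.strip():
            fields = parse_dmp_line(line)
            nodes[fields[0]] = fields
    return nodes


def subtree_ids(nodes, root_id):
    """Return root_id plus every tax_id that descends from it."""
    if root_id not in nodes:
        raise KeyError(f"tax_id {root_id} not found")
    children = defaultdict(list)
    for tax_id, fields in nodes.items():
        if fields[1] != tax_id:  # the taxdump root lists itself as parent
            children[fields[1]].append(tax_id)
    seen, queue = {root_id}, deque([root_id])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def merge_nodes(gtdb_nodes, ncbi_nodes, ncbi_roots, new_parent="1"):
    """Graft the NCBI subtrees rooted at ncbi_roots under new_parent.

    Assumption: new_parent is the root ID of the GTDB-derived taxdump.
    """
    merged = dict(gtdb_nodes)
    for root in ncbi_roots:
        for tax_id in subtree_ids(ncbi_nodes, root):
            if tax_id in merged:
                raise ValueError(f"tax_id collision: {tax_id}")
            fields = list(ncbi_nodes[tax_id])
            if tax_id == root:
                fields[1] = new_parent  # re-parent the subtree root
            merged[tax_id] = fields
    return merged
```

With real files, `read_nodes` can be given an open file handle, and `ncbi_roots` could be something like `["4751", "10239"]` (the NCBI tax IDs for Fungi and Viruses). The matching rows of `names.dmp` would need the same filtering, which this sketch leaves out.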

zoey-rw commented 2 years ago

Hi! I was wondering if you have any updated advice for integrating fungal/viral taxonomy & genomes, now that Struo2 has switched over to the new taxdump pipeline.

Thanks, Zoey

nick-youngblut commented 2 years ago

@zoey-rw I believe that https://github.com/shenwei356/gtdb-taxdump is focused on the GTDB, which only includes bacteria and archaea, so it is still a challenge to integrate other taxa.

jolespin commented 1 year ago

Is the taxdump essential for humann?

nick-youngblut commented 1 year ago

Is the taxdump essential for humann?

No, it shouldn't be needed.