pirovc / ganon

ganon2 classifies genomic sequences against large sets of references efficiently, with integrated download and update of databases (refseq/genbank), taxonomic profiling (ncbi/gtdb), binning and hierarchical classification, customized reporting and more
https://pirovc.github.io/ganon/
MIT License

Mixing GTDB and NCBI taxonomy #227

Closed (ohickl closed 1 year ago)

ohickl commented 1 year ago

Hi, is it possible to mix taxonomy systems when using multiple databases? E.g., I'd be interested in using GenBank and the latest GTDB release.

Best

Oskar

pirovc commented 1 year ago

It will work (although it is not tested), but any read with matches in both the NCBI and GTDB databases will have its LCA (lowest common ancestor) resolved to the root node, so it is not very useful in practice. Do you have an idea of what your expected outcome would be?
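To illustrate why the LCA collapses to the root, here is a minimal sketch on a toy parent-pointer tree. It is not ganon's implementation: the node labels, the `ncbi:`/`gtdb:` prefixes, and the helper names `lineage` and `lca` are hypothetical, and it assumes the two taxonomies share nothing but a common root.

```python
def lineage(parents, node):
    """Return the path from node up to the root (inclusive)."""
    path = [node]
    while parents[node] != node:  # the root is its own parent
        node = parents[node]
        path.append(node)
    return path


def lca(parents, nodes):
    """Lowest common ancestor of several nodes in a parent-pointer tree."""
    common = None
    for n in nodes:
        anc = lineage(parents, n)
        if common is None:
            common = anc
        else:
            keep = set(anc)
            # Preserve the leaf-to-root order of the first lineage.
            common = [x for x in common if x in keep]
    return common[0]


# Hypothetical toy tree: two taxonomies that only share the root node.
parents = {
    "root": "root",
    "ncbi:Bacteria": "root",
    "ncbi:Escherichia": "ncbi:Bacteria",
    "ncbi:Escherichia coli": "ncbi:Escherichia",
    "gtdb:d__Bacteria": "root",
    "gtdb:g__Escherichia": "gtdb:d__Bacteria",
    "gtdb:s__Escherichia coli": "gtdb:g__Escherichia",
}

# Matches within one taxonomy keep a useful rank.
within = lca(parents, ["ncbi:Escherichia coli", "ncbi:Escherichia"])

# Matches across both taxonomies only meet at the root.
across = lca(parents, ["ncbi:Escherichia coli", "gtdb:s__Escherichia coli"])

print(within)
print(across)
```

Even though both matches represent the same organism in this toy example, the only node their lineages have in common is the root, so the assignment carries no taxonomic information.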

ohickl commented 1 year ago

Thanks for the quick reply! I wanted to do some kingdom-level sorting. But since the eukaryotes are massive and RefSeq only represents a very small (IMO skewed) subset, I was planning to build subset databases from the GenBank eukaryotes that will each fit in ~2 TB of memory. I would want to pair these with the latest GTDB release (207) for the prokaryotes, with which I had pretty good results (at least with kraken2). As there are some ways to translate GTDB into NCBI, this might be the way to go then? I did not try it, though.

It would be awesome if ganon could translate it automatically. I'm not sure how feasible that would be using multitax, e.g. by setting a preferred taxonomic system, or whether there are any pitfalls.

pirovc commented 1 year ago

Indeed, that would be good to have. It's already a planned feature for multitax, which will be ported to ganon report, but it is not yet implemented. GTDB to NCBI conversion is quite straightforward. Alternatively, you could build the GTDB 207 genomes with the NCBI taxonomy, so you don't have to translate it at the end.
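One simple way to approach such a conversion is a majority vote over genomes: for each GTDB taxon, pick the NCBI taxon assigned to most of its genomes. The sketch below is only an illustration under that assumption. It is not how multitax or ganon do (or will do) it, and the function name `majority_vote_map`, the genome pairs, and the taxon labels are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote_map(genome_pairs):
    """Map each GTDB taxon to the NCBI taxon most of its genomes carry.

    genome_pairs: iterable of (gtdb_taxon, ncbi_taxon), one pair per genome.
    """
    votes = defaultdict(Counter)
    for gtdb_taxon, ncbi_taxon in genome_pairs:
        votes[gtdb_taxon][ncbi_taxon] += 1
    # most_common(1) returns [(ncbi_taxon, count)] for the top vote.
    return {g: c.most_common(1)[0][0] for g, c in votes.items()}


# Hypothetical per-genome assignments (one tuple per genome).
genome_pairs = [
    ("gtdb:s__A", "ncbi:species X"),
    ("gtdb:s__A", "ncbi:species X"),
    ("gtdb:s__A", "ncbi:species Y"),
    ("gtdb:s__B", "ncbi:species Z"),
]

mapping = majority_vote_map(genome_pairs)
print(mapping)
```

A pitfall to keep in mind with this kind of mapping is that it is lossy: several GTDB taxa can map to the same NCBI taxon, and minority assignments (here `ncbi:species Y`) are discarded.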

ohickl commented 1 year ago

I will give that a try. Thanks!

shenwei356 commented 1 year ago

This may help:

Merging GTDB and NCBI taxonomy

ohickl commented 1 year ago

Thanks for the tips @shenwei356!