Convert GTDB taxonomy to NCBI taxdump format.
NOTE: the taxIDs are NOT stable between releases! See gtdb-taxdump for an alternative that uses stable taxIDs.
gtdb_to_taxdump converts the GTDB taxonomy to the NCBI taxdump format so that it can be used with software that requires a taxonomy in taxdump format (e.g., kraken2 or TaxonKit).
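For orientation, an NCBI taxdump stores the tree in nodes.dmp (taxID, parent taxID, rank, plus further columns) and the labels in names.dmp, with fields separated by tab-pipe-tab. The Python sketch below only illustrates that layout for a single GTDB lineage. The function name, the ID numbering, and the rank label used for domains are assumptions made for illustration and are not taken from gtdb_to_taxdump; real nodes.dmp files also carry more columns than the three shown here.

```python
# Illustrative only: a reduced view of the taxdump layout, not the tool's code.
GTDB_RANKS = {
    "d": "superkingdom",  # assumption: label chosen for the GTDB domain rank
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}

SEP = "\t|\t"  # field separator used by NCBI .dmp files
END = "\t|"    # each row ends with a trailing tab-pipe


def lineage_to_dmp_rows(lineage, first_taxid=2):
    """Return (nodes, names) rows for one GTDB lineage; the root gets taxID 1."""
    nodes = [SEP.join(["1", "1", "no rank"]) + END]
    names = [SEP.join(["1", "root", "", "scientific name"]) + END]
    parent = 1
    for offset, taxon in enumerate(lineage.split(";")):
        prefix = taxon.split("__", 1)[0]
        taxid = first_taxid + offset  # hypothetical numbering scheme
        nodes.append(SEP.join([str(taxid), str(parent), GTDB_RANKS[prefix]]) + END)
        names.append(SEP.join([str(taxid), taxon, "", "scientific name"]) + END)
        parent = taxid
    return nodes, names


nodes, names = lineage_to_dmp_rows(
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria"
)
for row in nodes + names:
    print(row)
```

Each taxon points at the taxon one rank above it, which is all a taxdump consumer needs to rebuild the lineage.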
Note that the taxIDs are arbitrarily assigned and do not match any NCBI taxIDs! Running gtdb_to_taxdump on a different list of taxonomies (e.g., a different GTDB release) will create different taxIDs. See GTDB-taxdump for a method to produce stable taxIDs (recommended!).
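To see why arbitrarily assigned taxIDs cannot be compared across runs, consider a toy numbering that hands out IDs in order of first appearance. This Python sketch is only an illustration: assign_taxids is a hypothetical helper, and the numbering scheme actually used by gtdb_to_taxdump may differ.

```python
def assign_taxids(lineages, start=2):
    """Give each taxon the next free integer the first time it is seen."""
    taxids = {}
    for lineage in lineages:
        for taxon in lineage.split(";"):
            if taxon not in taxids:
                taxids[taxon] = start + len(taxids)
    return taxids


# Same two lineages, different input order.
run_a = assign_taxids(["d__Archaea;p__Halobacteriota", "d__Bacteria;p__Firmicutes"])
run_b = assign_taxids(["d__Bacteria;p__Firmicutes", "d__Archaea;p__Halobacteriota"])

print(run_a["d__Bacteria"], run_b["d__Bacteria"])  # prints: 4 2
```

Under any scheme of this kind, adding, removing, or reordering taxa shifts the IDs, so a taxID from one run should never be looked up in a taxdump built from another.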
There was a serious bug in ncbi-gtdb_map.py prior to version 0.1.5. Many of the taxonomic classifications produced by earlier versions are likely incorrect. If you used an earlier version, please re-run the analysis. I'm sorry for any inconvenience.
Install the released package with pip:
pip install gtdb_to_taxdump
Or install the current version directly from GitHub:
pip install git+https://github.com/nick-youngblut/gtdb_to_taxdump.git
See gtdb_to_taxdump.py -h for the full list of options.
Example (GTDB release 202):
gtdb_to_taxdump.py \
https://data.gtdb.ecogenomic.org/releases/release202/202.0/ar122_taxonomy_r202.tsv.gz \
https://data.gtdb.ecogenomic.org/releases/release202/202.0/bac120_taxonomy_r202.tsv.gz \
> taxID_info.tsv
Example (GTDB release 95):
gtdb_to_taxdump.py \
https://data.gtdb.ecogenomic.org/releases/release95/95.0/ar122_taxonomy_r95.tsv.gz \
https://data.gtdb.ecogenomic.org/releases/release95/95.0/bac120_taxonomy_r95.tsv.gz \
> taxID_info.tsv
Example (GTDB release 89):
gtdb_to_taxdump.py \
https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/ar122_taxonomy_r89.tsv \
https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/bac120_taxonomy_r89.tsv \
> taxID_info.tsv
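The taxonomy files passed in the examples above are two-column, tab-separated tables: a genome accession followed by a semicolon-delimited lineage with rank prefixes (d__ through s__). The Python sketch below reads that layout and can be handy for sanity-checking an input file before conversion. parse_gtdb_taxonomy is a hypothetical helper, not part of gtdb_to_taxdump, and the single input row is an illustrative example rather than a line copied from a release file.

```python
import io


def parse_gtdb_taxonomy(handle):
    """Yield (accession, {rank_prefix: name}) for each line of a GTDB taxonomy TSV."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        accession, lineage = line.split("\t", 1)
        ranks = {}
        for taxon in lineage.split(";"):
            prefix, _, name = taxon.partition("__")
            ranks[prefix] = name
        yield accession, ranks


# Illustrative row in the GTDB taxonomy layout.
example = io.StringIO(
    "RS_GCF_000005845.2\td__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\n"
)
records = list(parse_gtdb_taxonomy(example))
print(records[0][0], records[0][1]["g"])
```

For the gzipped release files, the same function should work on a handle opened with gzip.open(path, "rt").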
You can add the taxIDs to a GTDB metadata table via the --table parameter. For example:
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/ar122_metadata_r89.tsv
gtdb_to_taxdump.py \
--table ar122_metadata_r89.tsv \
https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/ar122_taxonomy_r89.tsv \
https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/bac120_taxonomy_r89.tsv \
> taxID_info.tsv
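Conceptually, adding taxIDs to a metadata table means looking up each row's lineage and appending the matching ID as a new column. The Python sketch below shows that idea on a tiny in-memory table. It is not how --table is implemented: add_taxid_column is a hypothetical helper, the column names gtdb_taxonomy and gtdb_taxid are assumptions, and the taxID value is made up for the example.

```python
import csv
import io


def add_taxid_column(table, taxids, lineage_col="gtdb_taxonomy", taxid_col="gtdb_taxid"):
    """Return the TSV text of `table` with a taxID column appended.

    The taxID is looked up from the last (most specific) taxon of each lineage.
    """
    reader = csv.DictReader(table, delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + [taxid_col],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        most_specific = row[lineage_col].split(";")[-1]
        row[taxid_col] = taxids.get(most_specific, "NA")  # "NA" if no ID is known
        writer.writerow(row)
    return out.getvalue()


# Toy metadata table and a made-up taxID mapping.
metadata = io.StringIO(
    "accession\tgtdb_taxonomy\n"
    "G1\td__Archaea;s__Example species\n"
)
result = add_taxid_column(metadata, {"s__Example species": 42})
print(result, end="")
```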
ncbi-gtdb_map.py
Mapping direction: NCBI => GTDB or GTDB => NCBI (see --query-taxonomy)
gtdb_to_diamond.py
lineage2taxid.py
gtdb_to_taxdump.py
acc2gtdb_tax.py
./uniref_utils/
unirefxml2clust50-90idx.py
unirefxml2fasta.py
unirefxml2tax.py