nick-youngblut / gtdb_to_taxdump

Convert GTDB taxonomy to NCBI taxdump format
MIT License

gtdb_to_taxdump

Convert GTDB taxonomy to NCBI taxdump format.

NOTE: the taxIDs are NOT stable between releases! See gtdb-taxdump for an alternative that uses stable taxIDs.

Summary

Convert the GTDB taxonomy to NCBI taxdump format so that it can be used with software that requires a taxonomy in taxdump format (e.g., kraken2 or TaxonKit).
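To make "taxdump format" concrete, here is a minimal sketch of how rows in the two core taxdump files, nodes.dmp and names.dmp, are laid out. It is an illustration only, not the code used by gtdb_to_taxdump: the helper names `nodes_row` and `names_row` are hypothetical, only the leading nodes.dmp columns (taxID, parent taxID, rank) are written, and the exact columns this tool emits may differ.

```python
# Illustrative only: hypothetical helpers, not part of gtdb_to_taxdump.
# NCBI taxdump files delimit fields with "\t|\t" and end each row with "\t|\n".
FIELD_SEP = "\t|\t"
ROW_END = "\t|\n"


def nodes_row(taxid, parent_taxid, rank):
    """Format the leading nodes.dmp columns: taxID, parent taxID, rank."""
    return FIELD_SEP.join([str(taxid), str(parent_taxid), rank]) + ROW_END


def names_row(taxid, name, name_class="scientific name"):
    """Format a names.dmp row: taxID, name, unique name (left empty), name class."""
    return FIELD_SEP.join([str(taxid), name, "", name_class]) + ROW_END


if __name__ == "__main__":
    # The root node is its own parent in NCBI-style taxdumps.
    print(repr(nodes_row(1, 1, "no rank")))
    print(repr(nodes_row(2, 1, "phylum")))
    print(repr(names_row(2, "p__Halobacteriota")))
```

Tools that read a taxdump resolve a lineage by following the parent taxID column of nodes.dmp up to the root, which is why every taxon needs a numeric ID.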

Note that the taxIDs are arbitrarily assigned and do not correspond to any NCBI taxIDs! Running gtdb_to_taxdump on a different list of taxonomies (e.g., a different GTDB release) will produce different taxIDs. See GTDB-taxdump for a method that produces stable taxIDs (recommended!).
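The following sketch illustrates why arbitrarily assigned taxIDs change between inputs. It assumes IDs are handed out sequentially in order of first appearance; that is an assumption made for illustration, not necessarily how gtdb_to_taxdump assigns them. The function `assign_taxids` and the two tiny "releases" are hypothetical.

```python
# Illustrative only: a hypothetical sequential ID scheme, not the tool's actual one.
def assign_taxids(lineages, first_id=2):
    """Give each distinct taxon the next free integer, in order of first appearance."""
    taxids = {"root": 1}
    next_id = first_id
    for lineage in lineages:
        for taxon in lineage.split(";"):
            if taxon not in taxids:
                taxids[taxon] = next_id
                next_id += 1
    return taxids


# Two made-up inputs; the second contains one extra phylum listed first.
release_a = [
    "d__Archaea;p__Halobacteriota",
    "d__Bacteria;p__Proteobacteria",
]
release_b = [
    "d__Archaea;p__Thermoproteota",
    "d__Archaea;p__Halobacteriota",
    "d__Bacteria;p__Proteobacteria",
]

ids_a = assign_taxids(release_a)
ids_b = assign_taxids(release_b)

# The same taxon receives a different ID once the input list changes.
print(ids_a["p__Halobacteriota"], ids_b["p__Halobacteriota"])  # prints: 3 4
```

Under any scheme of this kind, a single added or removed taxon shifts the IDs of every taxon that follows it, so taxIDs from one run should not be reused with the output of another run.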

WARNING

There was a serious bug in ncbi-gtdb_map.py prior to version 0.1.5. Many of the taxonomic classifications produced by those earlier versions are likely incorrect. If you used an earlier version, please re-run your analysis. I'm sorry for any inconvenience.

Citation

DOI

Install

Dependencies

Package

From pypi

pip install gtdb_to_taxdump

From github

pip install git+https://github.com/nick-youngblut/gtdb_to_taxdump.git

Usage

Run gtdb_to_taxdump.py -h to see all options.

Example (GTDB release202):

gtdb_to_taxdump.py \
  https://data.gtdb.ecogenomic.org/releases/release202/202.0/ar122_taxonomy_r202.tsv.gz \
  https://data.gtdb.ecogenomic.org/releases/release202/202.0/bac120_taxonomy_r202.tsv.gz \
  > taxID_info.tsv

Example (GTDB release95):

gtdb_to_taxdump.py \
  https://data.gtdb.ecogenomic.org/releases/release95/95.0/ar122_taxonomy_r95.tsv.gz \
  https://data.gtdb.ecogenomic.org/releases/release95/95.0/bac120_taxonomy_r95.tsv.gz \
  > taxID_info.tsv

Example (GTDB release89):

gtdb_to_taxdump.py \
  https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/ar122_taxonomy_r89.tsv \
  https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/bac120_taxonomy_r89.tsv \
  > taxID_info.tsv
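The taxonomy files passed in the examples above are two-column, tab-separated tables: a genome accession followed by a semicolon-delimited lineage whose entries carry rank prefixes (d__, p__, c__, o__, f__, g__, s__). The sketch below shows one way to split such a line into ranks. It is not taken from gtdb_to_taxdump; the function `parse_taxonomy_line`, the rank labels, and the accession in the example are assumptions for illustration.

```python
# Illustrative only: hypothetical parser, not the tool's implementation.
# Rank labels are an assumption; the tool may name ranks differently.
RANKS = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_taxonomy_line(line):
    """Split 'accession<TAB>d__X;p__Y;...' into (accession, [(rank, name), ...])."""
    accession, lineage = line.rstrip("\n").split("\t", 1)
    taxa = []
    for taxon in lineage.split(";"):
        prefix, _, name = taxon.partition("__")
        taxa.append((RANKS[prefix], name))
    return accession, taxa


if __name__ == "__main__":
    # Made-up accession; the lineage follows the GTDB naming style.
    example = (
        "GB_GCA_000000000.1\t"
        "d__Archaea;p__Halobacteriota;c__Methanosarcinia;o__Methanosarcinales;"
        "f__Methanosarcinaceae;g__Methanosarcina;s__Methanosarcina mazei\n"
    )
    accession, taxa = parse_taxonomy_line(example)
    print(accession)
    for rank, name in taxa:
        print(f"{rank}\t{name}")
```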

You can add the taxIDs to a GTDB metadata table via the --table param. For example:

wget https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/ar122_metadata_r89.tsv
gtdb_to_taxdump.py \
  --table ar122_metadata_r89.tsv \
  https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/ar122_taxonomy_r89.tsv \
  https://data.ace.uq.edu.au/public/gtdb/data/releases/release89/89.0/bac120_taxonomy_r89.tsv \
  > taxID_info.tsv

Extras

GTDB website

https://data.ace.uq.edu.au/public/gtdb/data/releases/