One of the files commonly used by taxonomic classifiers is the accession2taxid file, which maps sequence accession numbers to TAXIDs.
I'm not yet super familiar with GTDB, so I might have missed it, but as far as I can see, GTDB only keeps track of accessions at the genome level.
Building taxonomic classifier databases often requires accessions at the sequence level, along with a file mapping sequence accessions to TAXIDs.
This adds a script to generate that mapping file.
Using the names.dmp file created with gtdb_to_taxdump.py, it goes through all GTDB genomes, retrieves each sequence accession number, and associates it with the corresponding TAXID through the genome accession.
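
For reviewers, here is a minimal sketch of the mapping logic described above. It is not the script in this PR. It assumes that names.dmp follows the NCBI taxdump layout (fields separated by `\t|\t`), that genome accessions appear as names in it, and that the first token of each FASTA header is the sequence accession. The function names (`parse_names_dmp`, `fasta_accessions`, `accession2taxid_rows`) are hypothetical and chosen for illustration only.

```python
def parse_names_dmp(lines):
    """Map name -> TAXID from NCBI-style names.dmp lines.

    Assumption: genome accessions are stored in the name column.
    """
    name_to_taxid = {}
    for line in lines:
        # Drop the trailing "\t|\n" terminator, then split on the field separator.
        fields = line.rstrip("\t|\n").split("\t|\t")
        if len(fields) < 2 or not fields[0].strip():
            continue
        name_to_taxid[fields[1].strip()] = int(fields[0])
    return name_to_taxid


def fasta_accessions(lines):
    """Yield the first whitespace-delimited token of each FASTA header."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def accession2taxid_rows(genomes, name_to_taxid):
    """Yield (accession, accession.version, taxid, gi) tuples.

    `genomes` is an iterable of (genome_accession, fasta_lines) pairs.
    Genomes missing from names.dmp are skipped; gi is written as 0.
    """
    for genome_accession, fasta_lines in genomes:
        taxid = name_to_taxid.get(genome_accession)
        if taxid is None:
            continue
        for acc_version in fasta_accessions(fasta_lines):
            # Strip the version suffix, e.g. "X.1" -> "X".
            accession = acc_version.rsplit(".", 1)[0]
            yield (accession, acc_version, taxid, 0)
```

The four columns follow the NCBI accession2taxid layout (accession, accession.version, taxid, gi), so each tuple can be written out as one tab-separated line.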