nick-youngblut / gtdb_to_taxdump

Convert GTDB taxonomy to NCBI taxdump format
MIT License

Unclassified sequences TAXID #13

Closed maxibor closed 1 year ago

maxibor commented 2 years ago

In the NCBI taxonomy, there is one particularly useful TAXID: 12908, for unclassified sequences.

Would it be possible to preserve 12908 for unclassified sequences in gtdb_to_taxdump, or to add an extra (stable) TAXID for unclassified sequences?
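As a rough illustration of the request, here is a minimal sketch that appends such an entry to an existing taxdump. It is not part of gtdb_to_taxdump: the helper names (`dmp_row`, `unclassified_rows`, `append_unclassified`) are hypothetical, attaching the node directly to the root (TAXID 1) is an assumption, and only the first three `nodes.dmp` columns are written, so a tool that expects the full NCBI column set would need the row padded.

```python
# Sketch only: add a stable "unclassified sequences" entry to a taxdump.
# Assumptions (not from gtdb_to_taxdump): the node hangs directly under
# the root (TAXID 1) and only the first three nodes.dmp columns are written.

FIELD_SEP = "\t|\t"   # field delimiter used in NCBI .dmp files
ROW_END = "\t|\n"     # row terminator used in NCBI .dmp files

UNCLASSIFIED_TAXID = 12908
ROOT_TAXID = 1


def dmp_row(*fields):
    """Format the given fields as one row of a .dmp file."""
    return FIELD_SEP.join(str(f) for f in fields) + ROW_END


def unclassified_rows(taxid=UNCLASSIFIED_TAXID, parent=ROOT_TAXID):
    """Return the (nodes.dmp row, names.dmp row) pair for the entry."""
    node = dmp_row(taxid, parent, "no rank")
    # names.dmp columns: tax_id, name, unique name, name class
    name = dmp_row(taxid, "unclassified sequences", "", "scientific name")
    return node, name


def append_unclassified(nodes_path, names_path):
    """Append the entry to existing nodes.dmp and names.dmp files."""
    node, name = unclassified_rows()
    with open(nodes_path, "a") as nodes:
        nodes.write(node)
    with open(names_path, "a") as names:
        names.write(name)
```

Calling `append_unclassified("nodes.dmp", "names.dmp")` on a generated taxdump would add the entry, provided 12908 does not collide with a TAXID the tool has already assigned.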

nick-youngblut commented 2 years ago

It might be best to just point potential users to https://github.com/shenwei356/gtdb-taxdump

shenwei356 commented 2 years ago

This is the first time I've heard of 12908. GTDB may not have any such "unclassified sequences" category.

maxibor commented 2 years ago

It's a nice-to-have TAXID for reporting sequences that couldn't be attributed to any reference :) Thanks @nick-youngblut and @shenwei356, I'll have a look at https://github.com/shenwei356/gtdb-taxdump when TaxonKit v0.11.0 is released.