shenwei356 / gtdb-taxdump

GTDB taxonomy taxdump files with trackable TaxIds
MIT License

Some ranks are being skipped #3

Closed by apcamargo 2 years ago

apcamargo commented 2 years ago

I didn't do any systematic evaluation, but I found at least one lineage that skips the family level:

import taxopy
taxdb = taxopy.TaxDb(nodes_dmp="R207/nodes.dmp", names_dmp="R207/names.dmp")  # assumed paths, matching --data-dir R207 below
taxon = taxopy.Taxon(619996715, taxdb)
print(taxon)
# s__Bacteria;p__Patescibacteria;c__Microgenomatia;o__2-02-FULL-39-11;g__2-02-FULL-40-10;s__2-02-FULL-40-10 sp001779115

$ taxonkit lineage -R --data-dir R207 problematic_taxids.txt
619996715	Bacteria;Patescibacteria;Microgenomatia;2-02-FULL-39-11;2-02-FULL-40-10;2-02-FULL-40-10 sp001779115	superkingdom;phylum;class;order;genus;species

apcamargo commented 2 years ago

This seems to be a case of taxa at different ranks sharing the same name, as described in the help message. I hadn't seen the explanation there.
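
For anyone wanting the systematic evaluation mentioned above, here is a minimal sketch that reports which ranks are absent from a line of `taxonkit lineage -R` output. It assumes the output is tab-separated with the rank list in the last column, and that a complete lineage has the seven ranks listed in `EXPECTED_RANKS`; both the function name and that list are illustrative assumptions, not part of taxonkit or gtdb-taxdump.

```python
# Ranks assumed to make up a complete lineage (an assumption of this sketch).
EXPECTED_RANKS = [
    "superkingdom", "phylum", "class", "order", "family", "genus", "species",
]


def missing_ranks(taxonkit_line):
    """Return the expected ranks absent from one `taxonkit lineage -R` line.

    The line is assumed to be tab-separated: TaxId, lineage, ranks.
    """
    fields = taxonkit_line.rstrip("\n").split("\t")
    present = set(fields[-1].split(";"))
    return [rank for rank in EXPECTED_RANKS if rank not in present]


# The line reported in this issue, with tabs between the three columns.
line = (
    "619996715\t"
    "Bacteria;Patescibacteria;Microgenomatia;2-02-FULL-39-11;"
    "2-02-FULL-40-10;2-02-FULL-40-10 sp001779115\t"
    "superkingdom;phylum;class;order;genus;species"
)
print(missing_ranks(line))  # ['family']
```

Running this over every line of the taxonkit output would list all lineages with a skipped rank.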

apcamargo commented 1 year ago

@shenwei356 do you think it could be useful to add prefixes (e.g., p__, c__, etc.) to the taxa names to make sure all the ranks will have a node?
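
To illustrate the idea, here is a small sketch of how rank prefixes would keep same-named taxa apart. The prefix table follows the usual GTDB convention (`d__` for the top rank), and the lineage with placeholder names `X` and `Y` is hypothetical; none of this reflects how gtdb-taxdump actually builds its nodes.

```python
# Rank-to-prefix table following the common GTDB convention (an assumption here).
RANK_PREFIX = {
    "superkingdom": "d__",
    "phylum": "p__",
    "class": "c__",
    "order": "o__",
    "family": "f__",
    "genus": "g__",
    "species": "s__",
}


def prefixed(rank, name):
    """Return the taxon name with its rank prefix prepended."""
    return RANK_PREFIX[rank] + name


# Hypothetical lineage in which the order and the family share the name "X".
lineage = [("order", "X"), ("family", "X"), ("genus", "Y")]

# Keyed by bare name, the two "X" taxa collapse into one entry.
plain_nodes = {name for _, name in lineage}
# Keyed by prefixed name, every rank keeps its own entry.
prefixed_nodes = {prefixed(rank, name) for rank, name in lineage}

print(len(plain_nodes), len(prefixed_nodes))  # 2 3
```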

shenwei356 commented 1 year ago

No need I think. It's common in viral species.