zyxue / ncbitax2lin

🐞 Convert NCBI taxonomy dump into lineages
MIT License
138 stars, 29 forks

Arrange lineages according to taxonomic hierarchy #20

Closed: josuebarrera closed this 2 years ago

josuebarrera commented 2 years ago

Dear zyxue,

I really like the software you created; I find it very useful for comparative analyses between local BLAST searches and the NCBI Taxonomy. I wanted to compare different organisms (say, Drosophila melanogaster and Arabidopsis thaliana) at each taxonomic rank. This is not straightforward, because different species have different numbers of taxonomic levels in the NCBI Taxonomy (e.g., Drosophila melanogaster has 34 levels, from the species level up to "cellular organisms", while Arabidopsis thaliana has only 21).

I figured that I could manually arrange the entire "ncbi_lineages" table according to the species with the most taxonomic levels, which would also align the hierarchies of the other species, and then remove the blank cells for the species with missing levels. Then I noticed that the "no rank#" and "clade#" columns do not correspond to the same taxonomic level in different organisms (e.g., the "clade" column is "Opisthokonta" for Drosophila but "Embryophyta" for Arabidopsis, and these are not taxonomically equivalent).

Is there a way for ncbitax2lin to place these problematic columns at their correct hierarchical level? Or for the table to be arranged according to the correct taxonomic hierarchy of a specific species, without having to do it manually for each species of interest?

Best regards,
Josué.

zyxue commented 2 years ago

Hi Josué, glad to hear you like it. But nope. ncbitax2lin doesn't change any of the ranks assigned by NCBI. ncbitax2lin just transforms the data into a form more convenient for reading lineages.
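Since the unranked "clade" and "no rank" columns cannot be aligned across organisms, one possible workaround is to skip column alignment entirely and rebuild each ordered lineage from parent pointers, then compare the two paths directly. The snippet below is only a minimal sketch of that idea and is not part of ncbitax2lin: the toy table, its made-up ids, the abridged ranks, and the helper names `lineage`, `shared_prefix`, and `named_ranks` are all assumptions for illustration. Real parent/rank/name data would have to come from the NCBI taxonomy dump.

```python
from typing import Dict, List, Tuple

# Toy, heavily abridged taxonomy: id -> (parent id, rank, name).
# The ids are invented and most intermediate levels are omitted.
TOY_NODES: Dict[int, Tuple[int, str, str]] = {
    1: (1, "no rank", "root"),  # the root is its own parent
    2: (1, "no rank", "cellular organisms"),
    3: (2, "superkingdom", "Eukaryota"),
    4: (3, "clade", "Opisthokonta"),
    5: (4, "kingdom", "Metazoa"),
    6: (5, "species", "Drosophila melanogaster"),
    7: (3, "kingdom", "Viridiplantae"),
    8: (7, "clade", "Embryophyta"),
    9: (8, "species", "Arabidopsis thaliana"),
}

Path = List[Tuple[str, str]]


def lineage(tax_id: int, nodes: Dict[int, Tuple[int, str, str]]) -> Path:
    """Return (rank, name) pairs ordered from the root down to tax_id."""
    path: Path = []
    current = tax_id
    while True:
        parent, rank, name = nodes[current]
        path.append((rank, name))
        if parent == current:  # reached the self-parented root
            break
        current = parent
    path.reverse()
    return path


def shared_prefix(a: Path, b: Path) -> Path:
    """Return the leading part of two root-first lineages that is identical."""
    shared: Path = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


def named_ranks(path: Path) -> Dict[str, str]:
    """Keep only ranked levels, dropping 'clade' and 'no rank' entries."""
    return {rank: name for rank, name in path if rank not in ("clade", "no rank")}


fly = lineage(6, TOY_NODES)
plant = lineage(9, TOY_NODES)

# The last entry of the shared prefix is the deepest common ancestor.
print(shared_prefix(fly, plant)[-1])
# Ranked levels can be compared by rank name rather than by column position.
print(named_ranks(fly)["kingdom"], "vs", named_ranks(plant)["kingdom"])
```

Comparing root-first paths sidesteps the question of which "clade" column a taxon lands in, because position in the path reflects true depth in the hierarchy. Comparing by rank name works only for levels that carry a named rank in both organisms.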