shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

obsolete taxid check #10

Closed by msaavedrat 5 years ago

msaavedrat commented 6 years ago

Hi :) A small problem appears when a taxid becomes obsolete. However, that information (and the new taxid) can be fetched from merged.dmp. I suggest adding a check against the merged file when a taxid is not found (or searching merged.dmp first, something like that). Example: taxid 92489 was replaced with 796334 (Erwinia oleae). I tested it with the list function. Thanks

shenwei356 commented 6 years ago

Sorry for the late reply. What command did you use for this case? What's the purpose?

msaavedrat commented 5 years ago

OK, it's not really a bug, but a suggestion for a command or upgrade that checks whether a taxid was recently modified, merged, or deleted, for cases where the taxid came from old (not updated) data, text, etc. For example: taxonkit list --ids 92489 [returns nothing, so I have to check that id myself]. When I search for that id on the NCBI Taxonomy website, I am redirected to the updated taxid 796334 and its organism data (as expected). And of course the command taxonkit list --ids 796334 works perfectly. That is because 92489 is now assigned to 796334 (in merged.dmp).
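
The lookup described here can be sketched outside of taxonkit. The following is a minimal Python illustration, not taxonkit's implementation. It assumes the usual NCBI taxdump layout for merged.dmp, where each line holds an old taxid and a new taxid separated by tab-pipe-tab. The function names parse_merged and resolve_taxid are hypothetical, and the sample line is built from the 92489 to 796334 example above rather than read from a real dump.

```python
import io


def parse_merged(handle):
    """Build an {old_taxid: new_taxid} map from merged.dmp-style lines."""
    merged = {}
    for line in handle:
        # Fields are separated by "|" and padded with tabs; strip the padding.
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        merged[int(fields[0])] = int(fields[1])
    return merged


def resolve_taxid(taxid, merged):
    """Return the current taxid, following merged entries if there are any."""
    seen = set()
    # The 'seen' set guards against an accidental cycle in the input.
    while taxid in merged and taxid not in seen:
        seen.add(taxid)
        taxid = merged[taxid]
    return taxid


# Sample line in merged.dmp layout, taken from the example in this thread.
sample = io.StringIO("92489\t|\t796334\t|\n")
merged = parse_merged(sample)
print(resolve_taxid(92489, merged))
```

With a real dump, the handle would come from open("merged.dmp") instead of the in-memory sample, and the resolved taxid could then be passed to taxonkit list --ids.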

lskatz commented 5 years ago

It might make sense to have a function in taxonkit called "clean" where you identify or remove nodes with no parent.
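
As a rough illustration of the "identify nodes with no parent" idea, here is a minimal Python sketch. "clean" is only the name proposed above, not an existing taxonkit command, and the helper names parse_nodes and find_orphans are hypothetical. The sketch assumes the NCBI nodes.dmp layout, where the first two pipe-separated fields are the taxid and its parent taxid. The sample rows are made up and truncated to three fields.

```python
import io


def parse_nodes(handle):
    """Build a {taxid: parent_taxid} map from nodes.dmp-style lines."""
    parents = {}
    for line in handle:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        parents[int(fields[0])] = int(fields[1])
    return parents


def find_orphans(parents):
    """Return taxids whose parent taxid does not appear as a node itself."""
    return sorted(t for t, p in parents.items() if p not in parents)


# Made-up sample rows: node 20 points at parent 99, which is not present.
sample = io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tgenus\t|\n"
    "20\t|\t99\t|\tspecies\t|\n"
)
print(find_orphans(parse_nodes(sample)))
```

The root node lists itself as its parent, so it is not reported as an orphan.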

shenwei356 commented 5 years ago

I'll fix this situation.

shenwei356 commented 5 years ago

@msaavedrat @lskatz We check deleted and merged taxids now. https://github.com/shenwei356/taxonkit/issues/19

shenwei356 commented 5 years ago

I think v0.5.0 fixes this. Can I close this issue now?