shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Search equivalent names with name2taxid? #87

Closed: alvanuffelen closed this issue 3 months ago

alvanuffelen commented 9 months ago


Describe your issue

echo "Saccharomyces boulardii" | taxonkit name2taxid returns no taxid, yet 'Saccharomyces boulardii' is present in names.dmp as an equivalent name:

252598 | Saccharomyces boulardii | | equivalent name |

Is there a reason why 'name2taxid' only searches for scientific names or synonyms? https://github.com/shenwei356/taxonkit/blob/571eda08f8554f5d98ee5c9ec2d607946807bd33/taxonkit/cmd/util-complex-data.go#L60

shenwei356 commented 9 months ago

Oh, I thought scientific names were sufficient; then someone said synonyms were equally important. Besides equivalent names, are other kinds of names widely used? Is there a need to support all of them?

$ cut -f 7 names.dmp  | csvtk freq -Ht -nr | csvtk pretty -Ht
scientific name       2533553
authority             694337 
synonym               252076 
type material         241458 
includes              78861  
equivalent name       58225  
genbank common name   30413  
common name           14663  
acronym               2118   
in-part               667    
blast name            230    
genbank acronym       25

shenwei356 commented 9 months ago

I've just removed the restriction on name types.

$ memusg -t -s 'echo "Saccharomyces boulardii" | taxonkit name2taxid --verbose '
14:04:26.645 [INFO] parsing names file: /home/shenwei/.taxonkit/names.dmp
14:04:29.704 [INFO] 3895687 names parsed
Saccharomyces boulardii 252598

elapsed time: 3.187s
peak rss: 912.28 MB