greatfireball / NCBI-Taxonomy

MIT License

Get taxid for scientific name #3

Open iimog opened 9 years ago

iimog commented 9 years ago

It would be nice to have a way of mapping scientific names to taxids, like the service provided at http://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi but programmatic and local. Optionally, the user should be able to restrict the search to specific ranks.
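A minimal sketch of what such a local lookup could look like, assuming the names come from lines in the NCBI taxdump `names.dmp` layout (fields separated by `\t|\t`, lines ending in `\t|`). The function names `build_name_index` and `taxids_for_name` and the `RANK_OF` mapping are hypothetical and are not part of this module's interface. The sample rows use the Haptophyceae data quoted later in this thread.

```python
from collections import defaultdict

# Inline excerpt in the names.dmp layout: taxid | name | unique name | name class.
# A real run would read these lines from a local names.dmp file instead.
NAMES_DMP = """\
2830\t|\tHaptophyceae\t|\t\t|\tscientific name\t|
2830\t|\tPrymnesiophyta\t|\t\t|\tsynonym\t|
2830\t|\tprymnesiophytes\t|\t\t|\tcommon name\t|
"""

# Hypothetical taxid -> rank mapping; in practice this would be built from nodes.dmp.
RANK_OF = {2830: "no rank"}


def build_name_index(lines):
    """Map each name to the set of (taxid, name class) pairs it occurs with."""
    index = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        # Drop the trailing "\t|" terminator, then split on the field separator.
        taxid, name, _unique, name_class = line.rstrip("\t|\n").split("\t|\t")
        index[name].add((int(taxid), name_class))
    return index


def taxids_for_name(index, name, ranks=None, rank_of=None):
    """Return the sorted taxids for a name, optionally restricted to given ranks."""
    taxids = {taxid for taxid, _name_class in index.get(name, set())}
    if ranks is not None:
        rank_of = rank_of or {}
        taxids = {t for t in taxids if rank_of.get(t) in ranks}
    return sorted(taxids)


index = build_name_index(NAMES_DMP.splitlines())
print(taxids_for_name(index, "Prymnesiophyta"))
print(taxids_for_name(index, "Haptophyceae", ranks={"class"}, rank_of=RANK_OF))
```

The lookup here is an exact, case-sensitive match; whether names should be matched case-insensitively is left open.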

greatfireball commented 9 years ago

Is it required to search for more than the scientific name, e.g. synonyms?

iimog commented 9 years ago

Yes, that would be nice. Also "common name", "blast name" and even "in-part" should somehow be handled. A way to know the primary scientific name and the ambiguity would be nice, too.

For example: Haptophyceae (http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=2830&mode=Info) have a number of common names, a blast name and in-part entries:

Haptophyceae
Taxonomy ID: 2830
Inherited blast name: haptophytes
Rank: no rank
Genetic code: Translation table 1 (Standard)
Mitochondrial genetic code: Translation table 1 (Standard)
Other names:
synonym:    Prymnesiophyta
synonym:    Prymnesiophyceae
synonym:    Haptophyta
common name:    prymnesiophytes
common name:    coccolithophorids
in-part:    algae
in-part:    Chromophyta
blast name:     haptophytes

Lineage( full )
    cellular organisms; Eukaryota 

The tax_identifier service of NCBI (see the link in the original question) uses status codes that could be adopted:

Explanation of status codes for names
1 - the incoming name is our primary name for a taxon in our database
2 - the incoming name is a secondary name for a taxon in our database (it could be listed as a synonym, a misspelling, a common name, or several other nametypes)
3 - the incoming name is not found in our database
+ - the incoming name is duplicated in our database (used in combination with the other status codes)
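The status codes above could be derived locally roughly as follows. This is only a sketch under assumptions: `classify` is a hypothetical helper, "duplicated" is interpreted as "the name maps to more than one taxid", and the name class of the `algae` row for Rhodophyta is assumed to be `in-part`. The other rows come from the Haptophyceae example in this thread.

```python
# (taxid, name, name class) rows; a real implementation would load them from names.dmp.
ROWS = [
    (2830, "Haptophyceae", "scientific name"),
    (2830, "Haptophyta", "synonym"),
    (2830, "algae", "in-part"),
    (2763, "Rhodophyta", "scientific name"),
    (2763, "algae", "in-part"),  # name class assumed for illustration
]


def classify(name, rows):
    """Return a list of (code, name, preferred name, taxid) tuples for a name."""
    # Primary (scientific) name of every taxon, used as the "preferred name".
    primary = {taxid: n for taxid, n, cls in rows if cls == "scientific name"}
    hits = sorted((taxid, cls) for taxid, n, cls in rows if n == name)
    if not hits:
        return [("3", name, "", None)]  # 3: name not found
    # "+" marks a name that points to more than one taxon.
    dup = "+" if len({taxid for taxid, _cls in hits}) > 1 else ""
    result = []
    for taxid, cls in hits:
        if cls == "scientific name":
            result.append(("1" + dup, name, "", taxid))  # 1: primary name
        else:
            result.append(("2" + dup, name, primary.get(taxid, ""), taxid))  # 2: secondary
    return result


for query in ("Haptophyceae", "Haptophyta", "algae", "nonexistent"):
    for row in classify(query, ROWS):
        print(row)
```

Returning one tuple per matching taxon keeps the ambiguity visible to the caller, which mirrors the one-row-per-hit layout of the tax_identifier output.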

Searching for all the names of Haptophyceae via the tax_identifier service yields the following table:

code    name                preferred name          taxid
1       Haptophyceae                                2830
2       Prymnesiophyta      Haptophyceae            2830
2       Prymnesiophyceae    Haptophyceae            2830
2       Haptophyta          Haptophyceae            2830
2       prymnesiophytes     Haptophyceae            2830
2       coccolithophorids   Haptophyceae            2830
2+      algae               Chrysophyceae           2825
2+      algae               Phaeophyceae            2870
2+      algae               Haptophyceae            2830
2+      algae               Xanthophyceae           2833
2+      algae               Bacillariophyta         2836
2+      algae               Dinophyceae             2864
2+      algae               Rhodophyta              2763
2+      algae               Chlorophyta             3041
2+      algae               Euglenida               3035
2+      algae               Cryptophyta             3027
2+      algae               Eustigmatophyceae       5747
2+      algae               Chlorarachniophyceae    29197
2+      algae               Raphidophyceae          38410
2+      algae               Glaucocystophyceae      38254
2+      algae               Dictyochophyceae        39119
2+      algae               Phaeothamniophyceae     82162
2+      algae               Mesostigmatophyceae     96475
2+      algae               Klebsormidiophyceae     131220
2+      algae               Chlorokybophyceae       131213
2+      algae               Coleochaetophyceae      304573
2+      algae               Charophyceae            304574
2+      Chromophyta         Haptophyceae            2830
2+      Chromophyta         Dinophyceae             2864
2+      Chromophyta         Cryptophyta             3027
2+      Chromophyta         Stramenopiles           33634
2       haptophytes         Haptophyceae            2830

alexgarciac commented 6 years ago

@iimog did you get what you were looking for?

iimog commented 6 years ago

Hi @alexgarciac, for my project at the time I was using the tax_identifier service from NCBI. I am not sure whether this functionality has since been implemented in this module. Maybe @greatfireball can comment?

greatfireball commented 6 years ago

Hey @alexgarciac and @iimog, I am currently working on a new release and hope to finish it by the end of this week. It will be compatible with the older interface and add some features, such as the mapping from taxids to scientific names, but it will require a new database. My plan is to provide that database on a weekly basis, as soon as the NCBI taxonomy files are updated. Of course, anyone can also create their own database on a local machine. I will post here in this issue as soon as the new version is available. I hope that will help both of you.

alexgarciac commented 6 years ago

@greatfireball thanks, I will keep an eye on the tool.

alexgarciac commented 6 years ago

@greatfireball please let me know when the new tool is done. I would love to try it.

greatfireball commented 6 years ago

@alexgarciac unfortunately, this feature needs some more testing. I hope that I can provide it next week, and I will give you a short notification via this issue. Today's release will fix the gi/accession issue #5.