ropensci / taxizedb

Tools for Working with Taxonomic SQL Databases

missing species names for name2taxid #70

Closed: hh1985 closed this issue 11 months ago

hh1985 commented 11 months ago

name2taxid doesn't work for species such as Bacteroides dorei, Bacteroides vulgatus, and Clostridium sp. ASF356. The NCBI Taxonomy browser does return results for them, but under different names.

stitam commented 11 months ago

Thanks @hh1985 for opening this issue. This happens because the NCBI Taxonomy browser suggests alternatives when it cannot find an exact match, and this functionality is not implemented in taxizedb. If you try the taxon names suggested by the NCBI Taxonomy browser, you will get the appropriate taxid:

Searching for Bacteroides dorei did not give an exact match, but the browser suggested Phocaeicola dorei, which works:

taxizedb::name2taxid("Phocaeicola dorei")
#> [1] "357276"

Created on 2023-09-19 with reprex v2.0.2

I am not sure what's going on here, but I'm guessing that at some point Bacteroides dorei was reclassified under the genus Phocaeicola. This historical information was somehow preserved and is being queried by the NCBI Taxonomy browser.

Do you have many species which taxizedb cannot find?
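
Until something like this exists in taxizedb, one possible workaround is to translate known outdated names yourself before the lookup. The following is only a minimal sketch: the `synonyms` table and the `name2taxid_with_synonyms()` helper are hypothetical and not part of taxizedb, and the only mapping taken from this thread is Bacteroides dorei to Phocaeicola dorei. The `lookup` argument defaults to `taxizedb::name2taxid()` and is assumed to return one taxid per name.

```r
# Hypothetical, hand-maintained table: outdated name -> name used by NCBI.
# Only this one entry comes from the thread; extend it for your own data.
synonyms <- c("Bacteroides dorei" = "Phocaeicola dorei")

# Hypothetical helper (not part of taxizedb): swap known outdated names for
# their current names, then look up taxids with `lookup`.
name2taxid_with_synonyms <- function(x, synonyms, lookup = taxizedb::name2taxid) {
  current <- x
  is_old <- x %in% names(synonyms)
  current[is_old] <- unname(synonyms[x[is_old]])
  data.frame(
    query = x,              # name as supplied
    searched_as = current,  # name actually sent to the lookup
    taxid = as.character(lookup(current)),
    stringsAsFactors = FALSE
  )
}

# Usage (needs the local NCBI database used by taxizedb):
# name2taxid_with_synonyms(c("Bacteroides dorei", "Phocaeicola dorei"), synonyms)
```

Keeping both `query` and `searched_as` in the result makes it easy to see which names were rewritten. Names that are missing from the table are passed through unchanged, so they still come back without a taxid if NCBI does not know them.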

hh1985 commented 11 months ago

@stitam I checked GBIF for the terms, e.g. https://www.gbif.org/species/167052087. Interestingly, it lists Bacteroides vulgatus as a synonym of Phocaeicola vulgatus CL09T03C04. I will look for tools that integrate the GBIF taxonomy and support fuzzy matching.

I do have cases where the names used in the literature are not the ones used in NCBI. This frequently breaks my workflow.

mkhemmani commented 11 months ago

Maybe this is on topic: if I do classification("Morganella morganii"), it returns an answer, and the genus Morganella has a taxid of 581. However, if I do classification("Morganella"), it breaks. The weird thing is that classification("581") gives me the answer I would expect from classification("Morganella").

stitam commented 11 months ago

Thanks @mkhemmani, this one is more related to issue #57: while taxizedb::classification() works with taxon names, it was originally designed to work with taxids. You should first run taxizedb::name2taxid() to get the taxids that will work for your taxon and then use these taxids with taxizedb::classification() to get the desired classification.
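
The two-step approach described above can be sketched as follows. This is a minimal illustration, not an official taxizedb recipe: `classify_by_name()` is a hypothetical wrapper, and it assumes that `taxizedb::name2taxid()` returns `NA` for names it cannot find.

```r
# Hypothetical wrapper (not part of taxizedb): resolve names to taxids first,
# then classify using the taxids only.
classify_by_name <- function(x,
                             to_taxid = taxizedb::name2taxid,
                             classify = taxizedb::classification) {
  taxids <- to_taxid(x)
  found <- !is.na(taxids)
  if (any(!found)) {
    warning("No taxid found for: ", paste(x[!found], collapse = ", "))
  }
  if (!any(found)) {
    return(list())
  }
  result <- classify(taxids[found])
  names(result) <- x[found]  # label each classification with the queried name
  result
}

# Usage (needs the local NCBI database, see taxizedb::db_download_ncbi()):
# classify_by_name("Morganella")
```

Names without a taxid are reported in a warning and skipped, so one unresolved name does not stop the rest of the batch.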