ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

Discrepancies between Taxonomy API and Public API #85

Open salix-d opened 2 years ago

salix-d commented 2 years ago

Availability of taxa

Some species can't be found by name or by ID using the Taxonomy API (bold_tax_name() / bold_tax_id()) even though they have public records ("Moraea elsiae", for example). bold_stats(ids = "GBVC3127-11") returns the species name.

I contacted their support to inform them.

I'm currently trying to build a database of all their public taxa, with their taxids and all the variations of their names. The goal is for users to be able to confirm that a taxon exists in the BOLD database, and to make it easier to get the downstream lineage.

Once that's done, I'll try to set it up so the build can be automated and rerun whenever their database is updated.

salix-d commented 2 years ago

Classification of taxa

bold::bold_tax_id(48327)
#  input taxid      taxon tax_rank tax_division parentid   taxonrep
# 1 48327 48327 Rhodophyta   phylum     Protista        1 Rhodophyta

Their Taxonomy API says that Rhodophyta belongs to Protista, even though their Taxonomy page lists it under Plants.

salix-d commented 2 years ago

Names of taxa

When looking up "Suaeda sp. 'Socotra'", the taxon name returned is "Suaeda sp. 'Socotra" (without the closing quote). However, if we try to get the records using that returned name, they aren't found; we need to add back the closing quote.

> bold::bold_tax_name("Suaeda sp. \\'Socotra\\'")
#     taxid               taxon tax_rank tax_division parentid parentname specimenrecords                    input
# 1 1082786 Suaeda sp. 'Socotra  species      Plantae   156339     Suaeda               1 Suaeda sp. \\'Socotra\\'
> bold::bold_specimens("Suaeda sp. \\'Socotra")
# Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
#   no lines available in input
> bold::bold_specimens("Suaeda sp. \\'Socotra\\'")
#      processid sampleid recordID catalognum fieldnum      institution_storing collection_code      bin_uri phylum_taxID   phylum_name
# 1 GBCMD0539-06 AY803585   491573         NA          Mined from GenBank, NCBI              NA BOLD:AAJ3826           20    Arthropoda
# 2  GBVE4209-11 AY514841  2288624         NA AY514841 Mined from GenBank, NCBI              NA                        12 Magnoliophyta

This also happens with names ending with a dot or a parenthesis (and possibly with other non-alphanumeric characters I haven't seen yet).

This usually isn't a problem when looking for records using higher taxonomic ranks, which I think is the most common way to look for records. Still, we might want a check for species names, or a function that makes sure species names are valid before looking for records.
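
One possible shape for such a check, as a rough base R sketch: compare the name that was searched with the taxon name that came back, and flag the cases where the only difference is a single trailing non-alphanumeric character (the closing quote, dot or parenthesis described above). The function name lost_trailing_char() is hypothetical and not part of bold, and the sketch assumes both names are plain strings without extra escaping.

```r
# Hypothetical helper, not part of the bold package.
# TRUE when `taxon` equals `input` minus one trailing character
# that is neither alphanumeric nor whitespace.
lost_trailing_char <- function(input, taxon) {
  # Whatever is left of `input` after the length of `taxon`
  rest <- substring(input, nchar(taxon) + 1L)
  startsWith(input, taxon) & grepl("^[^[:alnum:][:space:]]$", rest)
}

# Names taken from the example above
searched <- c("Suaeda sp. 'Socotra'", "Suaeda")
returned <- c("Suaeda sp. 'Socotra", "Suaeda")

lost_trailing_char(searched, returned)
# [1]  TRUE FALSE
```

For flagged rows, the searched name (rather than the returned one) would be the one to pass on to bold_specimens(). This only covers a single dropped character at the end of the name; anything else would need a different check.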