ropensci / taxadb

:package: Taxonomic Database
https://docs.ropensci.org/taxadb

Reproducing Table 6 in Norman et al. (MEE 2020) #116

Closed gregor-fausto closed 1 year ago

gregor-fausto commented 1 year ago

I'm excited to have found this package; it seems like an excellent tool! I'm just starting to use it and am working through the examples in the paper (Norman et al. 2020) and the tutorial section on Taxonomic Data Tables. I can use filter_name as described in the tutorial (https://docs.ropensci.org/taxadb/articles/intro.html#taxonomic-data-tables). But when I try to use the function to access the taxonomic table for Abies menziesii (which corresponds to Table 6, I think), I get an empty tibble. Do you have any thoughts on why I might not be getting the same output?

Here's what I'm doing:

library(taxadb)

td_create("col")
filter_name("Abies menziesii", provider = "col")

cboettig commented 1 year ago

@gregor-fausto Thanks for this!

It looks like the genus field is missing for synonyms in the most recent COL snapshot:

library(taxadb)
library(dplyr)
col <- taxa_tbl("col")
col |>
  filter(specificEpithet == "menziesii") |>
  select(taxonID, acceptedNameUsageID, taxonomicStatus,
         scientificName, genus, specificEpithet)

I'll see if I can track that down and patch the snapshot. In the meantime, I think you should be OK for names that are 'accepted'.

Also note that since the time of publication, COL decided to alter all of its taxonomic identifiers, so you will not reproduce the taxon IDs shown in the paper with current versions of the database. Fortunately, since 2021 COL has used stable identifiers; please see: https://www.catalogueoflife.org/2021/04/14/stable-ids

gregor-fausto commented 1 year ago

Thanks for your quick response. That's helpful and I'll see if I can work through some of the other cases with this in mind!

cboettig commented 1 year ago

@gregor-fausto Thanks again for reporting this. I think I have now amended the COL table, so this should once again resolve. Please restart R and try your code again; that should trigger a re-import of the amended COL data.

I think you will see this name now resolves to three different accepted names, making it a rather ambiguous synonym -- such is the reality of taxonomic names!

> filter_name("Abies menziesii", provider = "col") |> select(taxonID, acceptedNameUsageID, scientificName)
# A tibble: 3 × 3
  taxonID   acceptedNameUsageID scientificName 
  <chr>     <chr>               <chr>          
1 COL:63Z7M COL:4HQ4J           Abies menziesii
2 COL:63YVM COL:4HQ4Z           Abies menziesii
3 COL:63YVL COL:5QZBQ           Abies menziesii
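
For anyone who wants to flag ambiguous synonyms like this in a larger result set, here is a minimal sketch. It assumes only dplyr and tibble, and it rebuilds the three rows printed above by hand rather than querying the database; the object names `hits` and `ambiguous` are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hand-copied from the filter_name() output above, so no database is needed.
hits <- tibble(
  taxonID             = c("COL:63Z7M", "COL:63YVM", "COL:63YVL"),
  acceptedNameUsageID = c("COL:4HQ4J", "COL:4HQ4Z", "COL:5QZBQ"),
  scientificName      = rep("Abies menziesii", 3)
)

# A name is ambiguous if it maps to more than one accepted name ID.
ambiguous <- hits |>
  group_by(scientificName) |>
  summarise(n_accepted = n_distinct(acceptedNameUsageID), .groups = "drop") |>
  filter(n_accepted > 1)

print(ambiguous)
```

The same pipeline should work on the tibble returned by filter_name() for a vector of names, since it relies only on the scientificName and acceptedNameUsageID columns shown above.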

The issue turned out to be that COL had revised the format of its upstream database, which doesn't list parentNameUsageID (parent taxon names) for anything that is not an accepted name; hence genus was being lost on synonyms. Instead, COL provides an additional field, genericName, for the genus (including the genus of synonyms), as well as specificEpithet. Using those, I was able to recover the genus names for synonyms, which should now resolve as shown above.
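
To make the fix concrete, here is a rough sketch of the idea, not the actual taxadb import code. It assumes a table with both a genus column (empty for synonyms) and a genericName column, and fills the gaps with dplyr's coalesce(). The rows and the IDs A1, S1, S2 are invented for illustration and are not taken from the real COL data.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the upstream COL table (illustrative values only).
col_raw <- tibble(
  taxonID         = c("A1", "S1", "S2"),
  taxonomicStatus = c("accepted", "synonym", "synonym"),
  genus           = c("Pseudotsuga", NA, NA),   # missing for synonyms
  genericName     = c("Pseudotsuga", "Abies", "Abies"),
  specificEpithet = c("menziesii", "menziesii", "menziesii")
)

# Keep genus where it is present; otherwise fall back to genericName.
col_fixed <- col_raw |>
  mutate(genus = coalesce(genus, genericName))

print(col_fixed)
```

With genus filled in for synonyms, a lookup that matches on genus plus specificEpithet can find the synonym rows again.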

Please re-open if you still have issues with this, and as always, bug reports are much appreciated!

gregor-fausto commented 1 year ago

Thanks for following up. I'll admit I don't fully understand the details of the taxonomy schema but the package is now doing what you described! I appreciate you looking into the issue.