ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

`gnr_resolve` not matching the same name multiple times OR matches erroneously #920

Open ErikKusch opened 10 months ago

ErikKusch commented 10 months ago

The Issue

Using the function gnr_resolve(), I never obtain the same matched name for multiple user-supplied names, even when doing so would clearly give a better match. These erroneous matches persist even in single-species gnr_resolve() queries.

Minimal Working Example

Running this code:

library(taxize)
sps <- c("Lagopus matu", "Logopus muta", "Lagopus lagopus", "Lagopus muta", "Lagopas lagopus")
GNR_df <- gnr_resolve(sci = sps, best_match_only = TRUE)
GNR_df

results in this output:

# A tibble: 5 × 5
  user_supplied_name submitted_name  matched_name              data_source_title score
* <chr>              <chr>           <chr>                     <chr>             <dbl>
1 Lagopus matu       Lagopus matu    Lagopus Brisson, 1760     Catalogue of Lif… 0.75 
2 Logopus muta       Logopus muta    Lagopus muta (Montin, 17… Catalogue of Lif… 0.75 
3 Lagopus lagopus    Lagopus lagopus Lagopus lagopus           Wikispecies       0.988
4 Lagopus muta       Lagopus muta    Lagopus muta              Wikispecies       0.988
5 Lagopas lagopus    Lagopas lagopus Lagopus lagopus (Linnaeu… Catalogue of Lif… 0.75 

Evidently, the best match for Lagopus matu (row 1 in the output) should be Lagopus muta, which is matched correctly in row 4. Additionally, the matches for Lagopus lagopus (row 3) and Lagopas lagopus (row 5) ought to be the same: Lagopus lagopus.
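To make the problematic row easy to spot in larger queries, here is a minimal sketch that flags rows where a binomial was submitted but the matched name carries no species epithet. `flag_genus_only()` is a hypothetical helper, not part of taxize, and the "second word starts with a lowercase letter" test is only a rough heuristic for "has an epithet". The data frame simply re-types the output shown above (truncated names kept as printed), so no web request is needed.

```r
# Hypothetical helper (not part of taxize): flag rows where the submitted name
# looks like a binomial but the matched name has no lowercase species epithet.
flag_genus_only <- function(df) {
  # Second whitespace-separated word of each name, or NA if there is none
  second_word <- function(x) {
    vapply(strsplit(x, "\\s+"),
           function(w) if (length(w) >= 2) w[2] else NA_character_,
           character(1))
  }
  # Heuristic: an epithet starts with a lowercase letter, authorship does not
  submitted_is_binomial <- grepl("^[a-z]", second_word(df$submitted_name))
  matched_is_binomial   <- grepl("^[a-z]", second_word(df$matched_name))
  df$genus_only_match <- submitted_is_binomial & !matched_is_binomial
  df
}

# Output from the issue, re-typed by hand (truncated names kept as printed)
GNR_example <- data.frame(
  submitted_name = c("Lagopus matu", "Logopus muta", "Lagopus lagopus",
                     "Lagopus muta", "Lagopas lagopus"),
  matched_name   = c("Lagopus Brisson, 1760", "Lagopus muta (Montin, 17…",
                     "Lagopus lagopus", "Lagopus muta",
                     "Lagopus lagopus (Linnaeu…"),
  score          = c(0.75, 0.75, 0.988, 0.988, 0.75)
)

flagged <- flag_genus_only(GNR_example)
# Only row 1 (Lagopus matu -> Lagopus Brisson, 1760) is flagged
flagged[flagged$genus_only_match, c("submitted_name", "matched_name")]
```

The same function could be applied directly to GNR_df, since it only relies on the submitted_name and matched_name columns.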

Interestingly, even when running the gnr_resolve()function only on just the first species:

gnr_resolve(sci = sps[1], best_match_only = TRUE)

still results in the same erroneous match as above:

# A tibble: 1 × 5
  user_supplied_name submitted_name matched_name          data_source_title      score
* <chr>              <chr>          <chr>                 <chr>                  <dbl>
1 Lagopus matu       Lagopus matu   Lagopus Brisson, 1760 Catalogue of Life Che…  0.75

Workaround

For now, I have put together a workaround with the rgbif package:

library(rgbif)
Fixed_Species <- sapply(sps, # loop over species names
    FUN = function(x){
        gbif_resolve <- rgbif::name_backbone_verbose(x) # retrieve GBIF backbone matches
        if (gbif_resolve$data$matchType[1] != "NONE") {
            gbif_resolve$data$canonicalName[1] # a match was made: pull the matched canonical name
        } else {
            gbif_resolve$alternatives$canonicalName[1] # no match: pull the first alternative from fuzzy matching
        }
    }
)

which, to me, leads to the expected matches:

    Lagopus matu      Logopus muta   Lagopus lagopus      Lagopus muta   Lagopas lagopus 
   "Lagopus muta"    "Lagopus muta" "Lagopus lagopus"    "Lagopus muta" "Lagopus lagopus" 
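The workaround above assumes that at least one alternative exists whenever no match is made. A slightly more defensive variant might look like the sketch below. `pick_canonical()` and `first_or_na()` are hypothetical helpers, and the sketch only assumes the `data` / `alternatives` structure already used in the workaround; the three example lists are made-up stand-ins for results, not real GBIF responses.

```r
# Hypothetical helpers (not part of rgbif or taxize)
first_or_na <- function(x) {
  # First element as character, or NA if x is NULL or empty
  if (length(x) > 0) as.character(x[1]) else NA_character_
}

pick_canonical <- function(res) {
  # `res` is assumed to be shaped like the result used in the workaround:
  # a list with elements `data` and `alternatives`
  match_type <- first_or_na(res$data$matchType)
  if (!is.na(match_type) && match_type != "NONE") {
    return(first_or_na(res$data$canonicalName)) # a match was made
  }
  first_or_na(res$alternatives$canonicalName)   # fall back to first alternative, else NA
}

# Made-up stand-ins for the three situations (not real GBIF output)
hit   <- list(data = data.frame(matchType = "EXACT", canonicalName = "Lagopus muta"),
              alternatives = data.frame())
fuzzy <- list(data = data.frame(matchType = "NONE"),
              alternatives = data.frame(canonicalName = c("Lagopus muta", "Lagopus")))
none  <- list(data = data.frame(matchType = "NONE"),
              alternatives = data.frame())

results <- c(hit = pick_canonical(hit),
             fuzzy = pick_canonical(fuzzy),
             none = pick_canonical(none))
results

# With network access, the live equivalent of the workaround would be:
# vapply(sps, function(x) pick_canonical(rgbif::name_backbone_verbose(x)), character(1))
```

Returning NA instead of failing keeps the result a plain character vector, so names that could not be resolved at all remain visible in the output.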

Session Info

```r
R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Oslo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] taxize_0.9.100

loaded via a namespace (and not attached):
 [1] bold_1.3.0        gtable_0.3.4      jsonlite_1.8.7    crayon_1.5.2
 [5] rgbif_3.7.7       dplyr_1.1.2       compiler_4.3.2    tidyselect_1.2.0
 [9] Rcpp_1.0.11       xml2_1.3.4        stringr_1.5.0     parallel_4.3.2
[13] scales_1.2.1      uuid_1.1-1        lattice_0.21-9    ggplot2_3.4.3
[17] R6_2.5.1          plyr_1.8.8        generics_0.1.3    curl_5.0.2
[21] oai_0.4.0         iterators_1.0.14  tibble_3.2.1      crul_1.4.0
[25] munsell_0.5.0     pillar_1.9.0      rlang_1.1.1       utf8_1.2.3
[29] httpcode_0.3.0    stringi_1.7.12    lazyeval_0.2.2    cli_3.6.1
[33] magrittr_2.0.3    foreach_1.5.2     digest_0.6.31     grid_4.3.2
[37] rstudioapi_0.15.0 lifecycle_1.0.3   nlme_3.1-163      vctrs_0.6.3
[41] glue_1.6.2        data.table_1.14.8 whisker_0.4.1     zoo_1.8-12
[45] codetools_0.2-19  ape_5.7-1         fansi_1.0.4       colorspace_2.1-0
[49] conditionz_0.1.0  httr_1.4.7        tools_4.3.2       pkgconfig_2.0.3
```