azizka closed this issue 1 year ago
Thanks, Alex! I think I know what’s going on: before the fuzzy matching there’s a filter that removes from the unmatched names list any name that shares an ID with a name that has already been matched (https://github.com/matildabrown/rWCVP/blob/da4fabb3d5201cc4ca31b544aec1a4a02047e5b5/R/wcvp_match_names.R#L148).
Did you pass an ID column name to wcvp_match_names, e.g. wcvp_match_names(names, name_col = "spnames", id_col = "spid")?
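For anyone following along, the effect of such a filter can be pictured with a toy example. This is only a minimal sketch in base R, not the package's actual code; the data frames `matched` and `unmatched`, their contents, and the `spid`/`spnames` columns are hypothetical.

```r
# Hypothetical toy data: names already matched exactly, and names still unmatched.
matched <- data.frame(
  spid = c(1, 2),
  spnames = c("Quercus robur", "Fagus sylvatica")
)
unmatched <- data.frame(
  spid = c(2, 3),
  spnames = c("Fagus silvatica", "Betula pendla")
)

# Drop unmatched rows whose ID already appears among the matched rows,
# so that only the remainder would be passed on to fuzzy matching.
to_fuzzy <- unmatched[!unmatched$spid %in% matched$spid, , drop = FALSE]

print(to_fuzzy$spnames)
```

In this sketch the row with `spid` 2 is dropped because that ID is already matched, so a filter of this kind could shrink the fuzzy-matching step whenever IDs are not unique per name.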
ah. No, no species ID:
wcvp_match_names(names_df = name_li, name_col = "scrubbed_species_binomial", author_col = "scrubbed_author")
Ah, looks like it must be a bug then.
Don’t suppose you could share the input data?
yes, no problem. names_list_matching_WCVP.csv
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] janitor_2.2.0 rWCVP_1.2.4 CoordinateCleaner_2.0-20 sf_1.0-11 readxl_1.4.2 countrycode_1.4.0
[7] BIEN_1.2.6 RPostgreSQL_0.7-5 DBI_1.1.3 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[13] dplyr_1.1.0 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.1
[19] tidyverse_2.0.0
Sorry it's taken so long to get back on this. The problem was entirely with the CLI summary of the matching, which wasn't taking into account the author_col you provided. So all the matching was fine, but the CLI counted things wrong.
I've made a pull request with the fix now.
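Until a fix is installed, the output can be checked independently of the CLI summary. The sketch below is a rough illustration in base R with hypothetical toy data; `input` stands for the data frame passed to wcvp_match_names and `output` for its result, and it assumes the result keeps the input name and author columns.

```r
# Hypothetical toy input: the same binomial appears with two different authors.
input <- data.frame(
  scrubbed_species_binomial = c("Aus bus", "Aus bus", "Aus cus"),
  scrubbed_author = c("Auth1", "Auth2", "Auth1")
)
# Stand-in for the match result (assumed to retain the input columns).
output <- input

# Counting by name alone and by name + author gives different totals.
n_by_name <- nrow(unique(input["scrubbed_species_binomial"]))
n_by_name_author <- nrow(unique(input))

# Build a name + author key and list input combinations absent from the output.
make_key <- function(df) {
  paste(df$scrubbed_species_binomial, df$scrubbed_author, sep = "|")
}
missing_from_output <- setdiff(make_key(input), make_key(output))

print(c(by_name = n_by_name, by_name_author = n_by_name_author))
print(length(missing_from_output))
```

If `missing_from_output` is empty for the real data, every input name and author combination is present in the result, whatever the summary reports.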
Great package! When provided with a list of names (538 in this case), the function reports matching only part of the names (511). Why? The output data seems complete. I can provide the species list if necessary.
Using the scrubbed_species_binomial column
── Exact matching 538 names ──
✔ Found 508 of 538 names
── Fuzzy matching 3 names ──
✔ Found 3 of 3 names
── Matching complete! ──
✔ Matched 511 of 511 names
ℹ Exact (with author): 392
ℹ Exact (without author): 116
ℹ Fuzzy (edit distance): 2
ℹ Fuzzy (phonetic): 1
! Names with multiple matches: 7
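The counts in this summary can be cross-checked by hand to see where the discrepancy sits. A small sketch in R, using only the numbers printed in the summary (the variable names are made up for illustration):

```r
# Numbers copied from the CLI summary.
n_input       <- 538  # names passed to exact matching
n_exact_found <- 508  # "Found 508 of 538 names"
n_fuzzy_tried <- 3    # "Fuzzy matching 3 names"
n_reported    <- 511  # "Matched 511 of 511 names"

# The per-type breakdown adds up to the reported total.
exact_breakdown <- 392 + 116  # with author + without author
fuzzy_breakdown <- 2 + 1      # edit distance + phonetic

# Names left over after exact matching, and how many of those
# never show up in the fuzzy-matching count.
n_left_after_exact <- n_input - n_exact_found
n_unaccounted <- n_left_after_exact - n_fuzzy_tried

print(c(left_after_exact = n_left_after_exact, unaccounted = n_unaccounted))
```

The breakdown is internally consistent (508 + 3 = 511), but 30 names remain after exact matching and only 3 are counted in the fuzzy step, so 27 names are missing from the reported totals.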