matildabrown / rWCVP

Generating Summaries, Reports and Plots from the World Checklist of Vascular Plants
https://matildabrown.github.io/rWCVP/
GNU General Public License v3.0

`bind_rows()` error using `wcvp_match_names()` #44

nlkinlock closed this issue 1 year ago

nlkinlock commented 1 year ago

I've come across an error while using `wcvp_match_names()` that looks like it originates from `dplyr::bind_rows()`. Below is an example with some plant names that should trigger the error. Many thanks in advance for your help.

``` r
library(rWCVP)
df <- data.frame(TaxonName = c("Artemisia vulgaris", "Salix sitchensis", "Elytrigia caespitosa", "Artemisia vulgaris", "Potentilla glandulosa", "Malus coronaria", "Dioscorea esculenta", "Brickellia veronicifolia"),
                 Authority = c("L.", "Sanson ex Bong.", "(K.Koch) Nevski", "L.", "Lindl.", "(L.) Mill.", "(Lour.) Burkill", "(Kunth) A.Gray"),
                 TaxonID = c(35, 333, 124, 34, 303, 231, 112, 55))
wcvp.out <- wcvp_match_names(names_df = df, name_col = "TaxonName", author_col = "Authority", id_col = "TaxonID")
#> 
#> ── Matching names to WCVP ──────────────────────────────────────────────────────
#> ℹ Using the `TaxonName` column
#> 
#> ── Exact matching 7 names ──
#> 
#> Error in `bind_rows()`:
#> ! Can't combine `..1$match_type` <character> and `..2$match_type` <logical>.

#> Backtrace:
#>      ▆
#>   1. ├─rWCVP::wcvp_match_names(...)
#>   2. │ └─dplyr::bind_rows(matches, matches_no_author)
#>   3. │   └─vctrs::vec_rbind(!!!dots, .names_to = .id, .error_call = current_env())
#>   4. └─vctrs (local) `<fn>`()
#>   5.   └─vctrs::vec_default_ptype2(...)
#>   6.     ├─base::withRestarts(...)
#>   7.     │ └─base (local) withOneRestart(expr, restarts[[1L]])
#>   8.     │   └─base (local) doWithOneRestart(return(expr), restart)
#>   9.     └─vctrs::stop_incompatible_type(...)
#>  10.       └─vctrs:::stop_incompatible(...)
#>  11.         └─vctrs:::stop_vctrs(...)
#>  12.           └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = vctrs_error_call(call))
```

Created on 2023-03-10 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Germany.utf8
#>  ctype    English_Germany.utf8
#>  tz       Europe/Paris
#>  date     2023-03-10
#>  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version    date (UTC) lib source
#>  ada             2.0-5      2016-05-13 [1] CRAN (R 4.2.2)
#>  bit             4.0.5      2022-11-15 [1] CRAN (R 4.2.2)
#>  bit64           4.0.5      2020-08-30 [1] CRAN (R 4.2.1)
#>  blob            1.2.3      2022-04-10 [1] CRAN (R 4.2.1)
#>  cachem          1.0.7      2023-02-24 [1] CRAN (R 4.2.2)
#>  class           7.3-21     2023-01-23 [1] CRAN (R 4.2.2)
#>  classInt        0.4-9      2023-02-28 [1] CRAN (R 4.2.2)
#>  cli             3.4.1      2022-09-23 [1] CRAN (R 4.2.2)
#>  codetools       0.2-19     2023-02-01 [1] CRAN (R 4.2.2)
#>  colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.2.2)
#>  curl            5.0.0      2023-01-12 [1] CRAN (R 4.2.2)
#>  data.table      1.14.8     2023-02-17 [1] CRAN (R 4.2.2)
#>  DBI             1.1.3      2022-06-18 [1] CRAN (R 4.2.1)
#>  digest          0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr           1.1.0      2023-01-29 [1] CRAN (R 4.2.2)
#>  e1071           1.7-13     2023-02-01 [1] CRAN (R 4.2.2)
#>  evaluate        0.20       2023-01-17 [1] CRAN (R 4.2.2)
#>  evd             2.3-6.1    2022-07-04 [1] CRAN (R 4.2.2)
#>  fansi           1.0.4      2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap         1.1.1      2023-02-24 [1] CRAN (R 4.2.2)
#>  ff              4.0.9      2023-01-25 [1] CRAN (R 4.2.2)
#>  fs              1.6.1      2023-02-06 [1] CRAN (R 4.2.2)
#>  future          1.32.0     2023-03-07 [1] CRAN (R 4.2.2)
#>  future.apply    1.10.0     2022-11-05 [1] CRAN (R 4.2.2)
#>  generics        0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
#>  ggplot2         3.4.1      2023-02-10 [1] CRAN (R 4.2.2)
#>  globals         0.16.2     2022-11-21 [1] CRAN (R 4.2.2)
#>  glue            1.6.2      2022-02-24 [1] CRAN (R 4.2.1)
#>  gt              0.8.0      2022-11-16 [1] CRAN (R 4.2.2)
#>  gtable          0.3.1      2022-09-01 [1] CRAN (R 4.2.1)
#>  htmltools       0.5.4      2022-12-07 [1] CRAN (R 4.2.2)
#>  httr            1.4.5      2023-02-24 [1] CRAN (R 4.2.2)
#>  ipred           0.9-14     2023-03-09 [1] CRAN (R 4.2.2)
#>  KernSmooth      2.23-20    2021-05-03 [1] CRAN (R 4.2.2)
#>  knitr           1.42       2023-01-25 [1] CRAN (R 4.2.2)
#>  lattice         0.20-45    2021-09-22 [1] CRAN (R 4.2.2)
#>  lava            1.7.2.1    2023-02-27 [1] CRAN (R 4.2.2)
#>  lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.2.2)
#>  listenv         0.9.0      2022-12-16 [1] CRAN (R 4.2.2)
#>  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.2.1)
#>  MASS            7.3-58.3   2023-03-07 [1] CRAN (R 4.2.2)
#>  Matrix          1.5-3      2022-11-11 [1] CRAN (R 4.2.2)
#>  memoise         2.0.1      2021-11-26 [1] CRAN (R 4.2.1)
#>  munsell         0.5.0      2018-06-12 [1] CRAN (R 4.2.1)
#>  nnet            7.3-18     2022-09-28 [1] CRAN (R 4.2.2)
#>  parallelly      1.34.0     2023-01-13 [1] CRAN (R 4.2.2)
#>  phonics         1.3.10     2021-07-11 [1] CRAN (R 4.2.2)
#>  pillar          1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
#>  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.2.1)
#>  prodlim         2019.11.13 2019-11-17 [1] CRAN (R 4.2.1)
#>  proxy           0.4-27     2022-06-09 [1] CRAN (R 4.2.1)
#>  purrr           1.0.1      2023-01-10 [1] CRAN (R 4.2.2)
#>  R6              2.5.1      2021-08-19 [1] CRAN (R 4.2.1)
#>  Rcpp            1.0.10     2023-01-22 [1] CRAN (R 4.2.2)
#>  RecordLinkage   0.4-12.4   2022-11-08 [1] CRAN (R 4.2.2)
#>  reprex          2.0.2      2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang           1.0.6      2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown       2.20       2023-01-19 [1] CRAN (R 4.2.2)
#>  rpart           4.1.19     2022-10-21 [1] CRAN (R 4.2.2)
#>  RSQLite         2.3.0      2023-02-17 [1] CRAN (R 4.2.2)
#>  rstudioapi      0.14       2022-08-22 [1] CRAN (R 4.2.1)
#>  rvest           1.0.3      2022-08-19 [1] CRAN (R 4.2.1)
#>  rWCVP         * 1.2.4      2023-03-10 [1] Github (matildabrown/rWCVP@d51a708)
#>  rWCVPdata       0.3.1      2023-03-06 [1] local
#>  scales          1.2.1      2022-08-20 [1] CRAN (R 4.2.1)
#>  selectr         0.4-2      2019-11-20 [1] CRAN (R 4.2.1)
#>  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.2.1)
#>  sf              1.0-9      2022-11-08 [1] CRAN (R 4.2.2)
#>  stringi         1.7.12     2023-01-11 [1] CRAN (R 4.2.2)
#>  stringr         1.5.0      2022-12-02 [1] CRAN (R 4.2.2)
#>  survival        3.5-3      2023-02-12 [1] CRAN (R 4.2.2)
#>  tibble          3.2.0      2023-03-08 [1] CRAN (R 4.2.2)
#>  tidyr           1.3.0      2023-01-24 [1] CRAN (R 4.2.2)
#>  tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.2.2)
#>  units           0.8-1      2022-12-10 [1] CRAN (R 4.2.2)
#>  utf8            1.2.3      2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs           0.5.2      2023-01-23 [1] CRAN (R 4.2.2)
#>  withr           2.5.0      2022-03-03 [1] CRAN (R 4.2.1)
#>  xfun            0.37       2023-01-31 [1] CRAN (R 4.2.2)
#>  xml2            1.3.3      2021-11-30 [1] CRAN (R 4.2.1)
#>  xtable          1.8-4      2019-04-21 [1] CRAN (R 4.2.1)
#>  yaml            2.3.7      2023-01-23 [1] CRAN (R 4.2.2)
#> 
#>  [1] C:/Program Files/R/R-4.2.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
```

matildabrown commented 1 year ago

Hmm, I just had a look and I'm pretty sure the error occurs because your data are lovely and clean: every name matches exactly when the author string is included, so there are no exact matches without the author.
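To illustrate the kind of type clash reported in the error message, here is a minimal sketch using `dplyr` only. The two data frames are stand-ins, not rWCVP's internal objects: the assumption is that the table of author-less matches is empty and its `match_type` column is logical rather than character. The thread does not confirm how the package builds that table, and the label text is a placeholder.

``` r
library(dplyr)

# Stand-in for the exact matches made with the author string:
# match_type is a character column (the label is a placeholder).
matches <- data.frame(
  name = "Artemisia vulgaris",
  match_type = "exact, with author"
)

# Stand-in for an EMPTY set of author-less matches whose match_type
# column is logical (an assumption about the failing state).
matches_no_author <- data.frame(name = character(), match_type = logical())

# bind_rows() refuses to combine character and logical columns,
# even when one table has zero rows; capture the error instead of stopping.
clash <- tryCatch(
  bind_rows(matches, matches_no_author),
  error = function(e) e
)
inherits(clash, "error")

# Converting the empty column to character lets the bind succeed.
matches_no_author$match_type <- as.character(matches_no_author$match_type)
combined <- bind_rows(matches, matches_no_author)
```

Under these assumptions, a non-empty author-less table avoids the clash, which fits with the workaround of adding one name whose author string does not match.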

We'll try to fix this bug ASAP, but in the meantime (and for anyone else with this issue), here's the workaround: add a name whose author string doesn't match. You could also match without the author string, but if you've got that data it's better to make use of it.

In your example, I added the dummy species "Artemisia vulgaris not L" and managed to match the rest of the names. For the dummy name, I think it's better to use a species name that matches exactly but an author string that does not (it could be left blank).

Thanks for catching this - please let me know if this workaround isn't working-around for you!

``` r
library(rWCVP)

# The first row is the dummy: a real species name paired with a deliberately wrong author string.
df <- data.frame(TaxonName = c("Artemisia vulgaris", "Artemisia vulgaris", "Salix sitchensis", "Elytrigia caespitosa", "Artemisia vulgaris", "Potentilla glandulosa", "Malus coronaria", "Dioscorea esculenta", "Brickellia veronicifolia"),
                 Authority = c("not L", "L.", "Sanson ex Bong.", "(K.Koch) Nevski", "L.", "Lindl.", "(L.) Mill.", "(Lour.) Burkill", "(Kunth) A.Gray"),
                 TaxonID = c(1, 35, 333, 124, 34, 303, 231, 112, 55))

# Fuzzy matching is switched off here.
wcvp.out <- wcvp_match_names(names_df = df, name_col = "TaxonName", author_col = "Authority", id_col = "TaxonID", fuzzy = FALSE)
```
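
For anyone applying the workaround above to many name lists, the dummy row could be added and removed programmatically. The following is a rough sketch in base R; `add_dummy_row()` and `drop_dummy_row()` are hypothetical helpers, not part of rWCVP, and the sentinel ID of `-1` is an arbitrary choice. It also assumes that the output of `wcvp_match_names()` keeps the ID column, which the thread does not show.

``` r
# Hypothetical helper: copy the first row (a real species name),
# blank its author string, and tag it with a sentinel ID.
add_dummy_row <- function(df, author_col, id_col, dummy_id = -1) {
  stopifnot(!(dummy_id %in% df[[id_col]]))  # sentinel must not clash with a real ID
  dummy <- df[1, , drop = FALSE]
  dummy[[author_col]] <- ""
  dummy[[id_col]] <- dummy_id
  out <- rbind(dummy, df)
  rownames(out) <- NULL
  out
}

# Hypothetical helper: drop every row carrying the sentinel ID again.
drop_dummy_row <- function(res, id_col, dummy_id = -1) {
  res[!(res[[id_col]] %in% dummy_id), , drop = FALSE]
}

df <- data.frame(
  TaxonName = c("Artemisia vulgaris", "Salix sitchensis"),
  Authority = c("L.", "Sanson ex Bong."),
  TaxonID = c(35, 333)
)

padded <- add_dummy_row(df, author_col = "Authority", id_col = "TaxonID")
# `padded` is what would be passed to wcvp_match_names(); the same
# filter would then be applied to its result. Here it is applied to
# `padded` itself so the sketch runs without the WCVP data.
restored <- drop_dummy_row(padded, id_col = "TaxonID")
```

Filtering on the ID rather than on the name keeps genuine duplicates, such as the two "Artemisia vulgaris" records in the original example.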
nlkinlock commented 1 year ago

Thanks very much for the quick response, Matilda. I've tried the workaround (specifically, changing the author string to be blank for a single dummy species), and it seems to be working.

Of course, the names I'm working with are often not so nice, but these situations do arise from time to time!