ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

Inconsistent validate_names() & species() output for a specific species #270

Closed: Telis15 closed this issue 10 months ago

Telis15 commented 10 months ago
Description

Unfortunately, I have not been able to create a reproducible example without my full data frame (it behaves correctly every time I try to reproduce the issue). Essentially, when I run `validate_names("Astyanax aeneus")`, the output indicates that the name is correct as-is. However, when I add a "Checked Name" column to my entire species list, this species shows up as invalid:

```r
SpeciesList$CheckedSpp <- validate_names(SpeciesList$Species)
```

All other species receive the correct output (invalid species return NA; valid species return the original name or an updated name).

The real problem is that I am trying to merge my sample dataset with taxonomic information from FishBase (adding columns for Family, Genus, etc.), but when the species is part of a large data frame, rfishbase seems convinced that Astyanax aeneus doesn't exist and returns NA for all fields. Within my species list, this is the only species being mishandled, and I am not experienced enough in R to suss out the source of the confusion.

I have come up with a workaround (running `species()` on that species alone and then binding that row into the full SpeciesList data frame), but I am hoping there is a clear explanation for why this might be happening, or maybe it is just a bug. Please let me know if any additional information would be helpful. Thanks!
Session Info:

```
setting  value
version  R version 4.3.1 (2023-06-16 ucrt)
os       Windows 11 x64 (build 22621)
system   x86_64, mingw32
ui       RStudio
language (EN)
collate  English_United States.utf8
ctype    English_United States.utf8
tz       America/Chicago
date     2023-09-04
rstudio  2023.06.1+524 Mountain Hydrangea (desktop)
pandoc   NA

─ Packages ──────────────────────────────────────────────
package       * version    date (UTC) lib source
askpass         1.1        2019-01-13 [1] CRAN (R 4.3.1)
bbmle         * 1.0.25     2022-05-11 [1] CRAN (R 4.3.1)
bdsmatrix       1.3-6      2022-06-03 [1] CRAN (R 4.3.0)
boot            1.3-28.1   2022-11-22 [2] CRAN (R 4.3.1)
cachem          1.0.8      2023-05-01 [1] CRAN (R 4.3.1)
callr           3.7.3      2022-11-02 [1] CRAN (R 4.3.0)
carData       * 3.0-5      2022-01-06 [1] CRAN (R 4.3.0)
cellranger      1.1.0      2016-07-27 [1] CRAN (R 4.3.0)
cli             3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
clock         * 0.7.0      2023-05-15 [1] CRAN (R 4.3.1)
cluster         2.1.4      2022-08-22 [2] CRAN (R 4.3.1)
colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
contentid       0.0.17     2023-04-21 [1] CRAN (R 4.3.1)
crayon          1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
curl            5.0.2      2023-08-14 [1] CRAN (R 4.3.1)
data.table      1.14.8     2023-02-17 [1] CRAN (R 4.3.0)
DBI             1.1.3      2022-06-18 [1] CRAN (R 4.3.0)
dbplyr          2.3.3      2023-07-07 [1] CRAN (R 4.3.1)
devtools        2.4.5      2022-10-11 [1] CRAN (R 4.3.1)
digest          0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
dplyr         * 1.1.2      2023-04-20 [1] CRAN (R 4.3.1)
duckdb          0.8.1-3    2023-09-01 [1] CRAN (R 4.3.1)
effects       * 4.2-2      2022-07-13 [1] CRAN (R 4.3.0)
ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
fansi           1.0.4      2023-01-22 [1] CRAN (R 4.3.0)
fastmap         1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
forcats       * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
fs              1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
generics        0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
ggcorrplot    * 0.1.4      2022-09-27 [1] CRAN (R 4.3.0)
ggplot2       * 3.4.3      2023-08-14 [1] CRAN (R 4.3.1)
ggVennDiagram * 1.2.3      2023-08-14 [1] CRAN (R 4.3.1)
glue            1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
gridExtra     * 2.3        2017-09-09 [1] CRAN (R 4.3.0)
gtable          0.3.4      2023-08-21 [1] CRAN (R 4.3.1)
GUILDS          1.4.6      2023-08-21 [1] CRAN (R 4.3.1)
hms             1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
htmltools       0.5.6      2023-08-10 [1] CRAN (R 4.3.1)
htmlwidgets     1.6.2      2023-03-17 [1] CRAN (R 4.3.0)
httpuv          1.6.11     2023-05-11 [1] CRAN (R 4.3.1)
httr            1.4.7      2023-08-15 [1] CRAN (R 4.3.1)
insight         0.19.3     2023-06-29 [1] CRAN (R 4.3.1)
jsonlite        1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
later           1.3.1      2023-05-02 [1] CRAN (R 4.3.1)
lattice       * 0.21-8     2023-04-05 [2] CRAN (R 4.3.1)
lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.3.0)
lme4            1.1-34     2023-07-04 [1] CRAN (R 4.3.1)
lubridate     * 1.9.2      2023-02-10 [1] CRAN (R 4.3.0)
magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
MASS          * 7.3-60     2023-05-04 [2] CRAN (R 4.3.1)
Matrix          1.6-1      2023-08-14 [1] CRAN (R 4.3.1)
memoise         2.0.1      2021-11-26 [1] CRAN (R 4.3.0)
mgcv            1.8-42     2023-03-02 [2] CRAN (R 4.3.1)
mime            0.12       2021-09-28 [1] CRAN (R 4.3.0)
miniUI          0.1.1.1    2018-05-18 [1] CRAN (R 4.3.1)
minqa           1.2.5      2022-10-19 [1] CRAN (R 4.3.0)
mitools         2.4        2019-04-26 [1] CRAN (R 4.3.0)
munsell         0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
mvtnorm         1.2-3      2023-08-25 [1] CRAN (R 4.3.1)
nlme            3.1-162    2023-01-31 [2] CRAN (R 4.3.1)
nloptr          2.0.3      2022-05-26 [1] CRAN (R 4.3.0)
nnet            7.3-19     2023-05-03 [2] CRAN (R 4.3.1)
numDeriv        2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
openssl         2.1.0      2023-07-15 [1] CRAN (R 4.3.1)
permute       * 0.9-7      2022-01-27 [1] CRAN (R 4.3.0)
pillar          1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
pkgbuild        1.4.2      2023-06-26 [1] CRAN (R 4.3.1)
pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
pkgload         1.3.2.1    2023-07-08 [1] CRAN (R 4.3.1)
plyr            1.8.8      2022-11-11 [1] CRAN (R 4.3.0)
poilog          0.4.2      2022-10-13 [1] CRAN (R 4.3.0)
prettyunits     1.1.1      2020-01-24 [1] CRAN (R 4.3.0)
processx        3.8.2      2023-06-30 [1] CRAN (R 4.3.1)
profvis         0.3.8      2023-05-02 [1] CRAN (R 4.3.1)
progress        1.2.2      2019-05-16 [1] CRAN (R 4.3.0)
promises        1.2.1      2023-08-10 [1] CRAN (R 4.3.1)
ps              1.7.5      2023-04-18 [1] CRAN (R 4.3.0)
purrr         * 1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
R6              2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
Rcpp            1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
readr         * 2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
readxl        * 1.4.3      2023-07-06 [1] CRAN (R 4.3.1)
remotes         2.4.2.1    2023-07-18 [1] CRAN (R 4.3.1)
reshape2      * 1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
rfishbase     * 4.1.2      2023-06-02 [1] CRAN (R 4.3.1)
rlang           1.1.1      2023-04-28 [1] CRAN (R 4.3.1)
rstudioapi      0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
RVenn           1.1.0      2019-07-18 [1] CRAN (R 4.3.0)
sads          * 0.4.2      2018-06-16 [1] CRAN (R 4.3.1)
scales          1.2.1      2022-08-20 [1] CRAN (R 4.3.0)
sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
shiny           1.7.5      2023-08-12 [1] CRAN (R 4.3.1)
stringi         1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
stringr       * 1.5.0      2022-12-02 [1] CRAN (R 4.3.0)
suncalc       * 0.5.1      2022-09-29 [1] CRAN (R 4.3.0)
survey          4.2-1      2023-05-03 [1] CRAN (R 4.3.1)
survival        3.5-5      2023-03-12 [2] CRAN (R 4.3.1)
tibble        * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
tidyr         * 1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
tidyverse     * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
timechange      0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
tzdb            0.4.0      2023-05-12 [1] CRAN (R 4.3.1)
urlchecker      1.0.1      2021-11-30 [1] CRAN (R 4.3.1)
usethis         2.2.2      2023-07-06 [1] CRAN (R 4.3.1)
utf8            1.2.3      2023-01-31 [1] CRAN (R 4.3.0)
vctrs           0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
vegan         * 2.6-4      2022-10-11 [1] CRAN (R 4.3.0)
VGAM            1.1-8      2023-03-09 [1] CRAN (R 4.3.1)
withr           2.5.0      2022-03-03 [1] CRAN (R 4.3.0)
xtable          1.8-4      2019-04-21 [1] CRAN (R 4.3.1)
```
cboettig commented 10 months ago

Apologies, but it is not possible for me to debug a problem that I cannot reproduce. From the description, it sounds like you may be missing something in how you are joining the two tables together?

Can you try making a small example, larger than the single-species case where you say it works but smaller than the very large data frame you are working with?
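The join check suggested above can be sketched in base R: left-join the sample rows onto a taxonomy table and list the species that found no match. The data frames `samples` and `taxa` and their contents are hypothetical stand-ins; in practice the taxonomy columns would come from rfishbase (for example `load_taxa()`), which this sketch does not call.

```r
# Hypothetical sample data. The second entry deliberately contains a
# non-breaking space (U+00A0) between genus and species, which prints
# like an ordinary space.
samples <- data.frame(
  Species = c("Lepomis macrochirus", "Astyanax\u00a0aeneus", "Lepomis macrochirus"),
  Count = c(3, 1, 5)
)

# Hypothetical taxonomy lookup keyed on the plain-space species name.
taxa <- data.frame(
  Species = c("Lepomis macrochirus", "Astyanax aeneus"),
  Genus = c("Lepomis", "Astyanax")
)

# Left join: keep every sample row, even those with no taxonomy match.
merged <- merge(samples, taxa, by = "Species", all.x = TRUE)

# Species whose taxonomy columns came back NA failed to match.
unmatched <- unique(merged$Species[is.na(merged$Genus)])
print(unmatched)  # may look exactly like "Astyanax aeneus" on screen

# An exact comparison exposes the difference the eye cannot see.
unmatched == "Astyanax aeneus"  # FALSE
```

Listing the unmatched keys this way narrows a large data frame down to the handful of rows worth inspecting, which is usually a good starting point for a small reproducible example.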

Telis15 commented 10 months ago

Thank you for your response. In the process of building a replicable example, I exported my species list as a csv and found a sneaky character ("Â") between the genus and species of the fish I was having issues with. In R it just showed up as a space, and I have no idea how it got there. I just manually re-typed the species name in my original .xlsx file, and now it is behaving as expected with rfishbase.
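The stray "Â" is most likely a non-breaking space (U+00A0): in UTF-8 it is stored as the bytes C2 A0, and software that decodes those bytes as Latin-1 or Windows-1252 displays the C2 byte as "Â". Assuming that is the culprit, a minimal base R sketch for spotting such names and normalizing them before calling `validate_names()` could look like this; the `species` vector and the helper `clean_species_names()` are hypothetical.

```r
# Hypothetical species vector; the second name contains a non-breaking
# space (U+00A0) instead of an ordinary space.
species <- c("Lepomis macrochirus", "Astyanax\u00a0aeneus")

# The UTF-8 bytes of a non-breaking space are C2 A0; decoding them as
# Latin-1 yields "Â" followed by a non-breaking space.
nbsp_bytes <- charToRaw("\u00a0")
mojibake <- iconv(rawToChar(nbsp_bytes), from = "latin1", to = "UTF-8")
utf8ToInt(mojibake)  # 194 ("Â") and 160 (non-breaking space)

# Flag names containing any non-ASCII character: iconv() returns NA when
# a string cannot be converted to ASCII.
has_non_ascii <- is.na(iconv(species, from = "UTF-8", to = "ASCII"))
species[has_non_ascii]

# Hypothetical helper: swap non-breaking spaces for ordinary spaces,
# collapse repeated whitespace, and trim the ends.
clean_species_names <- function(x) {
  x <- gsub("\u00a0", " ", x, fixed = TRUE)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

cleaned <- clean_species_names(species)
cleaned == c("Lepomis macrochirus", "Astyanax aeneus")  # TRUE TRUE

# The cleaned vector could then be passed on, e.g.
# validate_names(cleaned)  (requires rfishbase and a network connection)
```

Cleaning the names in R, rather than retyping them in the spreadsheet, has the advantage of also catching any other entries with the same hidden character.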

Thanks again for your quick response, and for this incredibly useful package!