wmorgan485 closed this issue 1 year ago.
Yes, apologies; letting the parser guess column types from the database works poorly for sparse data. If `readr` sees only `NA` values at the top of a column, it assumes `logical` type, and coercion then turns any numeric or `chr` data further down to `NA`. So the fallback mechanism currently defaults somewhat aggressively to character vectors, since this is lossless and thus easy for the user to fix.
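The coercion problem and the character fallback can be illustrated with readr's parsers directly. This is a minimal sketch on made-up values (not real FishBase data): it shows that parsing a number as logical loses it, whereas keeping it as character lets `readr::type_convert()` recover the numeric type afterwards.

```r
library(readr)

# Made-up sparse column: missing at the top, a number further down
values <- c(NA, NA, "15")

# If the column is treated as logical, "15" cannot be parsed and becomes
# NA (readr records a parsing problem and warns; silenced here)
as_logical <- suppressWarnings(parse_logical(values))

# Keeping the column as character is lossless
as_character <- parse_character(values)

# A character column can be re-typed later: type_convert() guesses the
# type from the values actually present (here: double)
fixed <- type_convert(data.frame(x = as_character, stringsAsFactors = FALSE))
```

`type_convert()` prints the column specification it guessed; the value `15` survives only on the character path.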
The next major release will probably move from a TSV backend to a Parquet backend, allowing us to preserve types more accurately.
Thanks @cboettig and @wmorgan485
I came across this problem as well. Unfortunately, the SeaLifeBase and FishBase tables returned by rfishbase end up with different column types, so merging them on the fly is also difficult. I wrote a quick function to convert the required columns to numeric; it might help others. (Note that the list of columns may not be exhaustive, but it works for what I need.)
```r
library(dplyr)

# Convert columns that should be numeric but may arrive as character.
# The affected columns differ between the two databases.
fix_species_type <- function(df, server = c("fishbase", "sealifebase")) {
  server <- match.arg(server) # errors clearly on an unknown server name
  nm <- switch(server,
    fishbase = c("SpecCode", "DepthRangeShallow", "CommonLength", "CommonLengthF",
                 "LongevityWildRef", "MaxLengthRef", "DangerousRef"),
    sealifebase = c("SpecCode", "SpeciesRefNo", "GenCode", "DepthRangeRef",
                    "LongevityWildRef", "Weight")
  )
  df %>%
    mutate(across(any_of(nm), as.numeric)) # convert the `nm` columns to numeric
}

# Warnings are for converting the string "NA" to NA
fb <- rfishbase::species(server = "fishbase") %>%
  fix_species_type(server = "fishbase")
slb <- rfishbase::species(server = "sealifebase") %>%
  fix_species_type(server = "sealifebase")
```
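Since the two databases can disagree on a column's type, another way to merge them is to make every column character (lossless, as the maintainer notes) before stacking, then let readr re-guess the types from the combined values. `bind_rows_retyped()` is a hypothetical helper, not part of rfishbase; this is a minimal sketch assuming dplyr and readr are installed.

```r
library(dplyr)
library(readr)

# Hypothetical helper (not part of rfishbase): stack tables whose shared
# columns were parsed with different types, then re-guess each type.
bind_rows_retyped <- function(...) {
  tables <- list(...)
  # With every column as character, bind_rows() can no longer fail on
  # mismatched column types.
  as_chr <- lapply(tables, function(tbl) mutate(tbl, across(everything(), as.character)))
  # Re-guess each column's type from the combined values.
  type_convert(bind_rows(as_chr))
}
```

It could be called on the two `species()` results directly. One caveat of this design: `as.character()` keeps only 15 significant digits of a double, so converting only the columns whose types actually differ would avoid any rounding.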
When using the `species()` function today, I noticed that some numeric variables are coming in as character (`<chr>`) vectors. This can also be seen in the examples in the current README. For example, under the "Getting Data" section, the command `species(trout$Species)` returned a tibble where all of the variables are `<chr>`.

I see something similar, but when I ran the command `species()`, most of the numeric variables came in correctly, while some (such as `SpecCode`, `DepthRangeShallow`, and `LongevityWildRef`) inappropriately came in as `<chr>` vectors.

Thanks for your efforts!
Bill
Session Info
```r
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.0.7 rfishbase_3.1.9

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7 pillar_1.6.2 compiler_4.1.0 dbplyr_2.1.1 prettyunits_1.1.1
 [6] progress_1.2.2 tools_4.1.0 bit_4.0.4 digest_0.6.27 RSQLite_2.2.7
[11] jsonlite_1.7.2 evaluate_0.14 memoise_2.0.0 lifecycle_1.0.0 tibble_3.1.3
[16] pkgconfig_2.0.3 rlang_0.4.11 rstudioapi_0.13 DBI_1.1.1 cli_3.0.1
[21] curl_4.3.2 yaml_2.2.1 xfun_0.24 fastmap_1.1.0 withr_2.4.2
[26] stringr_1.4.0 arkdb_0.0.12 httr_1.4.2 knitr_1.33 hms_1.1.0
[31] generics_0.1.0 vctrs_0.3.8 bit64_4.0.5 tidyselect_1.1.1 glue_1.4.2
[36] R6_2.5.0 gh_1.3.0 fansi_0.5.0 rmarkdown_2.9 bookdown_0.22.3
[41] blob_1.2.2 tzdb_0.1.2 readr_2.0.0 purrr_0.3.4 magrittr_2.0.1
[46] ellipsis_0.3.2 htmltools_0.5.1.1 assertthat_0.2.1 utf8_1.2.2 stringi_1.7.3
[51] cachem_1.0.5 crayon_1.4.1
```
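To find which columns in a `species()` result arrived as character even though they hold numbers (as with `SpecCode`, `DepthRangeShallow` and `LongevityWildRef` in the report), a check like the following could help build the column list for a type fix. `numeric_like_chr_cols()` is a hypothetical helper, not part of rfishbase; it treats the literal string `"NA"` as missing, on the assumption that such strings are what trigger the conversion warnings mentioned in the thread.

```r
# Hypothetical helper (not part of rfishbase): names of character columns
# whose non-missing values all parse as numbers.
numeric_like_chr_cols <- function(df) {
  is_numeric_like <- function(x) {
    if (!is.character(x)) return(FALSE)
    # Treat real NA, empty strings and the literal "NA" as missing
    x <- x[!is.na(x) & x != "" & x != "NA"]
    # as.numeric() yields NA (with a warning) for any non-numeric string
    length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
  }
  names(df)[vapply(df, is_numeric_like, logical(1))]
}
```

For example, `numeric_like_chr_cols(rfishbase::species())` would list candidate columns; columns that are entirely missing are skipped, since their type cannot be inferred from the data.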