ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

species() returns mostly NAs #257

Closed jmschuster closed 1 year ago

jmschuster commented 1 year ago

Hello, I am trying to extract habitat info for a vector of 120 species using the function species() in rfishbase (4.0.0). However, running species() on all 120 species returns mostly NAs. If I run species() for just 1-5 species in the format below, there are no NAs. Why does the extraction not work for the whole species list at once when the data is clearly there?

Extraction of few species (works):

species(c("Clupea pallasii", "Oncorhynchus keta"), fields = species_fields$habitat)

# A tibble: 3 × 5
  Fresh Brack Saltwater DemersPelag     AnaCat
1     1     1         1 pelagic-neritic non-migratory
2     1     1         1 benthopelagic   anadromous
3     1     1         1 pelagic         anadromous

Extraction of list of species (returns NA for 90% of species):

Fish_Hab <- species(species_list=species_vec, fields = species_fields$habitat)
Joining, by = c("Subfamily", "GenCode", "FamCode")
Joining, by = "FamCode"
Joining, by = c("Order", "Ordnum", "Class", "ClassNum")
Joining, by = c("Class", "ClassNum")
> head(Fish_Hab)
# A tibble: 6 × 5
  Fresh Brack Saltwater DemersPelag   AnaCat
1    NA    NA        NA NA            NA
2    NA    NA        NA NA            NA
3    NA    NA        NA NA            NA
4     1     1         1 benthopelagic anadromous
5    NA    NA        NA NA            NA
6     1     1         1 benthopelagic anadromous

Here is my session info:

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
 [1] memoise_2.0.1 duckdb_0.3.2-2 DBI_1.1.1 rfishbase_4.0.0 rlist_0.4.6.2 here_1.0.1
 [7] ggpubr_0.4.0 lubridate_1.8.0 forcats_0.5.0 stringr_1.4.0 purrr_0.3.4 readr_2.1.2
[13] tidyr_1.1.4 tibble_3.1.6 tidyverse_1.3.0 devtools_2.3.2 usethis_2.0.0 ggplot2_3.3.5
[19] dplyr_1.0.7 knitr_1.36 rmarkdown_2.11

loaded via a namespace (and not attached):
 [1] colorspace_2.0-2 ggsignif_0.6.0 ellipsis_0.3.2 rio_0.5.16 rprojroot_2.0.2 fs_1.5.0
 [7] rstudioapi_0.13 remotes_2.2.0 fansi_1.0.2 xml2_1.3.3 contentid_0.0.15 cachem_1.0.6
[13] pkgload_1.2.4 jsonlite_1.7.3 broom_0.8.0 dbplyr_2.0.0 compiler_4.0.3 httr_1.4.2
[19] backports_1.4.1 assertthat_0.2.1 fastmap_1.0.1 cli_3.2.0 htmltools_0.5.1 prettyunits_1.1.1
[25] tools_4.0.3 gtable_0.3.0 glue_1.6.1 Rcpp_1.0.8 carData_3.0-4 cellranger_1.1.0
[31] vctrs_0.3.8 xfun_0.29 ps_1.6.0 openxlsx_4.2.3 testthat_3.1.1 rvest_0.3.6
[37] lifecycle_1.0.1 rstatix_0.6.0 scales_1.1.1 hms_1.1.1 yaml_2.2.1 curl_4.3.2
[43] stringi_1.7.6 desc_1.4.0 pkgbuild_1.2.0 zip_2.1.1 rlang_1.0.2 pkgconfig_2.0.3
[49] evaluate_0.14 processx_3.5.2 tidyselect_1.1.1 magrittr_2.0.2 R6_2.5.1 generics_0.1.1
[55] pillar_1.7.0 haven_2.3.1 foreign_0.8-80 withr_2.4.3 abind_1.4-5 modelr_0.1.8
[61] crayon_1.5.0 car_3.0-10 utf8_1.2.2 tzdb_0.3.0 progress_1.2.2 readxl_1.3.1
[67] data.table_1.14.2 callr_3.7.0 reprex_0.3.0 digest_0.6.29 openssl_1.4.3 munsell_0.5.0
[73] sessioninfo_1.1.1 askpass_1.1

species_vec
[1] "Clupea pallasii" "Oncorhynchus keta" "Oncorhynchus kisutch"
[4] "Oncorhynchus tshawytscha" "Salvelinus malma" "Oncorhynchus mykiss"
[7] "Oncorhynchus clarkii" "Sardinops sagax" "Engraulis mordax"

I tried to run fs::dir_delete(rfishbase:::db_dir()) based on previous issues (#225) after updating readr but it still doesn't work... Thank you kindly for your help!

cboettig commented 1 year ago

Thanks for your report. Please see if you can provide a reprex so I can make sure I'm testing what you're testing.

Based on the information you provided I was able to construct this example. It doesn't return all NAs, though:

library(rfishbase)

fish <- c("Clupea pallasii", "Oncorhynchus keta", "Oncorhynchus kisutch", 
  "Oncorhynchus tshawytscha", "Salvelinus malma", "Oncorhynchus mykiss",
  "Oncorhynchus clarkii", "Sardinops sagax", "Engraulis mordax")
fields <- c("Fresh", "Brack", "Saltwater", "DemersPelag", "AnaCat")

species(fish, fields = fields)
#> Joining, by = c("Subfamily", "GenCode", "FamCode")
#> Joining, by = "FamCode"
#> Joining, by = c("Order", "Ordnum", "Class", "ClassNum")
#> Joining, by = c("Class", "ClassNum")
#> # A tibble: 9 × 5
#>   Fresh Brack Saltwater DemersPelag     AnaCat       
#>   <int> <int>     <int> <chr>           <chr>        
#> 1     1     1         1 pelagic-neritic non-migratory
#> 2     1     1         1 benthopelagic   anadromous   
#> 3     1     1         1 pelagic         anadromous   
#> 4     1     1         1 benthopelagic   anadromous   
#> 5     1     1         1 benthopelagic   anadromous   
#> 6     1     1         1 benthopelagic   anadromous   
#> 7     1     1         1 demersal        anadromous   
#> 8     0     0         1 pelagic-neritic oceanodromous
#> 9     0     0         1 pelagic-neritic <NA>

Created on 2022-12-08 with reprex v2.0.2

Are you still seeing something different if you copy-paste what I have above?

jmschuster commented 1 year ago

Thank you for your response! Your code worked, and it worked for all the species when manually copied into the 'fish' vector. This made me realize there's an issue with blank spaces between text strings in some of the species names. The blank space between genus and species was recognized for some species in the df, but not others, returning NAs...

This fixed the species text strings:

df$Species_name <- gsub("\\s+", " ", df$Species_name)

and then the species() extraction worked perfectly. Thanks again!
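
For anyone hitting the same symptom, here is a minimal base-R sketch of the whitespace cleanup described above. The `species_raw` vector and the `clean_names()` helper are hypothetical, made up for illustration (they are not from the original data frame or from rfishbase), and the extra `trimws()` step is an assumption that leading or trailing spaces might also be present.

```r
# Hypothetical species names with inconsistent whitespace (illustrative data,
# not taken from the original data frame).
species_raw <- c("Clupea pallasii",      # already clean
                 "Oncorhynchus  keta",   # double space
                 "Salvelinus\tmalma ",   # tab plus trailing space
                 " Engraulis mordax")    # leading space

# Hypothetical helper: collapse any run of whitespace to a single space
# (the gsub() fix from this thread), then trim both ends.
clean_names <- function(x) trimws(gsub("\\s+", " ", x))

species_clean <- clean_names(species_raw)

# Flag entries that changed; these are the names that would likely have
# failed to match and come back as NA.
changed <- species_raw != species_clean
print(data.frame(raw = species_raw, clean = species_clean, changed = changed))
```

The cleaned vector can then be passed to species() in place of the original names. Comparing the raw and cleaned strings first is a quick way to see which entries were affected before re-running the query.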