ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

Update of release database #163

Closed munrohannah closed 5 years ago

munrohannah commented 5 years ago

Are there plans to update the current database release from 17.07 to a more recent one? I am finding that a lot of information has been updated in the last (almost) two years, especially on SeaLifeBase.

Alternatively, am I missing how to update which release that I use?

Here is an example with a species where there is data online not available through rfishbase.

Session Info

```r
library(tidyverse)
library("rfishbase", lib.loc="~/R/win-library/3.5")
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rfishbase_3.0.3 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.0.1 purrr_0.3.2 readr_1.3.1
[7] tidyr_0.8.3 tibble_2.1.1 ggplot2_3.1.1 tidyverse_1.2.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 cellranger_1.1.0 pillar_1.3.1 compiler_3.5.1 plyr_1.8.4 tools_3.5.1
[7] digest_0.6.18 memoise_1.1.0 jsonlite_1.6 lubridate_1.7.4 gtable_0.3.0 nlme_3.1-137
[13] lattice_0.20-35 pkgconfig_2.0.2 rlang_0.3.4 cli_1.1.0 rstudioapi_0.10 yaml_2.2.0
[19] haven_2.1.0 withr_2.1.2 xml2_1.2.0 httr_1.4.0 generics_0.0.2 hms_0.4.2
[25] grid_3.5.1 tidyselect_0.2.5 glue_1.3.1 R6_2.4.0 fansi_0.4.0 readxl_1.3.1
[31] modelr_0.1.4 magrittr_1.5 backports_1.1.4 scales_1.0.0 rvest_0.3.3 assertthat_0.2.1
[37] colorspace_1.4-1 utf8_1.1.4 stringi_1.4.3 lazyeval_0.2.2 munsell_0.5.0 broom_0.5.2
[43] crayon_1.3.4
```

Here is the code:

options(FISHBASE_API = "https://fishbase.ropensci.org/sealifebase")

ecology("Aristaeopsis edwardsiana")%>%
  select(Species,Herbivory2,FeedingType)

# A tibble: 1 x 3
  Species                  Herbivory2 FeedingType
  <chr>                    <chr>      <chr>      
1 Aristaeopsis edwardsiana NA         NA    

Here is the info on SeaLifeBase, with the data highlighted by a red box (image: fishbase_question_A_edwardsiana).

sckott commented 5 years ago

> Are there plans to update the current database release from 17.07 to a more recent one?

what does the 17.07 refer to? is that a date?

cboettig commented 5 years ago

@sckott yeah, that means July 2017; though the most recent data snapshots are actually 18.10, so less than a year old.

Guess we should at least be aiming for 19.10 release if not before...

munrohannah commented 5 years ago

I know that updating the database release involves work, but I wanted to ensure it was noted that it would be valued! The most recent sealifebase release is even from 2019 I believe.

sckott commented 5 years ago

just updated 5 min ago

munrohannah commented 5 years ago

@sckott does this mean I should be able to access the latest sealife release now?

Using the example from above I still cannot see the updated fields. I have updated rfishbase to the latest release on github, 3.0.3. I have restarted R with a new session. I have updated the code slightly.

ecology("Strongylocentrotus droebachiensis",server="sealifebase")%>%
  select(Species,Herbivory2,FeedingType)

Did I misunderstand your latest comment, or am I missing something?

sckott commented 5 years ago

The API was updated. I think rfishbase uses the static data by default? Seems like the latest static data isn't up to date with the API https://github.com/ropensci/rfishbase/releases @cboettig

cboettig commented 5 years ago

Yup, like @sckott says I'll need to sync the static cache first for that to work from R, sorry :/. Can probably get around to that this evening. thanks both!

munrohannah commented 5 years ago

@cboettig, has the static cache been updated?

cboettig commented 5 years ago

@munrohannah Did you update the env var to point to a newer version? See README: https://github.com/ropensci/rfishbase#version-stability

Arguably we should have this detect the latest version by default, but it doesn't currently do that, so if you didn't request 18.10 version you'll still get 17.07. I've just posted the 19.04 release as well, so you should be able to use that.

I'll open a separate issue about having the package access the latest version by default...

thanks for your patience

munrohannah commented 5 years ago

@cboettig this all helps, and in theory should work, but I am still not getting the updated data.

I think that the problem I am having is specific to the slb.2ecology.tsv file in the last two releases. When I look at the data from any other file (e.g. slb2fdiseases or slb2fdiet_items) there is a .tsv file inside the .bz2 archive, but for the 18.10 and 19.04 releases it is a SLB5FC~1.BZ2 file.

cboettig commented 5 years ago

@munrohannah hmm, weird, I cannot reproduce what you are seeing. Can you try first restarting R, then in a fresh session do:

library(rfishbase)
options(FISHBASE_API="sealifebase", FISHBASE_VERSION="19.04")
eco <- ecology()

This gives me an eco tibble of dimension 121,461 x 142. This is slightly bigger than the 18.10 table, which is 119,140 x 141. Note that you'll need to restart R to clear the caching between changing versions, otherwise the functions will return previous cached data. (We should probably add a function to clear the cache manually without restarting...)

cboettig commented 5 years ago

Also note that you can test directly from the cache to see if that works, e.g.:

download.file("https://github.com/ropensci/rfishbase/releases/download/slb-19.04/slb.2fecology.tsv.bz2", "slb.2fecology.tsv.bz2")

readr::read_tsv("slb.2fecology.tsv.bz2")
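
A minimal sketch of the same check for other tables or versions, assuming their release assets follow the naming pattern of the 19.04 link above (that pattern is an assumption, not something confirmed in this thread). `slb_asset_url` is a hypothetical helper written for illustration; it is not an rfishbase function.

```r
# Hypothetical helper (not part of rfishbase): builds a release-asset URL
# by following the pattern of the slb-19.04 ecology link quoted above.
slb_asset_url <- function(table, version) {
  sprintf(
    "https://github.com/ropensci/rfishbase/releases/download/slb-%s/slb.2f%s.tsv.bz2",
    version, table
  )
}

url <- slb_asset_url("ecology", "19.04")

# Everything below needs network access, so it only runs interactively.
if (interactive()) {
  dest <- file.path(tempdir(), basename(url))
  # Explicit binary mode, so the archive is written byte for byte.
  download.file(url, dest, mode = "wb")
  eco <- readr::read_tsv(dest)
  dim(eco)
}
```

If the file downloaded this way reads in cleanly but `ecology()` still returns old values, that points at the session cache or the version setting rather than at the release file itself.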

munrohannah commented 5 years ago

Thank you, this is working for me now. I appreciate your patience.