ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

Query on ecology function not returning trophic level data? #167

Closed adelheenan closed 5 years ago

adelheenan commented 5 years ago
Session Info

```r
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] dplyr_0.8.2     rfishbase_3.0.4

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       magrittr_1.5     hms_0.4.2        tidyselect_0.2.5 R6_2.4.0
 [6] rlang_0.4.0      fansi_0.4.0      stringr_1.4.0    httr_1.4.0       tools_3.6.0
[11] utf8_1.1.4       cli_1.1.0        assertthat_0.2.1 digest_0.6.19    tibble_2.1.3
[16] crayon_1.3.4     purrr_0.3.2      readr_1.3.1      vctrs_0.1.0      zeallot_0.1.0
[21] curl_3.3         memoise_1.1.0    glue_1.3.1       gh_1.0.1         stringi_1.4.3
[26] compiler_3.6.0   pillar_1.4.2     backports_1.1.4  jsonlite_1.6     pkgconfig_2.0.2
```

Hi there,

First off, thanks for making such a useful package; it is a great resource! I have a question about the `ecology()` function. When I use:

```r
ecology("Sphyraena qenie",
        fields = c("DietTroph", "DietSeTroph", "DietRemark",
                   "FoodTroph", "FoodSeTroph", "FoodRemark"))
```

it returns NAs. However, I can see on https://www.fishbase.se/Ecology/FishEcologySummary.php?StockCode=8250&GenusName=Sphyraena&SpeciesName=qenie that diet data are available.

I have a batch of 60 species that return similar NAs, and I had assumed this was due to a lack of data on FishBase. Do you recommend that I check each one manually? Or is there a reason the `ecology()` function isn't pulling the data I can see on FishBase?

I followed the comments you gave on the closed issue about missing trophic level data and tried:

```r
remotes::install_github("ropensci/rfishbase")
library(rfishbase)
library(dplyr)
```

I opened a new R session and tried:

```r
ecology("Sphyraena qenie", server = "fishbase") %>%
  select(dplyr::matches("Troph"))
```

But it still returned NAs.
Many thanks for your advice! Adel
cboettig commented 5 years ago

@adelheenan Sorry for the difficulties, but I think your issue is not that rfishbase is returning different data, but rather that it's not always obvious for users (or me!!) to figure out which column in the database corresponds to which box on the website. For instance, please try:

```r
ecology("Sphyraena qenie") %>% select(dplyr::matches("Diet"))
```

I think you should see:

```
> x <- ecology("Sphyraena qenie")
> x %>% select(dplyr::matches("Diet"))
# A tibble: 1 x 6
  DietTroph DietSeTroph DietTLu DietseTLu DietRemark DietRef
      <dbl>       <dbl>   <dbl>     <dbl> <chr>        <dbl>
1        NA          NA     4.4      0.75 NA           30531
```

I think these match the values shown under "Unfished Population" in the table on the Ecology page at https://www.fishbase.se/Ecology/FishEcologySummary.php?StockCode=8250&GenusName=Sphyraena&SpeciesName=qenie. Note that under "Original sample" in that table, the data are indeed missing for both diet composition and food items for this species.

Note that FishBase includes several trophic level estimates that are calculated in different ways (e.g. see #154), and it is really up to the researcher to decide which method is most appropriate for their research question. Below is what the FishBase team has previously told us on this question:

> As for content, FoodTroph gives a Monte Carlo estimate of trophic level based on known food items. DietTroph uses the mean or median (Skit?) of trophic levels derived from actual diet composition studies.
>
> While in theory troph from diet should be more reliable, many diet studies are incomplete or biased, and I often find FoodTroph more reasonable.
>
> Remember that FishBase only reflects what it has captured from the published record, and there is a lot of garbage out there.
>
> If you do a serious study about a single species, use the FB refs as a start, search for additional refs, and do your own careful analysis using only "good" studies. Don't forget to send copies of new refs to FishBase, as well as indications of "bad" refs that FishBase should mark as such.
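The DietTroph calculation described in the quote above can be sketched as follows. This is a minimal illustration that assumes the standard trophic-level formula, TL = 1 + Σ DC_j × TL_j (DC_j is the fraction of prey item j in the diet, TL_j its trophic level), applied to each diet study and then summarised with the median across studies. The function name `troph_from_diet` and the three diet studies are hypothetical illustrative values, not FishBase records, and FishBase's own pipeline may handle details differently. FoodTroph's Monte Carlo estimate is not reproduced here.

```r
# Hypothetical helper (assumption): standard trophic-level formula
# TL = 1 + sum(DC_j * TL_j) for a single diet study.
troph_from_diet <- function(diet_fraction, prey_troph) {
  stopifnot(length(diet_fraction) == length(prey_troph))
  # rescale so the diet fractions sum to 1
  diet_fraction <- diet_fraction / sum(diet_fraction)
  1 + sum(diet_fraction * prey_troph)
}

# Three made-up diet studies (illustrative numbers, not FishBase data)
studies <- list(
  list(fraction = c(0.8, 0.2), prey_troph = c(3.5, 2.0)),
  list(fraction = 1,           prey_troph = 3.4),
  list(fraction = c(0.5, 0.5), prey_troph = c(3.0, 4.0))
)

# One trophic level per study, then the median across studies
per_study <- vapply(studies,
                    function(s) troph_from_diet(s$fraction, s$prey_troph),
                    numeric(1))
diet_troph <- median(per_study)
diet_troph
```

With these inputs the per-study values are 4.2, 4.4 and 4.5, so the median is 4.4. Using the median rather than the mean keeps a single biased study from dominating the result.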

And regarding the missing values, Skit weighs in a bit about what lives where:

> The trophic level of a species in FishBase can come from either or both of `ecology.dietroph` and `ecology.foodtroph`. Diettroph (median of studies) takes priority over foodtroph. These trophic level values are presented on the Ecology page: https://www.fishbase.de/Ecology/FishEcologySummary.php?StockCode=1014&GenusName=Naucrates&SpeciesName=ductor.
>
> FishBase also presents a trophic level on the species summary page, https://www.fishbase.de/summary/Naucrates-ductor.html. The value there is taken from the "estimate" table, i.e. `estimate.troph`. The trophs stored in this table are a mix of diettroph, foodtroph (diettroph over foodtroph if both are available), and estimates for species that have no troph. Estimates are based on the size and troph of the closest relatives, e.g. https://www.fishbase.de/summary/Serranus-inexpectatus.html. Please note that the value given there is rounded to one decimal only. Thus, for N. ductor, 3.4 is actually the foodtroph rounded to one decimal.
>
> Another example would be Gadus morhua, which has both a diettroph (4.09) and a foodtroph (4.29). In the estimate table, it is the diettroph that is carried over, but rounded to 4.1.
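Skit's priority rule above can be sketched as a small vectorised function. This is a minimal sketch of the description only, not FishBase's actual code. `summary_troph` is a hypothetical name, and species with neither value simply return `NA` here, whereas FishBase fills those from relatives' estimates.

```r
# Hypothetical helper (assumption): mirrors the described rule for the
# species summary page. DietTroph wins when present, otherwise FoodTroph,
# and the result is rounded to one decimal. Species with neither value
# return NA; FishBase's estimates from close relatives are not attempted.
summary_troph <- function(diet_troph, food_troph) {
  # ifelse() is vectorised, so this works on whole columns of species
  troph <- ifelse(!is.na(diet_troph), diet_troph, food_troph)
  round(troph, 1)
}

# Gadus morhua, using the values quoted above
summary_troph(diet_troph = 4.09, food_troph = 4.29)
```

For Gadus morhua this returns 4.1, matching the rounded value in the quote.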

HTH!

cboettig commented 5 years ago

P.S. As Skit's comments suggest, you can also try the `estimate` table:

```r
estimate("Sphyraena qenie") %>% select(dplyr::matches("Troph"))
```

which gives me:

```
# A tibble: 1 x 5
  Troph seTroph TrophObserved TrophPredicted seTrophPredicted
  <dbl>   <dbl>         <dbl> <lgl>          <lgl>
1  4.52    0.81             0 NA             NA
```

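For a batch like the 60 species in the question, one option is to fall back through the tables in a fixed order. The sketch below uses a hypothetical helper, `pick_troph` (not part of rfishbase), that takes a data frame with `DietTroph`, `FoodTroph` and `Troph` columns, keeps the first non-missing value in each row, and records which column it came from. It assumes you have already combined `ecology()` and `estimate()` results into one row per species; how you join them is up to you. The example rows reuse values quoted in this thread.

```r
# Hypothetical helper (not part of rfishbase): for each row, keep the first
# non-missing trophic level from `cols`, in order, and record its source.
pick_troph <- function(df, cols = c("DietTroph", "FoodTroph", "Troph")) {
  value  <- rep(NA_real_, nrow(df))
  source <- rep(NA_character_, nrow(df))
  for (col in cols) {
    # only fill rows that are still missing and have a value in this column
    fill <- is.na(value) & !is.na(df[[col]])
    value[fill]  <- df[[col]][fill]
    source[fill] <- col
  }
  df$best_troph   <- value
  df$troph_source <- source
  df
}

# Values quoted in this thread (assumed already joined, one row per species)
troph_table <- data.frame(
  Species   = c("Sphyraena qenie", "Gadus morhua"),
  DietTroph = c(NA, 4.09),
  FoodTroph = c(NA, 4.29),
  Troph     = c(4.52, 4.1),
  stringsAsFactors = FALSE
)

result <- pick_troph(troph_table)
result[, c("Species", "best_troph", "troph_source")]
```

The default column order follows Skit's diet-over-food priority. Because the FishBase team notes that FoodTroph is often more reasonable, you could instead pass `cols = c("FoodTroph", "DietTroph", "Troph")`. Keeping `troph_source` also makes the sense check on each species easier.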
adelheenan commented 5 years ago

@cboettig Thanks for the speedy response.

I see; it helps to know that the troph estimates are spread across several tables. I'll try the code you suggest on the species with missing data and then do a sense check.

Thanks again for your time creating such a neat package!

cboettig commented 5 years ago

Great! Closing as resolved, but ping me if you hit any issues.