ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

Issue with null returns from maturity() #103

Closed: tomjwebb closed this issue 7 years ago

tomjwebb commented 8 years ago

Hi - I've been using the maturity() function to get length at maturity for a list of species (names validated for all). The problem is that if the lookup fails for a single species, the whole call fails. I have a workaround (see below), but I wondered if it would be possible to return a null data frame for a species when the function finds nothing? Here's an example:

Set up a dummy species list:

sp_list_test <- c("Limanda limanda", "Scophthalmus maximus", "Chelidonichthys lucerna", "Echiichthys vipera")

Try getting maturity data by passing the vector of species:

mat_dat <- maturity(sp_list_test)

This fails:

Error in names(data)[names(data) == "Speccode"] = "SpecCode" : 
  attempt to set an attribute on NULL
In addition: Warning messages:
1: In check_and_parse(resp) : Bad Request (HTTP 400).
2: In error_checks(parsed, resp = resp) :
  no results found for query https://fishbase.ropensci.org/maturity?Speccode=1366&limit=200

My bodged fix: loop over each species individually:

# simplify = FALSE always returns a named list, even if every call succeeds
mat_dat <- sapply(sp_list_test, function(sp) try(maturity(sp), silent = TRUE), simplify = FALSE)
lapply(mat_dat, class)
$`Limanda limanda`
[1] "tbl_df"     "tbl"        "data.frame"

$`Scophthalmus maximus`
[1] "tbl_df"     "tbl"        "data.frame"

$`Chelidonichthys lucerna`
[1] "try-error"

$`Echiichthys vipera`
[1] "try-error"

This works: it returns a data frame for each success and a try-error for each failure, but I then need to piece the results together into a single data frame afterwards. Not really a problem, but it would be preferable to get a single data frame straight away, with NA values (apart from the species name) for any species that returns nothing.

Relatedly: of the two species that fail in the above example, one (Echiichthys vipera) indeed has no maturity table in FishBase, but the other (Chelidonichthys lucerna) does (see http://www.fishbase.org/Reproduction/MaturityList.php?ID=1366&GenusName=Chelidonichthys&SpeciesName=lucerna&fc=266). Any idea why it's not returning anything?

Thanks!

tomjwebb commented 8 years ago

Just to follow up on the above, here is my solution for turning the returned list into a single data frame. It feels unnecessarily clumsy, but it works…

library(dplyr)  # for bind_rows()
failed <- sapply(mat_dat, inherits, "try-error")  # TRUE where maturity() failed
id_null_sp <- which(failed)  # positions of the failed species in sp_list_test
mat_dat <- bind_rows(mat_dat[!failed])
null_df <- matrix(ncol = ncol(mat_dat), nrow = length(id_null_sp))
colnames(null_df) <- names(mat_dat)
null_df <- as.data.frame(null_df)
null_df$sciname <- sp_list_test[id_null_sp]
mat_dat <- rbind(mat_dat, null_df)

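A more compact base-R version of the same post-hoc step might look like the sketch below. It assumes the input is the named list produced by the sapply()/try() loop above (successes are data frames, failures are try-error objects), that each data frame has a sciname column, and that at least one species succeeded. bind_with_na_rows() is a hypothetical helper, not part of rfishbase. The padding trick is to index the combined data frame with NA row numbers, which gives all-NA rows that keep each column's type.

```r
# Hypothetical helper (not part of rfishbase): combine a named list of
# per-species results, where failures are "try-error" objects, into one
# data frame with a row of NAs (apart from sciname) for each failure.
# Assumes at least one element is a data frame with a sciname column.
bind_with_na_rows <- function(results) {
  ok <- !vapply(results, inherits, logical(1), what = "try-error")
  good <- do.call(rbind, results[ok])
  # Indexing rows with NA yields all-NA rows that keep each column's type
  pad <- good[rep(NA_integer_, sum(!ok)), , drop = FALSE]
  pad$sciname <- names(results)[!ok]
  out <- rbind(good, pad)
  rownames(out) <- NULL
  out
}

# Toy input standing in for the sapply()/try() output (made-up values)
res <- list(
  "Species A" = data.frame(sciname = "Species A", Lm = c(20, 22)),
  "Species B" = try(stop("no results"), silent = TRUE)
)
bind_with_na_rows(res)
```

With real rfishbase output the list elements would be tibbles rather than base data frames, so this is untested against that case.
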
sckott commented 8 years ago

Thanks for your report Tom!

I'll have a look and see what's going on

cboettig commented 8 years ago

@sckott thanks! It looks like check_and_parse() doesn't handle the case of no content after a 400 error, so that should be fixed (I'm not at a computer, and not sure I want to patch it from my phone).

Not sure why the API isn't returning any content, though; it seems like it should return an empty JSON result instead of a 400?

The discrepancy with the online source might be fixed when we update the database(?)
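The kind of guard described above can be sketched as follows. parse_or_empty() is a hypothetical function, not rfishbase's actual check_and_parse(); it takes an HTTP status code and a response body as plain values and delegates parsing to a caller-supplied parse function, so the sketch needs no network access or JSON package. The idea is that an error status or an empty body produces a zero-row data frame plus a warning instead of NULL, which is what the renaming step in the original error message choked on.

```r
# Hypothetical guard, not rfishbase's check_and_parse(): return an empty
# data frame (with a warning) when the request failed or returned no
# content, instead of NULL.
parse_or_empty <- function(status, body, parse) {
  if (status >= 400 || is.null(body) || !nzchar(body)) {
    warning(sprintf("no results (HTTP %d)", status), call. = FALSE)
    return(data.frame())
  }
  parse(body)
}

data <- suppressWarnings(parse_or_empty(400, "", parse = identity))
# The renaming step from the error message now runs without error,
# because names(data) is character(0) rather than NULL
names(data)[names(data) == "Speccode"] <- "SpecCode"
nrow(data)
```
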

sckott commented 8 years ago

@tomjwebb what version of rfishbase do you have?

Chelidonichthys lucerna and Echiichthys vipera are just not in the current version of the FishBase DB that we have, though they may be in the new one, which I should be getting up very soon.

With the current version on GitHub (reinstall with devtools::install_github("ropensci/rfishbase")):

sp_list_test <- c("Limanda limanda", "Scophthalmus maximus", "Chelidonichthys lucerna", "Echiichthys vipera")
maturity(sp_list_test)
#> Warning messages:
#> 1: In check_and_parse(resp) : Bad Request (HTTP 400).
#> 2: no results found for query https://fishbase.ropensci.org/maturity?Speccode=1366&limit=200 
#> 3: In check_and_parse(resp) : Bad Request (HTTP 400).
#> 4: no results found for query https://fishbase.ropensci.org/maturity?Speccode=1364&limit=200

but it does give back data for the first two taxa:

#> Source: local data frame [18 x 36]
#> 
#>    autoctr              sciname StockCode MaturityRefNo     Sex AgeMatMin AgeMatMin2 AgeMatRef    tm Number    r2 SE_tm SD_tm LCL_tm
#>      (int)                (chr)     (int)         (int)   (chr)     (dbl)      (dbl)     (int) (dbl)  (lgl) (lgl) (lgl) (lgl)  (lgl)
#> 1      967      Limanda limanda       711          6014  female        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 2      968      Limanda limanda       711          6014  female        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 3      969      Limanda limanda       711          6014    male        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 4      970      Limanda limanda       711          6014    male        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 5      972      Limanda limanda       711         27372 unsexed        NA         NA        NA   2.3     NA    NA    NA    NA     NA
#> 6      971      Limanda limanda       711         32766  female        NA         NA      7158   3.0     NA    NA    NA    NA     NA
#> 7     3246      Limanda limanda       711         35388  female       3.0          5        NA    NA     NA    NA    NA    NA     NA
#> 8     3245      Limanda limanda       711         35388    male       2.0          3        NA    NA     NA    NA    NA    NA     NA
#> 9     3596      Limanda limanda       711         40546 unsexed       2.6         NA     40748    NA     NA    NA    NA    NA     NA
#> 10    6721      Limanda limanda       711         74523  female       2.0         NA        NA    NA     NA    NA    NA    NA     NA
#> 11    6722      Limanda limanda       711         74523    male       2.0          3        NA    NA     NA    NA    NA    NA     NA
#> 12    1389 Scophthalmus maximus      1366           748  female        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 13    1390 Scophthalmus maximus      1366          6014  female        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 14    1391 Scophthalmus maximus      1366          6014  female        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 15    1392 Scophthalmus maximus      1366          6014    male        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 16    1393 Scophthalmus maximus      1366          6014    male        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 17    3413 Scophthalmus maximus      1366         32766  female        NA         NA     32766   4.0     NA    NA    NA    NA     NA
#> 18    3239 Scophthalmus maximus      1366         35388   mixed       3.0          5        NA    NA     NA    NA    NA    NA     NA
#> Variables not shown: UCL_tm (lgl), LengthMatMin (dbl), LengthMatMin2 (dbl), Type1 (chr), LengthMatRef (int), Lm (dbl), SE_Lm (lgl),
#>   SD_Lm (lgl), LCL_Lm (lgl), UCL_Lm (lgl), Locality (chr), C_Code (chr), E_CODE (lgl), Comment (chr), Entered (int), DateEntered (chr),
#>   Modified (lgl), DateModified (chr), Expert (lgl), DateChecked (lgl), TS (lgl), SpecCode (int)
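Since the fixed maturity() simply leaves out species with no records, one quick way to see which requested species were dropped is to compare the request against the sciname column with base R's setdiff(). The small data frame below is only a stand-in for the real result; only its sciname column matters here.

```r
sp_list_test <- c("Limanda limanda", "Scophthalmus maximus",
                  "Chelidonichthys lucerna", "Echiichthys vipera")

# Stand-in for the maturity() result: only the sciname column is used
mat_dat <- data.frame(sciname = c("Limanda limanda", "Limanda limanda",
                                  "Scophthalmus maximus"))

# Species that were requested but returned no maturity records
dropped <- setdiff(sp_list_test, unique(mat_dat$sciname))
dropped
# -> c("Chelidonichthys lucerna", "Echiichthys vipera")
```
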
sckott commented 8 years ago

Ran into problems loading the new FishBase database; getting in touch with them now...

tomjwebb commented 8 years ago

@sckott Sorry, just seen your question, I'm running rfishbase 2.1.0.1 on R 3.2.3

sckott commented 8 years ago

@tomjwebb there is a newer version: reinstall with devtools::install_github("ropensci/rfishbase") and try again. The errors should be fixed, but the missing taxa still aren't there; I'm waiting to hear back about the FishBase DB problems.

tomjwebb commented 8 years ago

@sckott Got it - works now, thanks! Trying to decide whether returning a single row of NAs (apart from sciname) for species returning no results might be more useful than just excluding them altogether from the returned tibble (see, I know the lingo!) - especially if I want to summarise the output to a single value per species, it would be useful to keep all species there. But that's maybe not appropriate for all use cases and is something that is very easy to do post-hoc, now you've sorted the main issue. So, thanks! Going to put it to use this morning…
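The post-hoc option of keeping every species can be sketched in base R: merge() with all.x = TRUE against the full species list adds an all-NA row for each species with no records, and a per-species summary then gives exactly one value per species (NA where there are no data). The data frame and its Lm values below are made-up placeholders, not FishBase data; Lm is used only because it is one of the columns in the output above.

```r
sp_list_test <- c("Limanda limanda", "Scophthalmus maximus",
                  "Chelidonichthys lucerna", "Echiichthys vipera")

# Made-up stand-in for the maturity() result (placeholder Lm values)
mat_dat <- data.frame(sciname = c("Limanda limanda", "Limanda limanda",
                                  "Scophthalmus maximus"),
                      Lm = c(10, NA, 30))

# Left join onto the full species list: species with no records get NA rows
all_sp <- merge(data.frame(sciname = sp_list_test), mat_dat,
                by = "sciname", all.x = TRUE)

# One value per species; NA when a species has no usable Lm values
mean_or_na <- function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
lm_by_sp <- vapply(split(all_sp$Lm, all_sp$sciname), mean_or_na, numeric(1))
lm_by_sp
```

Summarising with a plain mean(x, na.rm = TRUE) would return NaN rather than NA for species with no values, which is why the small mean_or_na() wrapper is used.
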

sckott commented 7 years ago

Seems fixed now; reopen if it's not really fixed.