ropensci / rfishbase

R interface to the fishbase.org database
https://docs.ropensci.org/rfishbase

maturity query not selecting species #77

Closed harryganz closed 8 years ago

harryganz commented 8 years ago

Any species I pass into the species argument to the maturity function returns the same data:

For Cod:

> maturity("Gadus morhua")
Source: local data frame [200 x 35]

   autoctr Speccode StockCode MaturityRefNo     Sex AgeMatMin AgeMatMin2 AgeMatRef    tm Number    r2 SE_tm
     (int)    (int)     (int)         (int)   (chr)     (dbl)      (dbl)     (int) (dbl)  (int) (lgl) (lgl)
1        1        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
2        2        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
3        3        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
4        4        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
5        5        2         1             2    male        NA         NA        NA    NA     NA    NA    NA
6        6        2         1             2    male        NA         NA        NA    NA     NA    NA    NA
7        7        2         1             2 unsexed      0.58       0.67        NA    NA     NA    NA    NA
8        8        2         1             2 unsexed        NA         NA        NA    NA     NA    NA    NA
9        9        2         1             2 unsexed        NA         NA        NA    NA     NA    NA    NA
10      10        2         1             2 unsexed        NA         NA        NA    NA     NA    NA    NA
..     ...      ...       ...           ...     ...       ...        ...       ...   ...    ...   ...   ...
Variables not shown: SD_tm (lgl), LCL_tm (lgl), UCL_tm (lgl), LengthMatMin (dbl), LengthMatMin2 (dbl), Type1
  (chr), LengthMatRef (int), Lm (dbl), SE_Lm (lgl), SD_Lm (lgl), LCL_Lm (lgl), UCL_Lm (lgl), Locality (chr),
  C_Code (chr), E_CODE (int), Comment (chr), Entered (int), DateEntered (chr), Modified (int), DateModified
  (chr), Expert (int), DateChecked (chr), TS (lgl)

For Cocoa Damselfish:

> maturity("Stegastes variabilis")
Source: local data frame [200 x 35]

   autoctr Speccode StockCode MaturityRefNo     Sex AgeMatMin AgeMatMin2 AgeMatRef    tm Number    r2 SE_tm
     (int)    (int)     (int)         (int)   (chr)     (dbl)      (dbl)     (int) (dbl)  (int) (lgl) (lgl)
1        1        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
2        2        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
3        3        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
4        4        2         1             2  female        NA         NA        NA    NA     NA    NA    NA
5        5        2         1             2    male        NA         NA        NA    NA     NA    NA    NA
6        6        2         1             2    male        NA         NA        NA    NA     NA    NA    NA
7        7        2         1             2 unsexed      0.58       0.67        NA    NA     NA    NA    NA
8        8        2         1             2 unsexed        NA         NA        NA    NA     NA    NA    NA
9        9        2         1             2 unsexed        NA         NA        NA    NA     NA    NA    NA
10      10        2         1             2 unsexed        NA         NA        NA    NA     NA    NA    NA
..     ...      ...       ...           ...     ...       ...        ...       ...   ...    ...   ...   ...
Variables not shown: SD_tm (lgl), LCL_tm (lgl), UCL_tm (lgl), LengthMatMin (dbl), LengthMatMin2 (dbl), Type1
  (chr), LengthMatRef (int), Lm (dbl), SE_Lm (lgl), SD_Lm (lgl), LCL_Lm (lgl), UCL_Lm (lgl), Locality (chr),
  C_Code (chr), E_CODE (int), Comment (chr), Entered (int), DateEntered (chr), Modified (int), DateModified
  (chr), Expert (int), DateChecked (chr), TS (lgl)

I've also noticed that the output is missing the sciname column. My guess is that it is unable to query by species for some reason and is just pulling the first 200 records every time, no matter the query.

cboettig commented 8 years ago

Thanks for the bug report. Seems like the fishbase team has somewhat inconsistent use of names for the SpecCode key in their database. I can put a work-around into rfishbase for the maturity table, but we're looking into more general solutions. https://github.com/ropensci/fishbaseapi/issues/83

sckott commented 8 years ago

Right, that table wants Speccode not SpecCode
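
To illustrate the key-name mismatch discussed above, here is a minimal sketch of a case-insensitive rename. The helper `normalize_key` and the toy data frame are hypothetical and not part of rfishbase; the sketch only assumes that a table may arrive with `Speccode` where other tables use `SpecCode`.

```r
# Hypothetical helper (not part of rfishbase): rename any column whose name
# matches the canonical key when case is ignored.
normalize_key <- function(df, canonical = "SpecCode") {
  hits <- tolower(names(df)) == tolower(canonical)
  names(df)[hits] <- canonical
  df
}

# Toy stand-in for a maturity-style response that uses the lowercase "c".
maturity_like <- data.frame(
  autoctr  = 1:3,
  Speccode = c(2L, 2L, 2L),
  Sex      = c("female", "male", "unsexed")
)

normalized <- normalize_key(maturity_like)
print(names(normalized))
```

Matching on `tolower()` rather than on a fixed list of spellings means the same helper would cover other tables with differently cased keys. It would produce duplicate column names if a table ever carried both spellings, so a real fix would need to handle that case.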

harryganz commented 8 years ago

In the meantime I made a fairly brute force workaround in my forked version of this repo. You can find it at http://github.com/harryganz/rfishbase. I don't think I'll make a pull request because it is very specific, and hopefully the fishbase team fixes the naming problem. If you would like me to make a pull request, tell me in a comment below.

cboettig commented 8 years ago

@harryganz Yeah, it looks like a proper fix on the fishbase end will be a while; apparently the PHP code for the website has a lot of the inconsistent case use hard-coded. We may be able to do some normalization on the API end though, since it looks like maturity table isn't the only one impacted.

Meanwhile, feel free to send a PR, your fix looks reasonable under the circumstances and is certainly an improvement over what we have here! Thanks!

harryganz commented 8 years ago

I sent a pull request. It only works with the maturity endpoint for now, but if I could get my hands on a database schema, I could write a more general workaround for any misnamed fields.

cboettig commented 8 years ago

@harryganz Thanks. The API will list other tables that use Speccode instead of SpecCode: http://fishbase.ropensci.org/listfields?fields=Speccode&exact=true

(Note that not all of the tables listed are actually implemented/exposed by the API, and thus won't have corresponding endpoints. http://fishbase.ropensci.org/heartbeat lists tables that have been implemented.)

Would be great if you wanted to add these to your PR while you're at it; otherwise no worries we'll get around to it soon.

harryganz commented 8 years ago

Added to pull request. I think all the functions that take species_list as an argument should work now.

On a related note, I don't really know how you guys are testing, so I tested them manually but didn't add any more tests.

hkindsvater commented 8 years ago

I'm getting an error when I try to use maturity() today:

maturity(tunas)
Error in names(data)[names(data) == "Speccode"] = "SpecCode" :
  attempt to set an attribute on NULL
In addition: Warning messages:
1: In check_and_parse(resp) : Bad Request (HTTP 400).
2: In error_checks(parsed, resp = resp) :
  no results found for query https://fishbase.ropensci.org/maturity?Speccode=91&limit=200

any hints? thanks!

sckott commented 8 years ago

There are indeed no results for the query https://fishbase.ropensci.org/maturity?Speccode=91&limit=200. Looks to be a parsing problem internally in the pkg. Will have a look.

sckott commented 8 years ago

@hkindsvater Is it right that the query you tried was maturity("tunas") ? @cboettig correct me if wrong, but I think that function expects a sci. name, not a common name

hkindsvater commented 8 years ago

Apologies, tunas is an object I defined:

tunas <- species_list(Family = "Scombridae")

cboettig commented 8 years ago

@sckott Looks like the API is still unhappy about Speccode being in the wrong case (i.e. the query works with SpecCode but not Speccode). Remind me, didn't we tweak the API code to ignore case in the queries? Or was there a reason that wouldn't work?

I may also have to fix the R code to handle both cases.

(in any event, annoying that the database can't name their own key fields consistently...)

sckott commented 8 years ago

Hmm, can't remember if the speccode case thing was fixed in API or not, will check

for this client though, I think a fix is

diff --git a/R/00-endpoint.R b/R/00-endpoint.R
index dd8f551..82ba1f2 100644
--- a/R/00-endpoint.R
+++ b/R/00-endpoint.R
@@ -26,7 +26,7 @@ endpoint <- function(endpt, tidy_table = default_tidy){
                         httr::user_agent(make_ua()))
       data <- check_and_parse(resp)

-      if(endpt %in% bad_tables){
+      if(endpt %in% bad_tables && !is.null(data)){
         names(data)[names(data) == "Speccode"] = "SpecCode"
       }

as one of the IDs sent through the function call resulted in NULL, but then that if() statement was failing on a value of NULL
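
The guard in the diff can be tried in isolation. The sketch below is a simplified stand-in, not the package's actual `endpoint()` code: `rename_speccode` and `fake_fetch` are hypothetical names, and the mock simply returns NULL for code 91 to mimic a query with no results.

```r
# Hypothetical helper mirroring the guarded rename from the diff above.
rename_speccode <- function(data) {
  if (!is.null(data)) {
    names(data)[names(data) == "Speccode"] <- "SpecCode"
  }
  data
}

# Mock of a per-code fetch: returns NULL when there are no rows for a code.
fake_fetch <- function(code) {
  if (code == 91L) return(NULL)
  data.frame(autoctr = code, Speccode = code)
}

codes <- c(2L, 91L, 3L)
parts <- lapply(codes, function(code) rename_speccode(fake_fetch(code)))

# Drop the NULL entries before binding the remaining tables together.
combined <- do.call(rbind, Filter(Negate(is.null), parts))
print(combined)
```

With the guard in place, one empty result among several codes no longer aborts the whole call; the NULL is passed through and filtered out when the pieces are combined.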

sckott commented 8 years ago

@cboettig looks like we never sorted this on the API side https://github.com/ropensci/fishbaseapi/issues/83 - I'll try to get that done soon

sckott commented 8 years ago

@cboettig I can send a pull request if you like with that change.

In this case the request is right, it's just that for code 91 there are no results, so we still need to make sure the client is robust to this scenario

cboettig commented 8 years ago

code 91 seems to have the full limit number of results when using SpecCode though: https://fishbase.ropensci.org/maturity?SpecCode=91&limit=200

PR would be great.

sckott commented 8 years ago

yeah, but curl 'https://fishbase.ropensci.org/maturity?SpecCode=91&limit=200' | jq .data[].Speccode gives

2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 
3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 7 12 12 
12 14 14 15 15 16 16 17 17 17 18 19 19 21 
21 24 24 24 24 24 24 24 24 24 24 24 24 24 
24 24 24 24 24 23 26 26 26 26 26 26 26 26 
26 26 26 26 26 26 27 28 28 28 28 29 29 29 
29 29 29 29 30 30 30 30 30 30 30 30 30 30 
30 30 30 30 30 30 30 30 30 30 30 30 30 30 
30 30 30 30 30 30 30 30 30 30 30 30 30 30 
30 30 30 30 30 30 30 30 30 30 30 30 30 30 
30 30 30 30 30 30 30 30 31 31 31 31 32 33 
33 33 33 34 34 35 35 35 35 35 36 36 37 37 37
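
The output above shows that the rows returned for SpecCode=91 carry other codes, which suggests the filter was ignored. A small sanity check along those lines could look like the sketch below; `filter_was_applied` is a hypothetical helper, not part of rfishbase, and the sample vector is a handful of values taken from the jq output.

```r
# Hypothetical check (not part of rfishbase): TRUE only when at least one row
# came back and every returned key equals the requested code.
filter_was_applied <- function(returned_codes, requested_code) {
  length(returned_codes) > 0 && all(returned_codes == requested_code)
}

# A few Speccode values from the jq output for the SpecCode=91 request.
returned <- c(2, 2, 2, 3, 3, 4, 7, 12)

print(filter_was_applied(returned, 91))   # rows belong to other species
print(filter_was_applied(c(91, 91), 91))  # what a correctly filtered result looks like
```

A check like this distinguishes "the query matched nothing" from "the query parameter was silently dropped and the first 200 rows came back".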

sckott commented 8 years ago

@hkindsvater try installing from another branch devtools::install_github("ropensci/rfishbase@sac-null") and try your example code again, should work now

hkindsvater commented 8 years ago

Looks like it worked! thanks for getting to this so quickly

sckott commented 8 years ago

glad it works

sckott commented 8 years ago

Okay, this is sorted now, e.g.,

maturity("Gadus morhua")
#> Source: local data frame [83 x 36]
#> 
#>    autoctr      sciname StockCode MaturityRefNo     Sex AgeMatMin AgeMatMin2 AgeMatRef    tm Number    r2 SE_tm SD_tm LCL_tm
#>      (int)        (chr)     (int)         (int)   (chr)     (dbl)      (dbl)     (int) (dbl)  (lgl) (lgl) (lgl) (lgl)  (lgl)
#> 1      196 Gadus morhua        79           796 unsexed        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 2      197 Gadus morhua        79          1371 unsexed       2.0         NA        NA    NA     NA    NA    NA    NA     NA
#> 3      198 Gadus morhua        79          1371 unsexed       4.0         NA        NA    NA     NA    NA    NA    NA     NA
#> 4      199 Gadus morhua        79          6014  female        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 5      200 Gadus morhua        79          6014    male        NA         NA        NA    NA     NA    NA    NA    NA     NA
#> 6      201 Gadus morhua        79         11063  female       5.0         NA     11260    NA     NA    NA    NA    NA     NA
#> 7      202 Gadus morhua        79         11063    male       5.0         NA     11260    NA     NA    NA    NA    NA     NA
#> 8      203 Gadus morhua        79         11063 unsexed       2.5         NA     11263    NA     NA    NA    NA    NA     NA
#> 9      205 Gadus morhua        79         11063 unsexed       7.0         NA     11259    NA     NA    NA    NA    NA     NA
#> 10     206 Gadus morhua        79         11063 unsexed       6.0         NA     11259    NA     NA    NA    NA    NA     NA
#> ..     ...          ...       ...           ...     ...       ...        ...       ...   ...    ...   ...   ...   ...    ...
#> Variables not shown: UCL_tm (lgl), LengthMatMin (dbl), LengthMatMin2 (dbl), Type1 (chr), LengthMatRef (int), Lm (dbl), SE_Lm
#>   (lgl), SD_Lm (lgl), LCL_Lm (lgl), UCL_Lm (lgl), Locality (chr), C_Code (chr), E_CODE (int), Comment (chr), Entered (int),
#>   DateEntered (chr), Modified (int), DateModified (chr), Expert (int), DateChecked (chr), TS (lgl), SpecCode (int)
maturity("Stegastes variabilis")
#> Source: local data frame [0 x 0]
#> 
#> Warning messages:
#> 1: In check_and_parse(resp) : Bad Request (HTTP 400).
#> 2: In error_checks(parsed, resp = resp) :
#>   no results found for query https://fishbase.ropensci.org/maturity?Speccode=3654&limit=200

And the code that goes with Stegastes variabilis (3654) has no results at https://fishbase.ropensci.org/maturity?speccode=3654, as it should be.

Closing