ropensci / rredlist

IUCN Red List API Client
https://docs.ropensci.org/rredlist

Bad Gateway (HTTP 502) error when trying to get country occurrence data #25

Closed: stevenpbachman closed this issue 4 years ago

stevenpbachman commented 6 years ago
Session Info

```r
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rredlist_0.4.0 data.table_1.10.4 dplyr_0.5.0 rCAT_0.1.5 rgdal_1.2-7 jsonlite_1.5 rgbif_0.9.8 raster_2.5-8
[9] sp_1.2-4

loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 xml2_1.1.1 whisker_0.3-2 magrittr_1.5 munsell_0.4.3 colorspace_1.3-2 lattice_0.20-34 R6_2.2.0 quadprog_1.5-5
[10] rlang_0.1.2 stringr_1.2.0 httr_1.2.1 plyr_1.8.4 tools_3.3.3 parallel_3.3.3 grid_3.3.3 geoaxe_0.1.0 gtable_0.2.0
[19] DBI_0.6-1 rgeos_0.3-22 assertthat_0.1 lazyeval_0.2.0 httpcode_0.2.0 tibble_1.3.3 ggplot2_2.2.1 triebeard_0.3.0 curl_2.6
[28] crul_0.3.8 pracma_1.9.9 oai_0.2.2 stringi_1.1.3 urltools_1.6.0 scales_0.4.1
```

Hi, thanks again for this package, which I'm using a lot.

I'm just trying to get country occurrence data for all plants on the Red List, but I keep hitting a Bad Gateway (HTTP 502) error. See below for the code I'm using. I already have a vector with all plant taxon IDs (plants.id) and I'm using lapply with a function, rl.countries, that should get the country occurrence for each taxon ID. I usually get part of the way through before I get the 502 error.

I've been in touch with the Red List admin team and they say everything is fine with the API, so I'm just trying to work out where the problem is.

Many thanks, Steve Bachman

```r
#get countries occurrence list based on taxon ID
rl.countries = function(x){
  cntry = rl_occ_country(id=x,key=rlkey)
  result = cntry$result
  taxonid = cntry$name
  results = cbind(name,result)
  colnames(results)[1] = 'taxonid'
  return(results)
}

#run lapply on rl.countries function
#plants.id is vector of taxonid's for all plants (total = 24,408)
apply.countries = lapply(plants.id,rl.countries)
countries_df = as.data.frame(do.call(rbind,apply.countries))
```

```
Error: Bad Gateway (HTTP 502)
```

sckott commented 6 years ago

thanks @stevenpbachman for the report.

can you give an example of a single request to rl_occ_country that throws that error, and with verbose=TRUE as a parameter. paste in output of the request headers that you get back

it seems likely it's a rate limiting issue, but could use a better error message (i may be able to make it better, but may not depending on what they give back) - have you seen this bit in the docs https://github.com/ropensci/rredlist/blob/v0.4.0/R/rredlist-package.R#L40-L45

stevenpbachman commented 6 years ago

Hi @sckott

Thanks for getting back and for the help. I just tried again on a subset of 1,000 taxon IDs, after fixing previous mistake with the cbind in that function, and have pasted last few outputs below. Hope it makes some sense to you.

Yes, could be the rate limiting. I also tried adding delay in a for loop previously, but just half a second, so maybe 2 seconds would work. I will try it, although would mean a long time to get entire red list, but then again, should only need to do it on every update. Not sure how to add delay in lapply... will try to work it out 🤔

thanks,

Steve

```r
#get countries based on taxon ID
rl.countries = function(x){
  cntry = rl_occ_country(id=x,key=rlkey,verbose=TRUE)
  # store the records as `result`, the name that cbind() uses below
  result = cntry$result
  taxonid = cntry$name
  results = cbind(taxonid,result)
  colnames(results)[1] = 'taxonid'
  return(results)
}
```

```
< HTTP/1.1 200 OK
< Server: nginx/1.1.19
< Date: Mon, 13 Nov 2017 11:29:59 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 239
< Connection: keep-alive
< X-Powered-By: Sails <sailsjs.org>
* Added cookie sails.sid="s%3AYSxPWTclwy3cYh34B1Uak5IE.dEyRnGCvnU%2BuR07XlmL2%2BLxhfcAK%2Fn8zBVPiL4CwEeg" for domain apiv3.iucnredlist.org, path /, expire 0
< Set-Cookie: sails.sid=s%3AYSxPWTclwy3cYh34B1Uak5IE.dEyRnGCvnU%2BuR07XlmL2%2BLxhfcAK%2Fn8zBVPiL4CwEeg; Path=/; HttpOnly
< Vary: Accept-Encoding
<
* Connection #7 to host apiv3.iucnredlist.org left intact
* Found bundle for host apiv3.iucnredlist.org: 0x166e3f00 [can pipeline]
* Re-using existing connection! (#7) with host apiv3.iucnredlist.org
* Connected to apiv3.iucnredlist.org (176.58.126.20) port 80 (#7)
> GET /api/v3/species/countries/id/30585?token=cf3cf316d5dd72945456d42a633ac92357f6b8039bf1d512cc58e26d6169a8f3 HTTP/1.1
Host: apiv3.iucnredlist.org
Accept: */*
User-Agent: libcurl/7.53.1 r-curl/2.6 crul/0.3.8
Accept-Encoding: gzip, deflate

< HTTP/1.1 200 OK
< Server: nginx/1.1.19
< Date: Mon, 13 Nov 2017 11:30:00 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 444
< Connection: keep-alive
< X-Powered-By: Sails <sailsjs.org>
* Added cookie sails.sid="s%3AYJFltYzROgzCSbNPQ9dr_W6V.yf9mAsJS2ALCuEESpdb1n32NYWucN3OOITMprBX2C6w" for domain apiv3.iucnredlist.org, path /, expire 0
< Set-Cookie: sails.sid=s%3AYJFltYzROgzCSbNPQ9dr_W6V.yf9mAsJS2ALCuEESpdb1n32NYWucN3OOITMprBX2C6w; Path=/; HttpOnly
< Vary: Accept-Encoding
<
* Connection #7 to host apiv3.iucnredlist.org left intact
* Found bundle for host apiv3.iucnredlist.org: 0x166e3f00 [can pipeline]
* Re-using existing connection! (#7) with host apiv3.iucnredlist.org
* Connected to apiv3.iucnredlist.org (176.58.126.20) port 80 (#7)
> GET /api/v3/species/countries/id/30586?token=cf3cf316d5dd72945456d42a633ac92357f6b8039bf1d512cc58e26d6169a8f3 HTTP/1.1
Host: apiv3.iucnredlist.org
Accept: */*
User-Agent: libcurl/7.53.1 r-curl/2.6 crul/0.3.8
Accept-Encoding: gzip, deflate

< HTTP/1.1 502 Bad Gateway
< Server: nginx/1.1.19
< Date: Mon, 13 Nov 2017 11:30:00 GMT
< Content-Type: text/html
< Content-Length: 173
< Connection: keep-alive
<
* Connection #7 to host apiv3.iucnredlist.org left intact
Error: Bad Gateway (HTTP 502)
```

stevenpbachman commented 6 years ago

For info, just tried this again with a 2 second delay and still getting the 502 error.

Steve
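
Since a fixed 2 second delay still ends in a 502, one option that might help is retrying the failed request with a growing wait rather than aborting the whole run. The sketch below is only an illustration: `with_retry`, `max_tries` and `base_wait` are hypothetical names, not part of rredlist, and the wait times are guesses because the API's actual limits are not documented.

```r
# Hypothetical helper (not part of rredlist): retry a call that may fail
# transiently, waiting longer after each failed attempt.
with_retry <- function(f, ..., max_tries = 5, base_wait = 2) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of letting it stop the loop
    result <- tryCatch(f(...), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # Wait base_wait, 2 * base_wait, 4 * base_wait, ... seconds
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(result))
}

# Usage (assumes rredlist is loaded and an API key is configured):
# res <- with_retry(rl_occ_country, id = 30586)
```

If every attempt fails, the last error message (for example the 502) is passed on, so nothing is silently swallowed.
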

sckott commented 6 years ago

@stevenpbachman thx! sorry, i should have said to strip your IUCN API key from the headers before sharing them here (not good to share private keys like that publicly) - you should ask for a new one from IUCN folks http://apiv3.iucnredlist.org/api/v3/token - nevermind if that's not your real key!

It still seems like a rate-limiting problem to me. two things:

  1. I'll contact IUCN again and urge them more strongly to allow users to see their rate limits somehow.
  2. sleeping. you asked about how to do sleep in lapply. can do e.g.,

```r
lapply(c(30585, 30586), function(z) {
  Sys.sleep(2)
  rl_occ_country(id = z)
})
```

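
With a plain lapply, a single 502 throws away everything fetched so far. A possible variation on the snippet above, sketched here under assumptions, keeps successful results and records the ids that failed so they can be retried later. `fetch_all` and `fetch_one` are hypothetical names introduced for illustration, not rredlist functions.

```r
# Hypothetical helper: call fetch_one() for each id with a pause between
# calls, keeping successes and failures separate so the run can be resumed.
fetch_all <- function(ids, fetch_one, delay = 2) {
  results <- list()
  failed <- c()
  for (id in ids) {
    # An error (e.g. HTTP 502) is turned into NULL instead of stopping the loop
    out <- tryCatch(fetch_one(id), error = function(e) NULL)
    if (is.null(out)) {
      failed <- c(failed, id)
    } else {
      results[[as.character(id)]] <- out
    }
    Sys.sleep(delay)
  }
  list(results = results, failed = failed)
}

# Usage (assumes rredlist is loaded and an API key is configured):
# first  <- fetch_all(plants.id, function(z) rl_occ_country(id = z))
# second <- fetch_all(first$failed, function(z) rl_occ_country(id = z))
```

Note that this sketch treats a NULL return value as a failure, which seems acceptable here because a successful request is expected to return a list.
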
stevenpbachman commented 6 years ago

@sckott thanks and double doh! from me for posting the key and for not trying that solution to lapply.

I'll get a new key and hopefully you'll get a response from IUCN. Thanks again.

arw36 commented 6 years ago

Hi, just chiming in that I get this error a lot depending on the number of items I am looping through the lapply function. Usually, if I keep the query to 100 or below it works well.

sckott commented 6 years ago

thanks @arw36 - do you mean the length of the list/vector of ids/names passed to lapply is 100 or less? What happens after that? Do you wait some amount of time before doing more?

arw36 commented 6 years ago

Yes, the vector is less than 100. No specific amount of time in between, but I see how this can become a very tedious process. Also, I tried to do my full vector list of 1000+ using the Sys.sleep(2) parallelized, and I got the 502 error still.
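
The manual "100 at a time" routine described above could be automated roughly as follows. This is a sketch under assumptions: `make_batches` and `run_in_batches` are hypothetical helpers, and the 60 second pause between batches is an arbitrary placeholder, since no specific waiting time has been established in this thread.

```r
# Split a vector of ids into batches of at most batch_size elements.
make_batches <- function(ids, batch_size = 100) {
  split(ids, ceiling(seq_along(ids) / batch_size))
}

# Hypothetical driver: process one batch at a time, sequentially, pausing
# before each request and for longer between batches.
run_in_batches <- function(ids, fetch_one, batch_size = 100,
                           delay = 2, batch_pause = 60) {
  batches <- make_batches(ids, batch_size)
  out <- list()
  for (i in seq_along(batches)) {
    res <- lapply(batches[[i]], function(z) {
      Sys.sleep(delay)
      fetch_one(z)
    })
    out <- c(out, res)
    # No need to pause after the final batch
    if (i < length(batches)) Sys.sleep(batch_pause)
  }
  out
}

# Usage (assumes rredlist is loaded and an API key is configured):
# res <- run_in_batches(plants.id, function(z) rl_occ_country(id = z))
```
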

sckott commented 6 years ago

@arw36 I'm guessing it's a rate limiting issue. If that's correct, sending requests in parallel won't help: their servers are counting requests per API key, so you'll actually fail sooner than if it wasn't in parallel. see the rate limiting section in the package level man file ?rredlist-package

sckott commented 6 years ago

I got an answer about rate limiting. They won't be making any changes for the current API - so we'll have to cope with what is currently there. I've asked what the limit is as the documentation does not include any info on that. They do suggest a 2 second delay between calls, so that's a good starting point.

maelle commented 6 years ago

@sckott would you be interested in my making a PR to add https://cran.r-project.org/web/packages/ratelimitr/index.html as a dependency and therefore have the rate limiting built-in?

sckott commented 6 years ago

Thanks @maelle , but I think i'd like to let the users tweak this for their use case
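
For users who do want to tune the delay themselves, one base R way to do it without an extra dependency is a small wrapper that enforces a minimum gap between calls. This is a minimal sketch, not something rredlist provides: `throttle` and `min_interval` are hypothetical names, and the 2 second default simply mirrors the delay IUCN suggested above.

```r
# Hypothetical wrapper: returns a version of f that waits, if needed, so that
# successive calls start at least min_interval seconds apart.
throttle <- function(f, min_interval = 2) {
  last_call <- -Inf  # no previous call yet, so the first call never waits
  function(...) {
    wait <- min_interval - (as.numeric(Sys.time()) - last_call)
    if (wait > 0) Sys.sleep(wait)
    last_call <<- as.numeric(Sys.time())
    f(...)
  }
}

# Usage (assumes rredlist is loaded and an API key is configured):
# slow_occ_country <- throttle(rl_occ_country, min_interval = 2)
# res <- lapply(c(30585, 30586), function(z) slow_occ_country(id = z))
```

Unlike a fixed Sys.sleep() inside the loop, the time already spent on other work between calls counts toward the interval, and each user can pick their own `min_interval`.
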