ropensci / fishbaseapi

Fishbase API
https://fishbaseapi.readme.io/
MIT License
42 stars 12 forks source link

invalid content-length header for api requests with limit > 1000 #68

Closed jhpoelen closed 3 months ago

jhpoelen commented 9 years ago

hey y'all:

I've been poking around the fishbase api . . . thanks for building it!

I am observing a strange content length issue that occurs for larger requests (I am interested in getting all data).

To reproduce: wget http://fishbase.ropensci.org/fooditems?limit=1000&offset=1600

Expected result: Successful download of first 1000 records in fooditems table.

Actual result: For some reason the content-length is set too high, 'causing wget (and any other http client that checks the content-length) to thing the connection is closed unexpectedly (see log trace below).

Note that this issue does not occur when manually downloading the data using a browser, even though the content-length is off. Also, the issue does not seem to occur when smaller batches are used (e.g. limit=100).

Hope this helps, -jorrit

wget "http://fishbase.ropensci.org/fooditems?limit=1000&offset=1600"
--2015-09-15 08:39:34--  http://fishbase.ropensci.org/fooditems?limit=1000&offset=1600
Resolving fishbase.ropensci.org... 198.199.104.22
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 895255 (874K) [application/json]
Saving to: `fooditems?limit=1000&offset=1600'

 7% [=>                                     ] 65,793       418K/s   in 0.2s    

2015-09-15 08:39:34 (418 KB/s) - Connection closed at byte 65793. Retrying.

--2015-09-15 08:39:35--  (try: 2)  http://fishbase.ropensci.org/fooditems?limit=1000&offset=1600
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 895255 (874K) [application/json]
Saving to: `fooditems?limit=1000&offset=1600'

 7% [=>                                     ] 65,793       401K/s   in 0.2s   
sckott commented 9 years ago

Thanks much. Haven't thought about this yet, will have a look and hopefully fix soon

sckott commented 9 years ago

woops, accidentally closed

cboettig commented 9 years ago

Hmm, I can't reproduce the error using your example; perhaps there is some interaction with network connection speed as well? Still, I think you're correct that it's a bug. @sckott can you reproduce? (I just ran the wget command on from the ubuntu-based DO box, which obviously has a good network speed).

@jhpoelen is the offset part of the query relevant to getting this error, or do you get the same error without that? I'm wondering if the API is sending the wrong information about content-length when using offsets perhaps?

sckott commented 9 years ago

@cboettig I could reproduce it earlier today, but after your server fix, i can't reproduce it, so I think it may be fixed

carl, we may want to roll back this commit https://github.com/ropensci/fishbaseapi/commit/131a6b07940ff6cda42dc9a0c1c10772e63b08bd if this is sorted

cboettig commented 9 years ago

@sckott yeah, I'm not sure if the server fix actually fixed it, or if it's just a temperamental network issue. CRAN's test machines were getting some 'incomplete file transfer' errors too on a few tests I forgot to skip_on_cran. let's see if @jhpoelen can reproduce before we roll back.

jhpoelen commented 9 years ago

thanks for the quick reply. I was not able to reproduce the issue just now. Also, close inspection in the firefox "network" develop window now indicates that the content-length is the same as bytes transferred. Whatever changed . . . now the api works like a charm! Closing issue.

jhpoelen commented 9 years ago

PS I doubt this was a network issue - I was able to get the results ok, the content-length header simply didn't reflect the bytes transferred, causing the httpclients to freak out.

sckott commented 9 years ago

@jhpoelen we ended up having a problem on the server where our disk was full. That somehow caused larger requests to not be handled correctly.

jhpoelen commented 9 years ago

Unfortunately, it seems that the issue is still occurring.

To reproduce:

wget "http://fishbase.ropensci.org/species?limit=5000&offset=0"
--2015-09-16 11:14:01--  http://fishbase.ropensci.org/species?limit=5000&offset=0
Resolving fishbase.ropensci.org... 198.199.104.22
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0'

45% [===============================>                                       ] 6,576,816    394K/s   in 27s     

2015-09-16 11:14:37 (235 KB/s) - Connection closed at byte 6576816. Retrying.

--2015-09-16 11:14:38--  (try: 2)  http://fishbase.ropensci.org/species?limit=5000&offset=0
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0'

 9% [======>                                                                ] 1,428,432    382K/s  eta 39s 
sckott commented 9 years ago

@jhpoelen works for me, 3 times in a row, works fine with curl too

is it repeatedly failing for you?

jhpoelen commented 9 years ago

Weird!

I tried again, and this time the issue occurred a bit later:

wget "http://fishbase.ropensci.org/species?limit=5000&offset=0"
--2015-09-16 11:25:11--  http://fishbase.ropensci.org/species?limit=5000&offset=0
Resolving fishbase.ropensci.org... 198.199.104.22
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0.1'

73% [===================================================>                   ] 10,567,504   171K/s   in 48s     

2015-09-16 11:26:02 (213 KB/s) - Connection closed at byte 10567504. Retrying.

--2015-09-16 11:26:03--  (try: 2)  http://fishbase.ropensci.org/species?limit=5000&offset=0
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0.1'

Could it be that the server is timing out long lasting downloads?

jhpoelen commented 9 years ago

It is failing pretty consistently on my end . . . here's more logs:

--2015-09-16 11:26:03--  (try: 2)  http://fishbase.ropensci.org/species?limit=5000&offset=0
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0.1'

74% [====================================================>                  ] 10,748,504   216K/s   in 50s     

2015-09-16 11:26:55 (209 KB/s) - Connection closed at byte 10748504. Retrying.

--2015-09-16 11:26:57--  (try: 3)  http://fishbase.ropensci.org/species?limit=5000&offset=0
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0.1'

73% [===================================================>                   ] 10,631,216   173K/s   in 51s     

2015-09-16 11:27:50 (202 KB/s) - Connection closed at byte 10748504. Retrying.

--2015-09-16 11:27:53--  (try: 4)  http://fishbase.ropensci.org/species?limit=5000&offset=0
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0.1'

69% [================================================>                      ] 10,034,640   257K/s   in 47s     

2015-09-16 11:28:41 (208 KB/s) - Connection closed at byte 10748504. Retrying.

--2015-09-16 11:28:45--  (try: 5)  http://fishbase.ropensci.org/species?limit=5000&offset=0
Connecting to fishbase.ropensci.org|198.199.104.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14377534 (14M) [application/json]
Saving to: `species?limit=5000&offset=0.1'

73% [===================================================>                   ] 10,638,456   244K/s   in 51s     

2015-09-16 11:29:38 (203 KB/s) - Connection closed at byte 10748504. Retrying.
sckott commented 9 years ago

can you do the same with curl -v verbose so I can see the headers returned

sckott commented 9 years ago

@jhpoelen what's your connection speed like where you are?

jhpoelen commented 9 years ago

My (home) connection has ~ 3.3 Mbps according to speednet.net .

For some reason, now the issue no longer occurs. ; ( Both curl -v and wget succeeded fro species?limit=5000&limit=0 and species?limit=5000&limit=1 .

Here's the verbose curl output for successful run:

 curl -v "http://fishbase.ropensci.org/species?limit=5000&offset=0" > ~/Desktop/result.json
* Adding handle: conn: 0x7fda61803a00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fda61803a00) send_pipe: 1, recv_pipe: 0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to fishbase.ropensci.org port 80 (#0)
*   Trying 198.199.104.22...
* Connected to fishbase.ropensci.org (198.199.104.22) port 80 (#0)
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0> GET /species?limit=5000&offset=0 HTTP/1.1
> User-Agent: curl/7.30.0
> Host: fishbase.ropensci.org
> Accept: */*
> 
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0< HTTP/1.1 200 OK
* Server nginx/1.9.4 is not blacklisted
< Server: nginx/1.9.4
< Date: Wed, 16 Sep 2015 18:48:34 GMT
< Content-Type: application/json; charset=utf8
< Content-Length: 14377534
< Connection: close
< Status: 200 OK
< Access-Control-Allow-Methods: HEAD, GET
< Access-Control-Allow-Origin: *
< Cache-Control: public, must-revalidate, max-age=60
< X-Content-Type-Options: nosniff
< 
{ [data not shown]
100 13.7M  100 13.7M    0     0   160k      0  0:01:27  0:01:27 --:--:--  392k
* Closing connection 0
jhpoelen commented 9 years ago

Perhaps the nginx caching directory that sporadically running out of disk space?

sckott commented 9 years ago

Thanks for that.

Perhaps the nginx caching directory that sporadically running out of disk space?

perhaps, we'll look into it.