sc3 / cookcountyjail

A Django app that tracks the population of Cook County Jail over time and summarizes trends.
http://cookcountyjail.recoveredfactory.net/api/1.0/?format=json

v2 scraper intermittently fails #375

Closed nwinklareth closed 10 years ago

nwinklareth commented 10 years ago

With this error:

Cook County Jail 2.0 API scraper started at Mon May 12 12:09:37 CDT 2014
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/ubuntu/website/2.0/websites/aad178f5d1/scripts/scraper.py", line 84, in <module>
    scraper.run()
  File "/home/ubuntu/website/2.0/websites/aad178f5d1/scripts/scraper.py", line 61, in run
    booked_left = self._ccj_api.booked_left(next_day)
  File "scripts/ccj_api_v1.py", line 42, in booked_left
    assert booked_inmates_response.status_code == 200
AssertionError
Cook County Jail 2.0 API scraper finished at Mon May 12 12:09:42 CDT 2014

The first incident was March 5th.

bepetersn commented 10 years ago

That's an assertion hard-coded into the v2 code that fetches from v1; it fails whenever the response code is anything other than 200.

In the long term, this code makes no sense. We have an http class that handles requests very well, in the main scraper. In the short term, no idea what to do.

nwinklareth commented 10 years ago

Well, how about starting by changing the assertion to an if test and printing the status code it got, and the response body if any, so we have a bit more information? I think with that we can start fixing it.
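A rough sketch of that suggestion, for illustration only. It is not the code in scripts/ccj_api_v1.py or the change later merged in #376. The names check_response and FakeResponse are hypothetical, and the only assumption about the response object is that it exposes status_code and text.

```python
from collections import namedtuple

# Hypothetical stand-in for an HTTP response object; only the two
# attributes used below (status_code and text) are assumed to exist.
FakeResponse = namedtuple("FakeResponse", ["status_code", "text"])


def check_response(url, response, log=print):
    """Return True on HTTP 200; otherwise log what came back and return False."""
    if response.status_code == 200:
        return True
    # Report the status code and a short excerpt of the body, if there is one,
    # instead of dying with a bare AssertionError.
    body = (response.text or "").strip()
    excerpt = body[:200] if body else "<empty body>"
    log("failed to fetch %s, got status code %d, response: %s"
        % (url, response.status_code, excerpt))
    return False
```

The caller can then skip that day's data when check_response returns False, so one bad response from v1 no longer aborts the whole run.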

bepetersn commented 10 years ago

I will take this one, if you'd like.

nwinklareth commented 10 years ago

Please do, and please post what you find out.

bepetersn commented 10 years ago

Well, I just tried to deploy, and the v2.0 branch fabfile has the requirements.txt file in the wrong place.

bepetersn commented 10 years ago

I do think the code I just merged in, #376, will fix this problem with the scraper, though.

bepetersn commented 10 years ago

Well, it didn't crash, just created this log.

Cook County Jail 2.0 API scraper started at Tue May 13 12:15:19 CDT 2014
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-01, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-01&discharge_date_earliest__lte=2014-05-01, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-02, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-02&discharge_date_earliest__lte=2014-05-02, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-03, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-03&discharge_date_earliest__lte=2014-05-03, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-04, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-04&discharge_date_earliest__lte=2014-05-04, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-05, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-05&discharge_date_earliest__lte=2014-05-05, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-06, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-06&discharge_date_earliest__lte=2014-05-06, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-07, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-07&discharge_date_earliest__lte=2014-05-07, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-08, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-08&discharge_date_earliest__lte=2014-05-08, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-09, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-09&discharge_date_earliest__lte=2014-05-09, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-10, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-10&discharge_date_earliest__lte=2014-05-10, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-11, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-11&discharge_date_earliest__lte=2014-05-11, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&booking_date__exact=2014-05-12, got status code 502
failed to fetch http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate?format=json&limit=0&discharge_date_earliest__gte=2014-05-12&discharge_date_earliest__lte=2014-05-12, got status code 502
Scraper ran for 0:00:03.895659
Cook County Jail 2.0 API scraper finished at Tue May 13 12:15:25 CDT 2014

nwinklareth commented 10 years ago

Both scrapers ran this morning and the v2 scraper reported no fetch errors. On 2014-05-16 or 17, the number of days the process looks back in time needs to be reduced back down to 5, and then this issue can be closed.

bepetersn commented 10 years ago

Reverted to a 5-day inmate_window.
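For readers unfamiliar with the look-back window, here is a minimal sketch of how a window of N days could map onto the pair of v1 queries seen in the log above (one for inmates booked, one for inmates discharged, per day). The query strings are copied from that log. The function name window_urls and the assumption that the window covers the N days before the run date are illustrative guesses, not taken from the scraper's source.

```python
from datetime import date, timedelta

BASE_URL = "http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate"


def window_urls(today, inmate_window=5):
    """Yield (booked_url, left_url) for each of the inmate_window days before today."""
    # Walk from the oldest day in the window up to yesterday.
    for offset in range(inmate_window, 0, -1):
        day = (today - timedelta(days=offset)).isoformat()
        booked = "%s?format=json&limit=0&booking_date__exact=%s" % (BASE_URL, day)
        left = ("%s?format=json&limit=0&discharge_date_earliest__gte=%s"
                "&discharge_date_earliest__lte=%s" % (BASE_URL, day, day))
        yield booked, left
```

Under these assumptions, a run on 2014-05-13 with a window of 12 would request 2014-05-01 through 2014-05-12, which matches the dates in the log; with the window back at 5, the same run would start at 2014-05-08.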