unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

USPS scraper is broken #169

Closed divergentdave closed 10 years ago

divergentdave commented 10 years ago

It looks like the USPS website changed, so the scraper can't process the report list anymore. There may be other changes lurking too.

See https://uspsoig.gov/document-library?&field_doc_date_value[value][date]=1998-01-01&field_doc_cat_tid[]=1920&field_doc_cat_tid[]=1923&field_doc_cat_tid[]=1922

konklone commented 10 years ago

One other thing to look for here: the scraper didn't choke and email the admin, which is another bug. If that can be addressed, and the scraper made intentionally more brittle on that front, that'd help.
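A minimal sketch of the "choke and notify" behavior described here, not the project's actual code. The names `run_with_alerts` and `notify` are hypothetical, and `notify` stands in for whatever delivers the message (email, Slack, and so on).

```python
import traceback


def run_with_alerts(run_method, options, notify):
    """Run a scraper; on any failure, report the traceback and re-raise.

    `notify` is a hypothetical callable that takes one string and delivers
    it to the admin. It is an assumption, not part of the project's API.
    """
    try:
        return run_method(options)
    except Exception:
        # Send the full traceback so the admin sees where the scraper broke.
        notify(traceback.format_exc())
        # Re-raise so the failure is never silently swallowed.
        raise
```

Re-raising after the notification keeps the scraper brittle on purpose: a changed page layout stops the run instead of producing an empty or partial result.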

divergentdave commented 10 years ago

FWIW, I got a Slack DM with the IndexError.

Traceback (most recent call last):
  File "inspectors\utils\utils.py", line 24, in run
    run_method(cli_options)
  File "inspectors\usps.py", line 52, in run
    max_page = last_page_for(doc)
  File "inspectors\usps.py", line 132, in last_page_for
    page = doc.select("li.pager-item.last")[0].text.replace("of ", "").strip()
IndexError: list index out of range
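The traceback points at indexing `[0]` into an empty `select()` result. A hedged sketch of how `last_page_for` could fail with a clearer message is below. This is not the fix from the eventual PR. It assumes `doc` is a BeautifulSoup-style object with a `select()` method and that the pager text looks like "of 12". `ScraperLayoutError` is a hypothetical name, and returning an `int` is also an assumption.

```python
class ScraperLayoutError(Exception):
    """Hypothetical error for when an expected page element is missing."""


def last_page_for(doc, selector="li.pager-item.last"):
    # `doc` is assumed to expose a BeautifulSoup-style select() method.
    matches = doc.select(selector)
    if not matches:
        # Name the selector so the report says what changed on the page.
        raise ScraperLayoutError(
            "No element matched %r; the report list markup may have changed"
            % selector
        )
    text = matches[0].text.replace("of ", "").strip()
    try:
        return int(text)
    except ValueError:
        raise ScraperLayoutError("Could not parse page count from %r" % text)
```

The scraper still stops when the site changes, but the error names the missing selector rather than surfacing as a bare `IndexError`.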

slobdell commented 10 years ago

:thumbsup:

slobdell commented 10 years ago

PR at https://github.com/unitedstates/inspectors-general/pull/170

slobdell commented 10 years ago

@konklone It looks like the email notification code works fine, so the lack of a notification probably points to a misconfigured admin.yml.

konklone commented 10 years ago

That problem was mine. I had disabled my scrapers because of a spike of web traffic to another site hosted on the same box, and forgot to turn them back on. -_-

konklone commented 10 years ago

Fixed by #170.