unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Add Department of Agriculture #75

spulec closed this 10 years ago

spulec commented 10 years ago

Nothing too interesting here.

Audits, Testimonies, and Semiannual reports are working.

konklone commented 10 years ago

Well, I think the USDA is very interesting! This scraper seems pretty straightforward. The one thing I'd request is adding support for their Investigation Bulletins, like this one.

They're not individual "reports", really, but they're very high value, have the date extractable from the URL, and whatever values need to be made up to make them fit seem worth it to me. They make terrific FOIA leads, too. Is it easy enough to add them in?

konklone commented 10 years ago

Also, I just noticed this report has a relative URL, while many others don't:

{
  "agency": "aphis",
  "agency_name": "Animal Plant Health Inspection Service",
  "file_type": "pdf",
  "inspector": "agriculture",
  "inspector_url": "http://www.usda.gov/oig/",
  "published_on": "2007-10-26",
  "report_id": "33601-0009-CH_Redacted",
  "title": "Controls Over Permits to Import Agricultural Products (PDF)",
  "type": "report",
  "url": "webdocs/33601-0009-CH_Redacted.pdf",
  "year": 2007
}

This one too:

{
  "agency": "rbeg",
  "agency_name": "Rural Business Enterprise Grant",
  "file_type": "pdf",
  "inspector": "agriculture",
  "inspector_url": "http://www.usda.gov/oig/",
  "published_on": "2013-02-14",
  "report_id": "34703-0001-31",
  "title": "The Recovery Act - Rural Development's Rural Business Enterprise Grants Field Confirmations (PDF),",
  "type": "report",
  "url": "webdocs/34703-0001-31.pdf",
  "year": 2013
}
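Both records carry a path that is relative to the page it was scraped from. Here is a minimal sketch of one way to absolutize such links with the standard library. It assumes the index page lives under `http://www.usda.gov/oig/` (the `inspector_url` in these records), and the helper name `absolutize` is hypothetical, not the scraper's actual code:

```python
from urllib.parse import urljoin

# Assumption: report links are relative to this page (taken from "inspector_url").
# The trailing slash matters: without it, urljoin would drop the "oig" segment.
BASE_URL = "http://www.usda.gov/oig/"


def absolutize(url: str, base: str = BASE_URL) -> str:
    """Resolve a possibly relative report link against the page it came from.

    Absolute URLs pass through unchanged; relative ones such as
    "webdocs/34703-0001-31.pdf" are joined onto the base.
    """
    return urljoin(base, url)


print(absolutize("webdocs/33601-0009-CH_Redacted.pdf"))
print(absolutize("http://www.usda.gov/oig/webdocs/FINALRPT.PDF"))
```

With the assumed base, the first call yields `http://www.usda.gov/oig/webdocs/33601-0009-CH_Redacted.pdf`, and the second returns its input unchanged.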

Also, the url field should obviously be checked in the validator to make sure it starts with http:// or https://. I'll add that promptly.
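A check of that kind could look roughly like the following sketch. It assumes reports are plain dicts shaped like the JSON above. The function name `validate_url` and its error message are hypothetical, not the project's actual validator (that landed separately in #77):

```python
def validate_url(report: dict) -> None:
    """Raise ValueError unless report["url"] is an absolute http(s) URL.

    Hypothetical helper: relative paths like "webdocs/34703-0001-31.pdf" fail.
    """
    url = report.get("url") or ""
    # str.startswith accepts a tuple, so one call covers both schemes.
    if not url.startswith(("http://", "https://")):
        raise ValueError(
            f"{report.get('report_id')}: url must start with http:// or https://, got {url!r}"
        )
```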

konklone commented 10 years ago

OK, I added the validation in #77, and merged the fix into this branch.

spulec commented 10 years ago

Both issues addressed with the two most recent commits.

konklone commented 10 years ago

A+, thank you!

konklone commented 10 years ago

Interesting - there were three 404s overnight:

All of them resolve fine if the .PDF at the end is turned into .pdf. I'm not sure a blind .lower() is appropriate, though. Happy to resolve this when I get a sec.
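A blind `.lower()` on the whole URL would also lowercase the path, for example turning `04601-13-FM.PDF` into `04601-13-fm.pdf`. A narrower sketch touches only a trailing `.PDF`. The helper name `lowercase_pdf_extension` is hypothetical, and this does not settle which reports actually need the change:

```python
def lowercase_pdf_extension(url: str) -> str:
    """Turn a trailing ".PDF" into ".pdf", leaving the rest of the URL as-is."""
    if url.endswith(".PDF"):
        return url[: -len(".PDF")] + ".pdf"
    return url
```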

spulec commented 10 years ago

Unfortunately, that seems to break some reports: http://www.usda.gov/oig/webdocs/04601-13-FM.PDF and http://www.usda.gov/oig/webdocs/FINALRPT.PDF.

There are about 20 reports that need to have PDF -> pdf. The most recent one is from 1997, so I don't think this will necessarily be a problem going forward. Think we should just hardcode the report ids?

konklone commented 10 years ago

Yeah, if the most recent is 1997, and it's ~20, then I guess hardcoding is the best solution, ugly as it is. :/
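Hardcoding could look roughly like this sketch: a module-level set of report IDs whose links need `.PDF` rewritten to `.pdf`, consulted before touching the URL. The IDs below are placeholders, not the ~20 real ones from the commit, and all names are hypothetical:

```python
# Hypothetical placeholders; the real constant would list the ~20 affected report IDs.
LOWERCASE_PDF_REPORT_IDS = {
    "EXAMPLE-0001",
    "EXAMPLE-0002",
}


def fix_pdf_extension(report_id: str, url: str) -> str:
    """Rewrite a trailing ".PDF" to ".pdf", but only for the hardcoded reports.

    Every other report keeps ".PDF", since lowercasing breaks links such as
    04601-13-FM.PDF and FINALRPT.PDF.
    """
    if report_id in LOWERCASE_PDF_REPORT_IDS and url.endswith(".PDF"):
        return url[: -len(".PDF")] + ".pdf"
    return url
```

A set keeps the membership check cheap and makes the list of exceptions easy to audit in one place.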

spulec commented 10 years ago

Added with https://github.com/unitedstates/inspectors-general/commit/9d2c2ff403bc01f4c1173fb6ec151d4df4722fbb.

We could use a list comprehension to create that constant, but it felt a bit wrong for some reason. Feel free to change if you think that will be cleaner.

konklone commented 10 years ago

:+1: nah, looks great to me, thank you.