unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Add Department of Interior #65

Closed spulec closed 10 years ago

spulec commented 10 years ago

This was pretty straightforward (I wish they were all this easy).

konklone commented 10 years ago

This is so....relaxing. Thanks, @spulec!

I added a tiny commit chopping the index.cfm off the main inspector_url, as it will properly redirect. It seems slightly more future-proof, though it forces a redirect on everyone who follows the link.

konklone commented 10 years ago

A bunch of 404s came up when downloading the full Interior archive. It's Interior's fault; these are links that actually appear on their site:

Error downloading http://www.doi.gov/oig/reports/upload/Report-of-Investigation---Pensus-Public.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/Report-of-Investigation---Pensus-Public.pdf

Error downloading http://www.doi.gov/oig/reports/upload/USBR-Exclusive-Use---Public.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/USBR-Exclusive-Use---Public.pdf

Error downloading http://www.doi.gov/oig/reports/upload/WR-VS-GSV-0008-2010-dtd-9-22-10-Verf-of-recs-1,-2,-&-3-from-ER-EV-GSV-0002-2009.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/WR-VS-GSV-0008-2010-dtd-9-22-10-Verf-of-recs-1,-2,-&-3-from-ER-EV-GSV-0002-2009.pdf

Error downloading http://www.doi.gov/oig/reports/upload/Lake-Jackson---Helicopter_508.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/Lake-Jackson---Helicopter_508.pdf

Error downloading http://www.doi.gov/oig/reports/upload/WR-VS-BOR-0010-2010-dtd-9.3.10-Verf-Rev-of-2-recs-from-99-I-133.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/WR-VS-BOR-0010-2010-dtd-9.3.10-Verf-Rev-of-2-recs-from-99-I-133.pdf

Error downloading http://www.doi.gov/oig/reports/upload/WR-VS-MOA-0009-2010-dtd-8.23.10-Verf-Rev-of-6-Recs-from-Y-EV-MOA-0001-2008.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/WR-VS-MOA-0009-2010-dtd-8.23.10-Verf-Rev-of-6-Recs-from-Y-EV-MOA-0001-2008.pdf

Error downloading http://www.doi.gov/oig/reports/upload/ROO-ROA-MOA-1018-2010.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/ROO-ROA-MOA-1018-2010.pdf

Error downloading http://www.doi.gov/oig/reports/upload/FY-2009-Fisma-Report---Revised.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/FY-2009-Fisma-Report---Revised.pdf

Error downloading http://www.doi.gov/oig/reports/upload/Semi-Fin-11.2.09.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/Semi-Fin-11.2.09.pdf

Error downloading http://www.doi.gov/oig/reports/upload/2008-CD&L-Investigative-Report-REDACTED-with-transmittal.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/2008-CD&L-Investigative-Report-REDACTED-with-transmittal.pdf

Error downloading http://www.doi.gov/oig/reports/upload/ManagementAdvisory(post-CDL)edited07-02-08_cd2.pdf:

Traceback (most recent call last):

  File "/home/unitedstates/inspectors-general/inspectors/utils/utils.py", line 83, in download
    response = scraper.urlopen(url)

  File "/home/unitedstates/.virtualenvs/inspectors/lib/python3.4/site-packages/scrapelib/__init__.py", line 393, in urlopen
    raise HTTPError(resp)

scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/ManagementAdvisory(post-CDL)edited07-02-08_cd2.pdf

I guess I need to contact the IG about it. Annoying.
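For reporting purposes, output like the above can be condensed into a plain list of failing URLs. The following is a minimal sketch, not part of the project: the function name `failed_urls` and the regular expression are assumptions for illustration, and it relies only on the `scrapelib.HTTPError: 404 while retrieving ...` line format shown in the tracebacks.

```python
import re

# Matches the final line of each traceback shown above and captures the URL.
# Assumes URLs contain no whitespace, which holds for every URL in this log.
FAILED_URL = re.compile(r"scrapelib\.HTTPError: 404 while retrieving (\S+)")


def failed_urls(log_text):
    """Return the URLs that 404ed, in order of first appearance, without duplicates."""
    seen = []
    for url in FAILED_URL.findall(log_text):
        if url not in seen:
            seen.append(url)
    return seen


# Two entries copied from the log above, trimmed to their final lines.
sample_log = """\
scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/ROO-ROA-MOA-1018-2010.pdf
scrapelib.HTTPError: 404 while retrieving http://www.doi.gov/oig/reports/upload/Semi-Fin-11.2.09.pdf
"""

for url in failed_urls(sample_log):
    print(url)
```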

spulec commented 10 years ago

Good news! Although the links are wrong, they are wrong in a consistent way. It appears to be caused by how their slugification handles spaces around hyphens.

Listed url: http://www.doi.gov/oig/reports/upload/Report-of-Investigation---Pensus-Public.pdf

Correct url: http://www.doi.gov/oig/reports/upload/Report-of-Investigation-Pensus-Public.pdf

I added a simple string replace that I'm hoping won't negatively impact other reports.

See https://github.com/unitedstates/inspectors-general/commit/d008e2fb14234bc72588f16c0f93362575f8f005
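As a rough illustration of that kind of replace (a sketch only; the linked commit may be written differently, and the helper name `fix_report_url` is hypothetical), collapsing the triple dash into a single dash turns the listed URL into the working one:

```python
def fix_report_url(url):
    """Collapse the '---' run found in the listed link into a single '-'.

    Hypothetical helper for illustration; URLs without '---' pass through unchanged.
    """
    return url.replace("---", "-")


listed = "http://www.doi.gov/oig/reports/upload/Report-of-Investigation---Pensus-Public.pdf"
print(fix_report_url(listed))
```

A blanket replace like this would also rewrite any report whose real filename contains three consecutive dashes, which is the risk mentioned above.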

konklone commented 10 years ago

Nice catch! But it doesn't catch all of them, here's the condensed list of 10 404s from above -- not all use triple dashes:

I just sent a note to the IG about these, with a link to this thread. Hopefully they can fix them on their end.

spulec commented 10 years ago

Ah, that's frustrating. Let me know if you get a response so we can revert my most recent commit.