unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

[fec] Broken link is not being detected #152

Closed: divergentdave closed 9 years ago

divergentdave commented 10 years ago

The first link here points to a URL that returns 302 Found and redirects to another page, which is in turn served with 200 OK, so the broken link never produces a 404. Perhaps we should have the fec.py scraper check reports after they have been saved and look for "Page Not Found". More generally, we may need to make errors from the text conversion routines more visible, because we get a CalledProcessError from both pdfinfo and pdftotext with this file.
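
A minimal sketch of the two ideas in this comment, not the project's actual code. The names `looks_like_soft_404`, `pdf_to_text`, and `SOFT_404_MARKERS` are hypothetical, and the "Page Not Found" marker is only the string suggested above. The sketch assumes the saved file can be read as bytes and that the `pdftotext` command-line tool is installed.

```python
import logging
import subprocess
from typing import Optional

# Assumed marker, taken from the suggestion in the comment above.
SOFT_404_MARKERS = (b"Page Not Found",)


def looks_like_soft_404(content: bytes) -> bool:
    """Return True if a saved "report" looks like an HTML error page."""
    # A PDF normally begins with the "%PDF-" magic bytes; an error page does not.
    if content.lstrip().startswith(b"%PDF-"):
        return False
    return any(marker in content for marker in SOFT_404_MARKERS)


def pdf_to_text(pdf_path: str) -> Optional[str]:
    """Run pdftotext and log a visible error instead of failing quietly."""
    try:
        result = subprocess.run(
            ["pdftotext", pdf_path, "-"],  # "-" sends the extracted text to stdout
            check=True,
            capture_output=True,
        )
    except subprocess.CalledProcessError as exc:
        logging.error(
            "pdftotext failed on %s (exit code %d): %s",
            pdf_path,
            exc.returncode,
            exc.stderr.decode("utf-8", errors="replace").strip(),
        )
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

A scraper could call `looks_like_soft_404` on each downloaded file before attempting text extraction, and treat a `None` from `pdf_to_text` as a report that needs attention.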

spulec commented 10 years ago

I'm going to look into this more, but it looks like the proper report can be found here.

spulec commented 10 years ago

I have emailed the IG about the 404 and added a temporary, hacky fix with 7c6892138bd63c9c33b938c5254d577e8c244a6c.

Let's leave this open. If the IG responds within a week or so and fixes it, then we can remove the hack. Otherwise, let's discuss some better possibilities.

divergentdave commented 10 years ago

Here's a list of all the IG websites that don't return 404 response codes when appropriate. The script for this is on my fork at https://github.com/divergentdave/inspectors-general/blob/scripts/check_site_404s.sh.

http://www.cftc.gov/About/Offi... HTTP/1.1 301 Moved Permanently
http://www.cpb.org/oig/doesyou... HTTP/1.1 200 OK
http://www.dodig.mil/doesyour4... HTTP/1.1 302 Redirect
http://www.eac.gov/inspector_g... HTTP/1.1 302 Moved Temporarily
http://www.exim.gov/oig/doesyo... HTTP/1.1 302 Moved Temporarily
http://fcc.gov/oig/doesyour404... HTTP/1.1 301 Moved Permanently
http://www.fec.gov/fecig/doesy... HTTP/1.1 302 Found
http://www.fmc.gov/bureaus_off... HTTP/1.1 302 Object moved
http://www.gpo.gov/oig/doesyou... HTTP/1.1 302 Moved Temporarily
http://gsaig.gov/doesyour404wo... HTTP/1.1 302 Moved Temporarily
http://www.ncua.gov/about/Lead... HTTP/1.1 200 OK
http://www.nrc.gov/doesyour404... HTTP/1.1 302 Moved Temporarily
http://www.peacecorps.gov/abou... HTTP/1.1 301 MOVED PERMANENTLY
http://www.si.edu/OIG/doesyour... HTTP/1.1 200 OK
http://oig.state.gov/doesyour4... HTTP/1.1 302 Moved Temporarily
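
A rough Python sketch of how a check like the one behind this list could work. It is not a port of check_site_404s.sh, whose contents are not shown here. The function names and the category labels are hypothetical. The idea is to request a path that should not exist, without following redirects, and classify the first status code that comes back.

```python
import http.client
from urllib.parse import urlsplit


def classify_missing_page_status(status: int) -> str:
    """Classify the status a site returns for a page that should not exist."""
    if status in (404, 410):
        return "ok"  # the site correctly reports a missing page
    if 300 <= status < 400:
        return "redirect"  # missing pages are redirected somewhere else
    if 200 <= status < 300:
        return "soft-404"  # an error page is served as if it were a success
    return "other"


def probe(url: str, timeout: float = 10.0) -> int:
    """Send one HEAD request and return the status code, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Under this classification, every entry in the list above would come out as either "redirect" (the 301 and 302 responses) or "soft-404" (the 200 OK responses).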

konklone commented 10 years ago

Oh hey, great idea, @divergentdave, and maybe that's even worth a field on the eventual dashboard.

konklone commented 9 years ago

@spulec I assume the FEC OIG didn't get back to you about the hack?

spulec commented 9 years ago

Correct, no response :/

konklone commented 9 years ago

I guess we'll close it for now then. :/