unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Add National Science Foundation #118

Closed spulec closed 10 years ago

spulec commented 10 years ago

Currently getting a few 404s:

http://www.nsf.gov/oig/search/m00090035.pdf
http://www.nsf.gov/oig/search/m01010002.pdf
http://www.nsf.gov/oig/search/m01020006.pdf
http://www.nsf.gov/oig/search/m97100037.pdf

If you change the m to M, they resolve. I'm going to reach out to them to see if they can fix the links. If they don't respond, I can add a fix to this.
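If the links are not fixed upstream, the workaround could look roughly like the sketch below. It assumes the only problem is a lowercase leading `m` in the PDF filename; the function name `fix_report_url` is hypothetical and is not part of the existing scraper.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def fix_report_url(url):
    """Uppercase a leading lowercase 'm' in a report PDF filename.

    Assumption: the broken links differ from the working ones only in
    the case of the first letter of the filename.
    """
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if filename.startswith("m") and filename.lower().endswith(".pdf"):
        filename = "M" + filename[1:]
    # Rebuild the URL with the (possibly) corrected filename.
    return urlunsplit(parts._replace(path=posixpath.join(directory, filename)))
```

URLs that already use an uppercase `M`, or that do not point at a PDF, pass through unchanged, so the function could be applied to every report URL rather than to a hard-coded list.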

I'm also going to see if I can get them to add a published date for the one report in REPORT_PUBLISHED_MAP.

konklone commented 10 years ago

Hmm, re: your note:

# Notes for IG's web team:
# - https://www.nsf.gov/oig/search/ encounters an error when using https.
# while it works if just using http.

I get the same application error for both URLs.

Unless it only happens with a POST? Not totally sure how that URL works. Is it possible that it's not related to the protocol, but to your current HTTP session? Each protocol would use a different session/cookies, so is it possible that you had valid session data for one protocol, but not the other?

konklone commented 10 years ago

Anyway, not a blocker, since it works and all! Thanks for doing this, @spulec.


spulec commented 10 years ago

I think it might just be the POST, but I'm not positive.

import requests

# Form fields sent with the search POST.
CASE_REPORTS_DATA = {
  'sortby': 'rpt_num',
  'sballfrm': 'Search',
}

# Same POST over plain HTTP...
http_response = requests.post(
  'http://www.nsf.gov/oig/search/results.cfm',
  data=CASE_REPORTS_DATA,
)

# ...and over HTTPS, for comparison.
https_response = requests.post(
  'https://www.nsf.gov/oig/search/results.cfm',
  data=CASE_REPORTS_DATA,
)

Both return status 200, but https_response.content shows an error while http_response.content does not. It's possible that adding some additional headers would get the https one to work.
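To separate the two theories in this thread (stale session cookies vs. missing headers), one option is to send each POST with a brand-new cookie jar and an optional set of extra headers. The sketch below uses only the standard library; the helper names `build_post` and `fetch_with_fresh_cookies` are hypothetical, and which headers (if any) would make the HTTPS request succeed is unknown.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Host and path taken from the requests example in this thread.
SEARCH_PATH = "www.nsf.gov/oig/search/results.cfm"
CASE_REPORTS_DATA = {"sortby": "rpt_num", "sballfrm": "Search"}


def build_post(scheme, extra_headers=None):
    """Build (but do not send) the search POST for 'http' or 'https'."""
    body = urllib.parse.urlencode(CASE_REPORTS_DATA).encode("ascii")
    return urllib.request.Request(
        f"{scheme}://{SEARCH_PATH}",
        data=body,  # a non-None body makes urllib issue a POST
        headers=extra_headers or {},
    )


def fetch_with_fresh_cookies(request, timeout=30):
    """Send the request with an empty cookie jar; return (status, body)."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    with opener.open(request, timeout=timeout) as response:
        return response.status, response.read()
```

Calling `fetch_with_fresh_cookies(build_post("http"))` and `fetch_with_fresh_cookies(build_post("https"))` and comparing the two bodies would show whether the difference survives without any shared session state. Passing something like `{"User-Agent": "..."}` as `extra_headers` is one way to test the header theory.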

konklone commented 10 years ago

Thanks for investigating further -- it doesn't seem worth more work, since it works anyway.