Closed spulec closed 10 years ago
As usual - thank you for tackling this. The VA IG is a great get right now.
I ran ./inspectors/va.py --dry_run --debug --since=1996 and got this:
"GET /oig/publications/report-summary.asp?id=3132 HTTP/1.1" 200 15001
Traceback (most recent call last):
  File "/home/eric/unitedstates/inspectors-general/inspectors/utils/utils.py", line 23, in run
    run_method(cli_options)
  File "./inspectors/va.py", line 60, in run
    report = report_from(result, year_range)
  File "./inspectors/va.py", line 90, in report_from
    report_id = field_mapping['Report Number']
KeyError: 'Report Number'
I have no idea why - I think the offending landing page is the URL it had just fetched, and that page seems to have the right HTML, as does another random one I picked.
(I merged the master branch in to get it up to date, so you'll want to git pull.)
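For what it's worth, a lookup like the one on line 90 could report which page failed and which fields were actually parsed, which would make this kind of failure easier to pin down. This is only a rough sketch: it assumes field_mapping is a plain dict of label/value pairs from the landing page, and the helper name require_field and the landing_url argument are hypothetical, not taken from va.py.

```python
def require_field(field_mapping, name, landing_url):
    """Return field_mapping[name]; on a miss, raise a KeyError naming the page.

    Hypothetical helper: assumes field_mapping is a dict of the label/value
    pairs scraped from a report landing page.
    """
    try:
        return field_mapping[name]
    except KeyError:
        # List what was parsed, so an empty or garbled page is obvious.
        found = ", ".join(sorted(field_mapping)) or "(none)"
        raise KeyError(
            "%r missing on %s; fields found: %s" % (name, landing_url, found)
        ) from None
```

With something like that in place, the call would read report_id = require_field(field_mapping, 'Report Number', landing_url), and an empty mapping would show up as "fields found: (none)" instead of a bare KeyError.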
So I just ran that exact command (after a git pull) and it passes on my machine. My first thoughts are either inconsistent responses from the VA or that something is different about my machine.
Here is the relevant portion of a pip freeze:
beautifulsoup4==4.3.2
cssselect==0.9.1
lxml==3.3.5
pyquery==1.2.8
requests==2.3.0
scrapelib==0.9.1
If you don't mind posting yours, I can set up my environment to match it exactly and try to debug more. I'll also try to get this running on my server and see if that passes or fails.
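Once both lists are available, a small script could show which pins differ. A minimal sketch, assuming both inputs are plain pip freeze output with name==version lines; parse_freeze and diff_freeze are names made up for illustration.

```python
def parse_freeze(text):
    """Map package name -> version for each `name==version` line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a simple pin.
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.lower()] = version
    return pins


def diff_freeze(mine, theirs):
    """Return {package: (my_version, their_version)} where the two differ.

    A package missing from one side shows up with None for that side.
    """
    a, b = parse_freeze(mine), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Feeding it the two pip freeze dumps as strings returns only the packages whose versions disagree, which narrows down what to reinstall.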
You were right, it looks like it was my error. Not sure what error condition was producing invalid HTML to the parser, but I'm fine merging it in. Thanks as usual.