I tried adding an extra line where necessary locally. That fixed the chamber and page identification problem, but I still got the same "list index out of range" error on 2 Senate docs:
CREC-2014-01-21-pt1-PgS463-2.txt
CREC-2014-01-21-pt1-PgS463-3.txt
Dan, did adding a line solve this problem for you or is this a different problem?
Lindsay Young 202-742-1520 x243 Sunlight Foundation http://www.sunlightfoundation.com/
On Fri, Jan 24, 2014 at 2:12 PM, Dan Drinkard notifications@github.com wrote:
Super brittle!
Reply to this email directly or view it on GitHub: https://github.com/unitedstates/congressional-record/issues/4
Can you post your log output? I saw all files parse correctly after making that adjustment.
It was more docs than I thought: the same number of files is still failing to parse. The log is pasted below. The files that do go through are parsed correctly this time, though, unlike before the space was added.
$ python2.7 parser.py -id ../crtest
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgE109-2.txt: list index out of range
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgE109-3.txt: list index out of range
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgE109-4.txt: list index out of range
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1247-10.xml to disk
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1247-2.xml to disk
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1247-3.xml to disk
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgH1247-4.txt: list index out of range
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1247-5.xml to disk
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgH1247-6.txt: list index out of range
flag status: False
no match-- orphaned
Orphaned Tags:
('', 5, 16, 13) print
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1247-7.xml to disk
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1247-8.xml to disk
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgH1247-9.txt: list index out of range
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1247.xml to disk
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgH1249-2.xml to disk
UNRECOGNIZED STATE (but that's ok): To the Senate:
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgS463-2.txt: list index out of range
flag status: False
Error processing file: ../crtest/CREC-2014-01-21-pt1-PgS463-3.txt: list index out of range
flag status: False
saved file /Users/lindsayyoung/Dropbox/Projects/crtest/__parsed/CREC-2014-01-21-pt1-PgS463.xml to disk
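For anyone comparing runs, here is a rough sketch of how a log like the one above could be tallied. It is not part of the parser: `summarize_log` and `failures_by_chamber` are hypothetical helper names, and the regexes assume only the two line shapes visible in this log ("Error processing file: ..." and "saved file ... to disk").

```python
import re
from collections import Counter

# Line shapes assumed from the log pasted in this thread; other log
# lines (e.g. "flag status: False") are simply ignored.
ERROR_RE = re.compile(r"^Error processing file: (?P<path>.+?\.txt): (?P<msg>.+)$")
SAVED_RE = re.compile(r"^saved file (?P<path>.+) to disk$")


def summarize_log(text):
    """Return (failed, saved): failed is a list of (path, message) pairs,
    saved is a list of output paths."""
    failed = []
    saved = []
    for line in text.splitlines():
        line = line.strip()
        match = ERROR_RE.match(line)
        if match:
            failed.append((match.group("path"), match.group("msg")))
            continue
        match = SAVED_RE.match(line)
        if match:
            saved.append(match.group("path"))
    return failed, saved


def failures_by_chamber(failed):
    """Count failures by the letter after "Pg" in the file name
    (E, H and S in the log above); "?" if no page id is found."""
    counts = Counter()
    for path, _msg in failed:
        match = re.search(r"-Pg([A-Z])\d+", path)
        counts[match.group(1) if match else "?"] += 1
    return counts
```

Saving the console output to a file and passing its contents to `summarize_log` gives the failed and saved lists. Diffing the failed list before and after a change to the source files shows whether the same documents are still breaking.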
Sorted offline, source data was the culprit.
Super brittle!