unitedstates / congress

Public domain data collectors for the work of Congress, including legislation, amendments, and votes.
https://github.com/unitedstates/congress/wiki
Creative Commons Zero v1.0 Universal

Fix several run bill issues #280

Closed jerrywithaz closed 3 months ago

jerrywithaz commented 2 years ago
  1. Fixes the regex that parses a sponsor's full name so that it no longer fails when the party abbreviation has more than one letter, e.g. ID. Changes (?P<party>[DRIL]) to (?P<party>[A-Z]+).
  2. Adds a check for sourceSystem.
  3. Adds a check for sponsors in bill_dict.
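
As a rough illustration of item 1, the sketch below compares the two party groups against a full name of the form seen in the Congress data. Only the two party groups come from this PR; the rest of the pattern and the helper names (build_pattern, parse_party) are assumptions made up for the example, not the project's actual regex.

```python
import re

# The two party groups quoted in the PR description.
OLD_PARTY = r"(?P<party>[DRIL])"   # exactly one letter from D, R, I, L
NEW_PARTY = r"(?P<party>[A-Z]+)"   # one or more capital letters

def build_pattern(party_group):
    # Hypothetical surrounding pattern (an assumption, not the project's regex).
    # It matches strings such as "Sen. Lieberman, Joseph I. [ID-CT]".
    return re.compile(
        r"^(?P<title>Rep|Sen)\.\s+"
        r"(?P<name>.+?)\s+"
        r"\[" + party_group + r"-(?P<state>[A-Z]{2})(?:-(?P<district>\d+))?\]$"
    )

def parse_party(full_name, party_group=NEW_PARTY):
    """Return the party abbreviation, or None if the name does not match."""
    match = build_pattern(party_group).match(full_name)
    return match.group("party") if match else None

name = "Sen. Lieberman, Joseph I. [ID-CT]"
print(parse_party(name, OLD_PARTY))  # no match with the one-letter group
print(parse_party(name, NEW_PARTY))  # matches with the widened group
```

With the one-letter group the two-letter party ID cannot match, so the caller sees no result; the widened group accepts it while still matching single-letter parties.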
jerrywithaz commented 2 years ago

@JoshData friendly ping!

JoshData commented 2 years ago

Could you list some bills that cause the errors you're fixing?

jerrywithaz commented 2 years ago

@JoshData I wish I could. About 2000 bills failed during processing because of these errors, so I just fixed the errors to keep processing. The only one I remember is the regex error: it was a bill by Joseph Lieberman (Independent), and the party was listed as ID, which caused the parse to fail.

For the other errors, I think they were older bills, from the 108th or 109th Congress. For some reason they had no sponsor, which caused a failure with 'NoneType' object is not subscriptable.
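
A 'NoneType' object is not subscriptable error of this kind would be consistent with bill_dict['sponsors'] being None when the code indexes into it. The sketch below shows one way such a guard might look; first_sponsor is a hypothetical helper named for this example, and the nested 'sponsors' / 'item' layout is an assumption about the parsed XML, not the code in this PR.

```python
def first_sponsor(bill_dict):
    """Return the first sponsor entry, or None when the bill lists no sponsor.

    Hypothetical helper: assumes sponsors live under
    bill_dict['sponsors']['item'] as a list.
    """
    sponsors = bill_dict.get("sponsors")
    if not sponsors:
        # Covers both a missing key and an explicit None value.
        return None
    items = sponsors.get("item")
    if not items:
        return None
    return items[0]

# A bill whose sponsors element parsed to None no longer raises.
print(first_sponsor({"sponsors": None}))
print(first_sponsor({"sponsors": {"item": [{"party": "ID", "state": "CT"}]}}))
```

Callers would then need to decide what a missing sponsor means for the output JSON, for example writing null for the sponsor field.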

jerrywithaz commented 2 years ago

@JoshData I just received the source system error:

[s1793-111] Exception:

Traceback (most recent call last):
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/utils.py", line 174, in process_set
    results = fetch_func(id, options, *extra_args)
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bills.py", line 104, in process_bill
    bill_data = form_bill_json_dict(xml_as_dict)
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bills.py", line 155, in form_bill_json_dict
    actions = bill_info.actions_for(bill_dict['actions']['item'], bill_id, bill_info.current_title_for(titles, 'official'))
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 412, in actions_for
    action_list = [item for item in action_list
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 413, in <listcomp>
    if keep_action(item, closure)]
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 397, in keep_action
    if item['sourceSystem']['code'] == "9":
KeyError: 'sourceSystem'

jerrywithaz commented 2 years ago

The same error also occurs for [s2271-109] and several other bills, but those two should suffice.
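
The KeyError in the traceback above comes from indexing item['sourceSystem'] on an action that has no such key. A minimal sketch of a tolerant lookup follows; source_system_code is a hypothetical helper for illustration, not the change made in this PR, and what the surrounding code does once the code equals "9" is left out because the thread does not show it.

```python
def source_system_code(item):
    """Return the action's sourceSystem code, or None when it is absent.

    Hypothetical helper: tolerates both a missing 'sourceSystem' key and a
    'sourceSystem' value of None.
    """
    source = item.get("sourceSystem") or {}
    return source.get("code")

# The comparison from the traceback can then be written without indexing:
action = {"text": "Introduced in Senate"}  # made-up action with no sourceSystem
if source_system_code(action) == "9":
    print("code 9 action")
else:
    print("no sourceSystem code 9 on this action")
```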

jerrywithaz commented 2 years ago

Also received the sponsor error:

[s679-112] Exception:

Traceback (most recent call last):
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/utils.py", line 174, in process_set
    results = fetch_func(id, options, *extra_args)
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bills.py", line 104, in process_bill
    bill_data = form_bill_json_dict(xml_as_dict)
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bills.py", line 173, in form_bill_json_dict
    'cosponsors': bill_info.cosponsors_for(bill_dict['cosponsors']),
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 541, in cosponsors_for
    cosponsors = [build_dict(cosponsor) for cosponsor in cosponsors_list]
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 541, in <listcomp>
    cosponsors = [build_dict(cosponsor) for cosponsor in cosponsors_list]
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 532, in build_dict
    cosponsor_dict = sponsor_for(item)
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 170, in sponsor_for
    raise ValueError(sponsor_dict)
ValueError: OrderedDict([('bioguideId', 'L000304'), ('fullName', 'Sen. Lieberman, Joseph I. [ID-CT]'), ('firstName', 'JOSEPH'), ('middleName', 'I.'), ('lastName', 'LIEBERMAN'), ('party', 'ID'), ('state', 'CT'), ('identifiers', OrderedDict([('lisID', '1385'), ('bioguideId', 'L000304'), ('gpoId', '8246')])), ('sponsorshipDate', '2011-03-30'), ('isOriginalCosponsor', 'True'), ('sponsorshipWithdrawnDate', None)])

jerrywithaz commented 2 years ago

And the regex error:

You can see that the senator's party is ID.

[s2038-112] Exception:

Traceback (most recent call last):
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/utils.py", line 174, in process_set
    results = fetch_func(id, options, *extra_args)
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bills.py", line 104, in process_bill
    bill_data = form_bill_json_dict(xml_as_dict)
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bills.py", line 172, in form_bill_json_dict
    'sponsor': bill_info.sponsor_for(bill_dict['sponsors']['item'][0]),
  File "/home/democrasee/.local/lib/python3.9/site-packages/congress/tasks/bill_info.py", line 170, in sponsor_for
    raise ValueError(sponsor_dict)
ValueError: OrderedDict([('bioguideId', 'L000304'), ('fullName', 'Sen. Lieberman, Joseph I. [ID-CT]'), ('firstName', 'JOSEPH'), ('middleName', 'I.'), ('lastName', 'LIEBERMAN'), ('party', 'ID'), ('state', 'CT'), ('identifiers', OrderedDict([('lisID', '1385'), ('bioguideId', 'L000304'), ('gpoId', '8246')])), ('byRequestType', None)])

JoshData commented 2 years ago

Thanks, that's helpful.

GitHub indicates that there are conflicts now. Could you try resolving that? Happy to merge after.