unitedstates / congressional-record

A parser for the Congressional Record.

Resolve speaker name from congress-legislators #12

Closed AlJohri closed 9 years ago

AlJohri commented 10 years ago

I'd like to resolve metadata about a speaker from https://github.com/unitedstates/congress-legislators and place it within the parsed XML CrDoc.

This would be similar to db_bioguide_lookup from the CapitolWords Solr ingestor (https://github.com/sunlightlabs/Capitol-Words/blob/master/solr/lib.py#L151); however, it would simply check the YAML file instead of the bioguide and NYT APIs.

Similar to the CapitolWords method get_speaker_metadata (https://github.com/sunlightlabs/Capitol-Words/blob/master/solr/ingest.py#L218) it would strip the "Mr/Ms/Mrs" at the beginning of the speaker's title and find a legislator that matched the same last name and had a term that matched the same year as the speech.
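A minimal sketch of that lookup, assuming the legislator records have already been loaded from the YAML into a list of dicts with a `name.last` field and a list of `terms` carrying ISO `start`/`end` dates. The sample records, bioguide IDs, and function names below are made up for illustration; this is not the CapitolWords code.

```python
import re

# Made-up records mirroring the assumed shape of the congress-legislators YAML.
LEGISLATORS = [
    {"id": {"bioguide": "X000001"}, "name": {"first": "Jane", "last": "Doe"},
     "terms": [{"type": "rep", "start": "2011-01-05", "end": "2015-01-03"}]},
    {"id": {"bioguide": "X000002"}, "name": {"first": "John", "last": "Doe"},
     "terms": [{"type": "sen", "start": "1991-01-03", "end": "1997-01-03"}]},
]

# Leading "Mr", "Mrs", or "Ms", with an optional period.
HONORIFIC = re.compile(r"^(mr|mrs|ms)\.?\s+", re.IGNORECASE)


def strip_honorific(speaker):
    """Drop a leading Mr/Mrs/Ms from a speaker string such as 'Mr. DOE'."""
    return HONORIFIC.sub("", speaker.strip())


def served_in_year(term, year):
    """True if the term's start and end years bracket the given year."""
    start = int(str(term["start"])[:4])
    end = int(str(term["end"])[:4])
    return start <= year <= end


def resolve_speaker(speaker, year, legislators=LEGISLATORS):
    """Return every legislator whose last name and terms match the speech."""
    last = strip_honorific(speaker).lower()
    return [
        leg for leg in legislators
        if leg["name"]["last"].lower() == last
        and any(served_in_year(t, year) for t in leg["terms"])
    ]
```

The function returns a list rather than a single record because two legislators with the same last name can serve in the same year; the caller would still need a tie-breaker (chamber or state, for example). Suffixes such as "of Texas" are not handled here.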

Lastly, I also wanted to use the congress-legislators repository to resolve speakers whose names resolve to "special titles" such as "speaker pro tempore", "vice president", "president", "recorder", etc.

If the title represents a person, it would resolve to the correct legislator for the given year; if the title represents something like recorder, it would set a field to indicate that instead.
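One way the title handling could be split out, as a rough sketch: classify the raw speaker string first, and only attempt a person lookup when the title stands for a person. Which titles count as a person versus a role is an assumption here, and the field names (`speaker_type`, `needs_lookup`) are hypothetical.

```python
import re

# Assumed split: the first three titles stand for a person, "recorder" does not.
SPECIAL_TITLES = {
    "speaker pro tempore": "person",
    "vice president": "person",
    "president": "person",
    "recorder": "role",
}


def classify_speaker(raw):
    """Tag a speaker string as an ordinary legislator, a person title, or a role."""
    text = raw.strip()
    # Normalize "The SPEAKER pro tempore." to "speaker pro tempore".
    key = re.sub(r"^the\s+", "", text.lower()).rstrip(".")
    kind = SPECIAL_TITLES.get(key)
    if kind is None:
        return {"speaker_type": "legislator", "name": text, "needs_lookup": True}
    return {"speaker_type": kind, "title": key, "needs_lookup": kind == "person"}
```

Finding out who actually held a person title on a given date is a separate step and is not shown.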

I was thinking of just adding an option such as: --resolve-legislators to perform this parsing.
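As an illustration of an opt-in flag like this, here is a sketch using `argparse` from the standard library. The parser below is hypothetical and is not the project's actual command-line interface; only the flag name comes from the proposal above.

```python
import argparse


def build_parser():
    """Build a hypothetical CLI parser with the proposed opt-in flag."""
    parser = argparse.ArgumentParser(description="Parse the Congressional Record.")
    parser.add_argument(
        "--resolve-legislators",
        action="store_true",  # off unless the flag is passed
        help="look up speaker metadata from congress-legislators",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("resolve legislators:", args.resolve_legislators)
```

Keeping it a `store_true` flag means existing invocations behave exactly as before.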

Would you be interested in such a PR?

CC: @drinks

drinks commented 10 years ago

Hey, sorry for being absent on stuff lately. I just wanted to drop in and say I think this is an awesome idea, and I would love a PR. I'm interested in hearing more about special title resolution; this is something I took a crack at almost 2 years ago now, and though I actually had success with it, that branch of code is now collecting dust on my hard drive. It's been my impression that on days when someone serves as speaker pro tem or similar, they're usually not saying anything in their own capacity as a legislator, but if you think it will yield good stuff I'm all for it.