unitedstates / congress-legislators

Members of the United States Congress, 1789-Present, in YAML/JSON/CSV, as well as committees, presidents, and vice presidents.
Creative Commons Zero v1.0 Universal

CSV for all served terms (i.e. one row per (legislator, term served)) #662

Open nrjones8 opened 5 years ago

nrjones8 commented 5 years ago

First of all, thanks so much for making and maintaining this data!

I was wondering if you all would be open to adding another CSV, generated from the existing legislators-current.yaml and legislators-historical.yaml files. Right now, those files contain more detailed term information than the CSV versions do. That makes sense: the number of terms served varies from legislator to legislator, so it's hard to fit that data into the existing one-row-per-legislator CSV.

Specifically, would you be open to a new CSV that includes term start/end information for every term served by every legislator? It would look something like this:

bioguide_id,office_type,congress_number,start_date,end_date
B000226,sen,1,1789-03-04,1793-03-03
B000546,rep,1,1789-03-04,1791-03-03
B001086,rep,1,1789-03-04,1791-03-03
C000187,rep,1,1789-03-04,1791-03-03
...
V000119,rep,76,1939-01-03,1941-01-03
V000119,rep,77,1941-01-03,1943-01-03
...

so the same legislator can appear multiple times (once per term served). This would make it easier to analyze all of the members of a given Congress: grab all the bioguide_ids for a given Congress number, then join those IDs to the legislators-historical.csv file.
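To make the join concrete, here is a minimal sketch of the analysis this file would enable. It assumes the column names from the sample above; the legislators rows are a stand-in for legislators-historical.csv with placeholder names, and `members_of_congress` is a hypothetical helper, not something in the repo.

```python
import csv
import io

# Proposed per-term CSV; rows copied from the sample above.
TERMS_CSV = """bioguide_id,office_type,congress_number,start_date,end_date
B000226,sen,1,1789-03-04,1793-03-03
B000546,rep,1,1789-03-04,1791-03-03
V000119,rep,76,1939-01-03,1941-01-03
V000119,rep,77,1941-01-03,1943-01-03
"""

# Stand-in for legislators-historical.csv. The last_name values are
# placeholders, not real data.
LEGISLATORS_CSV = """bioguide_id,last_name
B000226,Placeholder-A
B000546,Placeholder-B
V000119,Placeholder-C
"""


def members_of_congress(terms_file, legislators_file, congress_number):
    """Return the legislator rows with at least one term in the given Congress."""
    # Step 1: collect the bioguide IDs that have a term in that Congress.
    ids = {
        row["bioguide_id"]
        for row in csv.DictReader(terms_file)
        if int(row["congress_number"]) == congress_number
    }
    # Step 2: join those IDs against the one-row-per-legislator file.
    return [
        row for row in csv.DictReader(legislators_file)
        if row["bioguide_id"] in ids
    ]


members = members_of_congress(
    io.StringIO(TERMS_CSV), io.StringIO(LEGISLATORS_CSV), 1
)
print([m["bioguide_id"] for m in members])
```

With real files you would pass open file handles instead of the `io.StringIO` wrappers.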

I created a quick prototype (not ready for review, wanted to see if you all were open to the idea first) to give an idea of what I mean: https://github.com/nrjones8/congress-legislators/commit/0e8338933924c5b1c61f49de5a39e394d28f1e9d

Thanks in advance!

JoshData commented 5 years ago

I think we'd love to add such a file. It may be difficult to assign congress numbers accurately to the whole dataset however: see #185. That might be better to address as a nice-to-have later. We should also attempt to include every term field in the output and use the same field names as much as possible. So with those caveats I'm :+1: .

nrjones8 commented 5 years ago

ah yes, I figured there was a reason that congress numbers hadn't been added before! Thanks for that context.

Makes sense on including every term field - just to clarify though, you mean including all subfields under a term object from the source YAML, like this one?

- type: rep
  start: '2013-01-03'
  end: '2015-01-03'
  state: NV
  party: Democrat
  district: 4
  office: 1330 Longworth House Office Building
  address: 1330 Longworth HOB; Washington DC 20515-2804
  phone: 202-225-9894
  url: http://horsford.house.gov
  rss_url: http://horsford.house.gov/rss.xml
  contact_form: https://horsford.house.gov/contact/email-me

The only downside I can see is that term objects don't have a fixed set of fields. Past legislators especially would end up with a lot of missing data (e.g. contact_form, url, address, phone) in the resulting CSV.
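For what it's worth, the variable field set is manageable on the writing side. Below is a rough sketch, not the prototype from my commit: it assumes each legislator is a dict with an `id.bioguide` key and a `terms` list (as if loaded from the YAML), takes the union of all term fields as the header, and leaves missing values blank. The bioguide IDs and the first term are made-up illustration data, and `terms_to_csv` is a hypothetical name.

```python
import csv
import io

# In-memory stand-in for the parsed YAML. IDs and the first term are
# illustrative only; the second term reuses fields from the example above.
legislators = [
    {
        "id": {"bioguide": "X000001"},
        "terms": [
            {"type": "rep", "start": "1789-03-04", "end": "1791-03-03",
             "state": "VA"},
        ],
    },
    {
        "id": {"bioguide": "X000002"},
        "terms": [
            {"type": "rep", "start": "2013-01-03", "end": "2015-01-03",
             "state": "NV", "party": "Democrat", "district": 4,
             "url": "http://horsford.house.gov"},
        ],
    },
]


def terms_to_csv(legislators, out):
    """Write one row per (legislator, term), keeping the YAML field names."""
    # Header = union of every term field, in first-seen order.
    fields = []
    for legislator in legislators:
        for term in legislator["terms"]:
            for key in term:
                if key not in fields:
                    fields.append(key)

    # restval="" leaves a blank cell wherever a term lacks a field.
    writer = csv.DictWriter(out, fieldnames=["bioguide_id"] + fields, restval="")
    writer.writeheader()
    for legislator in legislators:
        for term in legislator["terms"]:
            writer.writerow({"bioguide_id": legislator["id"]["bioguide"], **term})


buf = io.StringIO()
terms_to_csv(legislators, buf)
output = buf.getvalue()
print(output)
```

The cost is a wide, sparse file: older terms get empty cells for the contact-related columns, which is the missing-data downside mentioned above.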

caleblucas commented 3 years ago

@JoshData thanks for your work on this great project! does something like what @nrjones8 detailed exist that you know of now?

JoshData commented 3 years ago

I don't think anyone yet has stepped up to create the file, no.