unitedstates / congress-legislators

Members of the United States Congress, 1789-Present, in YAML/JSON/CSV, as well as committees, presidents, and vice presidents.
Creative Commons Zero v1.0 Universal

Historical Committee assignments #46

Open alexanderfurnas opened 11 years ago

alexanderfurnas commented 11 years ago

Great work here, such an excellent source. I was curious about the possibility of keeping historical committee assignments for legislators from their previous terms. As I understand it, only current committee assignments are housed here. Anyone have thoughts on this?

dwillis commented 11 years ago

It's on my list - I have assignments from the 105th Congress onward in the NYT data, but only the 111th to the present are in the API and vetted. But these should be coming.

konklone commented 11 years ago

That's great - Derek, if you do that, and wouldn't mind updating this thread, I'd be happy to do the legwork of importing them into our data here.


Developer | sunlightfoundation.com

alexanderfurnas commented 11 years ago

Fantastic. Thanks for the response Derek.

schmod commented 11 years ago

The Senate Calendar includes a listing of committee assignments, and is available from fdsys as far back as 1996 (although it's PDF only prior to the 105th Congress).

http://www.gpo.gov/fdsys/browse/collection.action?collectionCode=CCAL&browsePath=107%2FSCAL%2F2002-11%2F11-20%5C%2F4%3BFINAL&isCollapsed=false&leafLevelBrowse=false&isDocumentResults=true&ycord=0

dwillis commented 11 years ago

Yep, a good resource, although those are the "final" rosters and don't reflect changes made during the course of each congress, which ideally we'd like to have.
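To capture the mid-Congress changes that final rosters miss, one option is to store each assignment with its own start and end dates instead of a single roster per Congress. A minimal sketch, assuming a hypothetical record layout: the field names `bioguide`, `committee`, `start`, and `end` are illustrative and not this repository's schema, and the sample data is made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    bioguide: str         # hypothetical member identifier
    committee: str        # hypothetical committee code
    start: date           # first day on the committee
    end: Optional[date]   # last day on the committee, or None if still serving


def roster_on(assignments, committee, day):
    """Return the sorted member IDs serving on `committee` as of `day`."""
    return sorted(
        a.bioguide
        for a in assignments
        if a.committee == committee
        and a.start <= day
        and (a.end is None or day <= a.end)
    )


# Made-up sample: one member leaves mid-Congress and another takes the seat.
assignments = [
    Assignment("A000001", "EXAMPLE", date(2011, 1, 5), date(2012, 6, 30)),
    Assignment("B000002", "EXAMPLE", date(2012, 7, 1), None),
    Assignment("C000003", "EXAMPLE", date(2011, 1, 5), None),
]

print(roster_on(assignments, "EXAMPLE", date(2012, 1, 1)))   # ['A000001', 'C000003']
print(roster_on(assignments, "EXAMPLE", date(2012, 12, 1)))  # ['B000002', 'C000003']
```

A "final" roster is then just `roster_on` evaluated on the last day of a Congress, so the date-ranged form loses nothing compared to storing end-of-Congress snapshots.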

schmod commented 11 years ago

Aha, got it.

schmod commented 11 years ago

Hm. You could step through hearing reports on FDSys, each of which has the presumably then-current committee membership attached to it.

Parsing actually might be fairly easy (as far as these things go), as the GPO put the committee membership in the XML metadata for each document.
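If the metadata does carry membership, pulling it out is mostly a tree walk. The sketch below uses Python's standard `xml.etree.ElementTree` on a small inline document; the element and attribute names (`committee`, `member`, `name`, `party`) are hypothetical stand-ins invented for illustration, so check an actual FDsys metadata file for the real structure before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout, invented for illustration only.
SAMPLE = """
<metadata>
  <committee name="Committee on Example Affairs">
    <member name="Jane Doe" party="D"/>
    <member name="John Roe" party="R"/>
  </committee>
</metadata>
"""


def committee_members(xml_text):
    """Map each committee name to a list of (member name, party) pairs under it."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for committee in root.iter("committee"):
        result[committee.get("name")] = [
            (member.get("name"), member.get("party"))
            for member in committee.iter("member")
        ]
    return result


print(committee_members(SAMPLE))
```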

jasonab commented 11 years ago

Just wanted to check on this issue, with Ed Markey moving from the House to the Senate today. It would be nice for the data to reflect that in his committee memberships. For my needs, I don't care about past data so much as current changes, and maybe the 112th Congress. It'd be nice to get at least that much.

JoshData commented 11 years ago

We update current committee assignments using the committee_membership.py script. I've just run it, see f15f12d.
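Checking whether one member's move shows up in the current data is a reverse lookup over the membership file. A minimal sketch, assuming `committee-membership-current.yaml` has already been loaded (e.g. with PyYAML) into a dict that maps committee and subcommittee IDs to lists of member entries with a `bioguide` key; the IDs and names below are placeholders, not real data.

```python
def committees_for(membership, bioguide_id):
    """Return the sorted committee IDs whose member lists include `bioguide_id`."""
    return sorted(
        committee_id
        for committee_id, members in membership.items()
        if any(member.get("bioguide") == bioguide_id for member in members)
    )


# Placeholder data in the assumed shape of the parsed YAML file.
membership = {
    "AAAA": [{"name": "Jane Doe", "party": "majority", "rank": 1, "bioguide": "X000001"}],
    "BBBB": [{"name": "John Roe", "party": "minority", "rank": 1, "bioguide": "X000002"}],
    "BBBB01": [{"name": "Jane Doe", "party": "majority", "rank": 2, "bioguide": "X000001"}],
}

print(committees_for(membership, "X000001"))  # ['AAAA', 'BBBB01']
```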

bchartoff commented 11 years ago

There's a wealth of historical committee membership data here: http://web.mit.edu/17.251/www/data_page.html#2%29

Pros:

Cons:

Given that the current-committees data has higher granularity (subcommittees), is it worth scraping and preserving this data for historical committee membership?
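One way to make the finer-grained current data comparable with historical sources that only list parent committees is to roll subcommittee assignments up to their parents. A sketch, assuming subcommittee IDs are a fixed-length parent code followed by a suffix (a parent such as `ABCD` and a subcommittee such as `ABCD01`, both placeholders); verify that convention against the actual data before using it.

```python
def parent_committees(committee_ids, parent_len=4):
    """Collapse subcommittee IDs to parent IDs, assuming a fixed-length parent prefix."""
    return sorted({committee_id[:parent_len] for committee_id in committee_ids})


print(parent_committees(["ABCD", "ABCD01", "ABCD02", "WXYZ03"]))  # ['ABCD', 'WXYZ']
```

Going the other way is not possible: parent-only historical records cannot be split back into subcommittees, which is the granularity trade-off raised above.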

JoshData commented 11 years ago

Have to be a little careful with that data. Some is listed as for academic use only.

schmod commented 11 years ago

If anybody wants to brute-force this, Robert Byrd compiled one of the more comprehensive listings of old committees (and their chairpeople, but not members) that I've seen (http://books.google.com/books?id=PeHByMYxVm8C&printsec=frontcover&dq=isbn:0160632560&hl=en&sa=X&ei=xq4cUu_EEqi9sASwz4GQDA&ved=0CC8Q6AEwAA#v=onepage&q&f=false). The Senate historian seems to be keeping the list up to date (http://www.senate.gov/artandhistory/history/resources/pdf/CommitteeChairs.pdf).

Full membership information is available in the Congressional Directory, which has been published continuously since 1820. Scanned copies should be available from archive.org. Good luck getting that data into a structured format, though...

Charles Stewart's data from the 1st-79th Congresses does not have the academic-only disclaimer, but he does request a citation. (If you want his data served to you on a dead tree, you can apparently also buy it as a 4,000-page printed volume: http://books.google.com/books?id=J4JPMQEACAAJ&dq=isbn:1568021712&hl=en&sa=X&ei=UrAcUpCdI_Si4AP7u4B4&ved=0CDgQ6AEwAg.) I'm pretty sure that CQ also has a fairly comprehensive database of this information, locked away somewhere.

schmod commented 11 years ago

Oh, and the Wikipedians have compiled a good listing of resources for researching historical committee information....

konklone commented 11 years ago

If a link in our README would suffice as a citation, I don't have a problem with that.


bchartoff commented 11 years ago

I'm w/ @konklone on README citation. I've also had zero luck getting CQ data in the past, they hold onto it pretty tight.

schmod commented 11 years ago

Maybe somebody should send Charles Stewart an email as a courtesy?

konklone commented 11 years ago

Agreed. And we should invite him to join GitHub and help us out!

wilson428 commented 11 years ago

I can take a stab at revisiting this. Does the Senate Calendar from FDSys still seem like a good place to start? @schmod, where is the XML metadata containing the committee memberships that you referenced a few months ago? I can't locate it just poking around.

I'm also getting lots of dead links for commands like this:

fdsys --year=2009 --store=text,xml --collections=CCAL

e.g.

Downloading: data/fdsys/CCAL/2009/CCAL-111scal-2009-10-30/document.xml
file not found: http://www.gpo.gov/fdsys/pkg/CCAL-111scal-2009-10-30/xml/CCAL-111scal-2009-10-30.xml

Most of the GPO site seems to be active. Any ideas?
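The failing URL in the log follows a pattern that can be built from the package ID alone. A sketch that constructs that URL and treats an HTTP 404 as "not available" instead of aborting the run; the URL pattern is inferred from the single log line above and may not hold for every collection, and these gpo.gov URLs may no longer resolve at all.

```python
import urllib.error
import urllib.request


def package_xml_url(package_id):
    """Build the package XML URL, following the pattern seen in the failing log line."""
    return f"http://www.gpo.gov/fdsys/pkg/{package_id}/xml/{package_id}.xml"


def fetch_if_available(url):
    """Return the response body, or None if the server answers 404 Not Found."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors (5xx, 403, ...) are still surfaced


print(package_xml_url("CCAL-111scal-2009-10-30"))
```

Returning `None` for 404s lets a bulk download skip missing packages and report them at the end, while other errors still stop the run.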

konklone commented 11 years ago

I believe GPO's FDSys is only open for certain high priority collections: https://twitter.com/USGPO/status/384993220536455168

davidmooreppf commented 6 years ago

Just popping in to say I'm finding this thread helpful in our latest Cong. research, thanks all.

If anyone has a lead on historical member data with subcommittee affiliations, it would be of interest to us, but parent committees are a good start.

Also, I see this was requested again recently, in issue #522, so maybe this is an area of wider interest for re-use.

JoshData commented 6 years ago

The Congressional Directory was mentioned earlier, but I was looking it over so I thought I'd post more information:

I started writing some code before deciding that parsing the plain text would be too hard to get done any time soon, but here's some code to pull down the text files:

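import json
import os
import re
import urllib.parse
import urllib.request

def parse_committee_assignments(package_id, text_file):
    # Hypothetical stand-in: saves the plain-text file for later parsing rather
    # than parsing it. Assumes text_file is a full URL or a path on govinfo.gov.
    url = urllib.parse.urljoin("https://www.govinfo.gov/", text_file)
    os.makedirs("cdir", exist_ok=True)
    filename = os.path.join("cdir", package_id + "-" + os.path.basename(urllib.parse.urlparse(url).path))
    urllib.request.urlretrieve(url, filename)

def walk_directory(url):
    print(url + "...")
    directory = json.loads(urllib.request.urlopen(url + "?fetchChildrenOnly=1").read().decode("utf8"))
    for node in directory["childNodes"]:
        if node["nodeValue"]["level"] == 3 and node["nodeValue"].get("displayValue", "") != "Committee Assignments":
            # Skip level-3 sections other than "Committee Assignments".
            pass
        elif "value" in node["nodeValue"]:
            # Recursively go into this node.
            walk_directory(url + "/" + node["nodeValue"]["value"])
        elif re.match("ASSIGNMENTS OF (SENATORS|REPRESENTATIVES) TO COMMITTEES", node["nodeValue"].get("title", "")):
            # This holds committee assignments: download its text file.
            parse_committee_assignments(node["nodeValue"]["packageid"], node["nodeValue"]["textfile"])

walk_directory("https://www.govinfo.gov/wssearch/rb/cdir")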
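The recursive walk above can be checked without hitting govinfo.gov by passing the fetch step in as a function and collecting results instead of downloading. A sketch under that assumption: `find_assignment_files`, `fetch_json`, and the in-memory `FAKE_TREE` are hypothetical, and the node fields (`level`, `displayValue`, `value`, `title`, `packageid`, `textfile`) simply mirror the keys the script above reads.

```python
import re


def find_assignment_files(url, fetch_json, found=None):
    """Recursively walk the browse tree and collect (packageid, textfile) pairs."""
    if found is None:
        found = []
    for node in fetch_json(url + "?fetchChildrenOnly=1")["childNodes"]:
        value = node["nodeValue"]
        if value["level"] == 3 and value.get("displayValue", "") != "Committee Assignments":
            continue  # skip level-3 sections other than committee assignments
        if "value" in value:
            find_assignment_files(url + "/" + value["value"], fetch_json, found)
        elif re.match("ASSIGNMENTS OF (SENATORS|REPRESENTATIVES) TO COMMITTEES", value.get("title", "")):
            found.append((value["packageid"], value["textfile"]))
    return found


# Hypothetical in-memory tree keyed by request URL.
ROOT = "https://example.invalid/cdir"
FAKE_TREE = {
    ROOT + "?fetchChildrenOnly=1": {"childNodes": [
        {"nodeValue": {"level": 1, "value": "2018"}},
    ]},
    ROOT + "/2018?fetchChildrenOnly=1": {"childNodes": [
        {"nodeValue": {"level": 3, "displayValue": "Other Section", "value": "other"}},
        {"nodeValue": {"level": 3, "displayValue": "Committee Assignments", "value": "assign"}},
    ]},
    ROOT + "/2018/assign?fetchChildrenOnly=1": {"childNodes": [
        {"nodeValue": {"level": 4, "title": "ASSIGNMENTS OF SENATORS TO COMMITTEES",
                       "packageid": "PKG-1", "textfile": "senate.txt"}},
        {"nodeValue": {"level": 4, "title": "SOMETHING ELSE",
                       "packageid": "PKG-2", "textfile": "other.txt"}},
    ]},
}

print(find_assignment_files(ROOT, FAKE_TREE.__getitem__))  # [('PKG-1', 'senate.txt')]
```

In real use, `fetch_json` would wrap `urllib.request.urlopen` and `json.loads`; keeping it injectable means the traversal logic can be tested separately from the network.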