alexanderfurnas opened this issue 11 years ago
It's on my list - I have assignments from the 105th Congress onward in the NYT data, but only the 111th Congress to the present are in the API and vetted. But these should be coming.
That's great - Derek, if you do that, and wouldn't mind updating this thread, I'd be happy to do the legwork of importing them into our data here.
Developer | sunlightfoundation.com
Fantastic. Thanks for the response Derek.
The Senate Calendar includes a listing of committee assignments and is available from FDsys as far back as 1996 (although it's PDF-only prior to the 105th Congress).
Yep, a good resource, although those are the "final" rosters and don't reflect changes made during the course of each congress, which ideally we'd like to have.
Aha, got it.
Hm. You could step through hearing reports on FDSys, which all have the supposedly then-current committee membership attached to them.
Parsing actually might be fairly easy (as far as these things go), as the GPO put the committee membership in the XML metadata for each document.
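A minimal sketch of pulling member entries out of that metadata with Python's standard library. It assumes the metadata is the package's MODS file and that members are listed as `congMember` elements with `bioGuideId`, `chamber` and `party` attributes and a `<name type="parsed">` child. Those element and attribute names are assumptions to check against a real hearing's metadata, and `committee_members` is a hypothetical helper, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# MODS namespace (ASSUMPTION: the hearing metadata is a MODS document).
MODS = "{http://www.loc.gov/mods/v3}"


def committee_members(mods_xml):
    """List (bioGuideId, chamber, party, name) for every congMember element.

    ASSUMPTION: members appear as <congMember> elements carrying bioGuideId,
    chamber and party attributes, with a <name type="parsed"> child. Verify
    these names against a real hearing's metadata before relying on them.
    Accepts the XML as str or bytes.
    """
    root = ET.fromstring(mods_xml)
    members = []
    for member in root.iter(MODS + "congMember"):
        # Take the first <name type="parsed"> child that has text, if any.
        parsed = next(
            (n.text.strip() for n in member.findall(MODS + "name")
             if n.get("type") == "parsed" and n.text),
            None,
        )
        members.append((member.get("bioGuideId"), member.get("chamber"),
                        member.get("party"), parsed))
    return members
```

Real files may also list people who are not committee members, so an extra filter (for example on a role attribute, if one is present) may be needed before treating the result as a roster.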
Just wanted to check on this issue, with Ed Markey moving from the House to the Senate today. It would be nice for the data to reflect that in his committee memberships. For my needs, I don't care about past data so much as current changes, and maybe the 112th Congress. It'd be nice to get at least that much.
We update current committee assignments using the committee_membership.py script. I've just run it, see f15f12d.
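Since a refresh like this replaces the current assignments, mid-Congress moves such as Markey's can only be recovered later if dated snapshots are kept and compared. A small sketch of that comparison; the roster shape, the committee ids and the member ids are hypothetical, not the format this repository uses.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

# A roster snapshot maps a committee id to the set of member ids seen on it
# at one point in time. This shape is a hypothetical illustration.
Roster = Dict[str, Set[str]]


def membership_changes(
    snapshots: List[Tuple[date, Roster]]
) -> List[Tuple[date, str, str, str]]:
    """Compare consecutive dated rosters; report (date, committee, member, joined/left)."""
    changes = []
    ordered = sorted(snapshots, key=lambda s: s[0])
    for (_, before), (when, after) in zip(ordered, ordered[1:]):
        for committee in sorted(set(before) | set(after)):
            old = before.get(committee, set())
            new = after.get(committee, set())
            for member in sorted(new - old):
                changes.append((when, committee, member, "joined"))
            for member in sorted(old - new):
                changes.append((when, committee, member, "left"))
    return changes


if __name__ == "__main__":
    snapshots = [
        (date(2013, 1, 3), {"COMM_A": {"member-1", "member-2"}}),
        (date(2013, 7, 16), {"COMM_A": {"member-1", "member-3"}}),
    ]
    for change in membership_changes(snapshots):
        print(change)
```

Each change is dated to the later snapshot, so the real date is only known to fall somewhere between the two observations; more frequent snapshots (or the hearing-report approach above) narrow that window.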
There's a wealth of historical committee membership data here: http://web.mit.edu/17.251/www/data_page.html#2%29
Pros:
Cons:
Given that the current-committees data has higher granularity (subcommittees), is it worth scraping and preserving this data for historical committee membership?
Have to be a little careful with that data. Some is listed as for academic use only.
If anybody wants to brute-force this, Robert Byrd compiled one of the more comprehensive listings of old committees (and their chairpeople, but not members) that I've seen (http://books.google.com/books?id=PeHByMYxVm8C&printsec=frontcover&dq=isbn:0160632560&hl=en&sa=X&ei=xq4cUu_EEqi9sASwz4GQDA&ved=0CC8Q6AEwAA#v=onepage&q&f=false). The Senate historian seems to be keeping the list up to date: http://www.senate.gov/artandhistory/history/resources/pdf/CommitteeChairs.pdf
Full membership information is available in the congressional directory, which has been published continuously since 1820. Scanned copies should be available from archive.org. Good luck getting that data into a structured format, though...
Charles Stewart's data from the 1st through 79th Congresses does not have the academic-only disclaimer, but he does request a citation. (If you want his data served to you on a dead tree, you can apparently also buy it as a 4,000-page printed volume: http://books.google.com/books?id=J4JPMQEACAAJ&dq=isbn:1568021712&hl=en&sa=X&ei=UrAcUpCdI_Si4AP7u4B4&ved=0CDgQ6AEwAg.) I'm pretty sure that CQ also has a fairly comprehensive database of this information, locked away somewhere.
Oh, and the Wikipedians have compiled a good listing of resources for researching historical committee information.
If a link in our README would suffice as a citation, I don't have a problem with that.
I'm w/ @konklone on README citation. I've also had zero luck getting CQ data in the past, they hold onto it pretty tight.
Maybe somebody should send Charles Stewart an email as a courtesy?
Agreed. And we should invite him to join GitHub and help us out!
I can take a stab at revisiting this. Does the Senate Calendar from FDSys still seem like a good place to start? @schmod, where is the XML metadata with the committee memberships that you referenced a few months ago? I can't locate it just by poking around.
I'm also getting lots of dead links for commands like this:
fdsys --year=2009 --store=text,xml --collections=CCAL
e.g.
Downloading: data/fdsys/CCAL/2009/CCAL-111scal-2009-10-30/document.xml
file not found: http://www.gpo.gov/fdsys/pkg/CCAL-111scal-2009-10-30/xml/CCAL-111scal-2009-10-30.xml
Most of the GPO site seems to be active. Any ideas?
I believe GPO's FDSys is only open for certain high-priority collections: https://twitter.com/USGPO/status/384993220536455168
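One way to tell a single missing package from a collection that is simply offline is to build the expected URL and probe it. The URL pattern below is copied from the failed download in the log; `package_xml_url` and `is_available` are hypothetical helpers, and this sketch makes no claim about whether those old gpo.gov paths still resolve.

```python
import urllib.request

# URL pattern taken from the "file not found" line in the log.
FDSYS_XML_URL = "http://www.gpo.gov/fdsys/pkg/{pkg}/xml/{pkg}.xml"


def package_xml_url(package_id):
    """Build the XML URL for a package id such as CCAL-111scal-2009-10-30."""
    return FDSYS_XML_URL.format(pkg=package_id)


def is_available(url, timeout=10):
    """Return True if a HEAD request for url gets a 2xx response.

    HTTP errors (404, 503, ...) and network failures all surface as OSError
    subclasses in urllib, and are reported as False.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        return False
```

A HEAD request avoids downloading the document body. Probing a few package ids from different collections is enough to see whether only some collections are being served.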
Just popping in to say I'm finding this thread helpful in our latest Cong. research, thanks all.
If anyone has a lead on historical member data with subcommittee affiliations, it would be of interest to us, but parent committees are a good start.
Also, I see this has been a recent request again, in issue #522 - maybe this is an area of wider interest for re-use.
The Congressional Directory was mentioned earlier, but I was looking it over so I thought I'd post more information:
I started writing some code before deciding that parsing the plain text would be too hard to finish any time soon, but here's some code to pull down the text files:
import json
import re
import urllib.request


def parse_committee_assignments(package_id, text_file):
    # Hypothetical placeholder: the original post calls this function but
    # doesn't define it. Here it only reports which text file was found.
    print("committee assignments:", package_id, text_file)


def walk_directory(url):
    print(url + "...")
    with urllib.request.urlopen(url + "?fetchChildrenOnly=1") as response:
        directory = json.loads(response.read().decode("utf8"))
    for node in directory["childNodes"]:
        if node["nodeValue"]["level"] == 3 and node["nodeValue"].get("displayValue", "") != "Committee Assignments":
            # Skip nodes that don't have committee assignments within them.
            pass
        elif "value" in node["nodeValue"]:
            # Recursively go into this node.
            walk_directory(url + "/" + node["nodeValue"]["value"])
        elif re.match("ASSIGNMENTS OF (SENATORS|REPRESENTATIVES) TO COMMITTEES", node["nodeValue"].get("title", "")):
            # This holds committee assignments!
            parse_committee_assignments(node["nodeValue"]["packageid"], node["nodeValue"]["textfile"])


walk_directory("https://www.govinfo.gov/wssearch/rb/cdir")
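The traversal above is easier to check if fetching is separated from walking. A sketch under that assumption: `find_assignment_files` (a hypothetical name) takes a `fetch_json` callable and yields `(packageid, textfile)` pairs instead of calling `parse_committee_assignments`. The node fields are the ones the code above reads; the canned responses in the usage example are made up.

```python
import re

ASSIGNMENTS_TITLE = re.compile("ASSIGNMENTS OF (SENATORS|REPRESENTATIVES) TO COMMITTEES")


def find_assignment_files(url, fetch_json):
    """Yield (packageid, textfile) for committee-assignment nodes under url.

    fetch_json(u) must return the parsed JSON for u, where u already ends in
    "?fetchChildrenOnly=1"; injecting it keeps the walk testable offline.
    """
    for node in fetch_json(url + "?fetchChildrenOnly=1")["childNodes"]:
        value = node["nodeValue"]
        if value.get("level") == 3 and value.get("displayValue", "") != "Committee Assignments":
            continue  # level-3 sections other than Committee Assignments
        if "value" in value:
            # Interior node: descend into it.
            yield from find_assignment_files(url + "/" + value["value"], fetch_json)
        elif ASSIGNMENTS_TITLE.match(value.get("title", "")):
            yield value["packageid"], value["textfile"]


if __name__ == "__main__":
    # Hypothetical canned responses in the shape the walker expects.
    pages = {
        "root?fetchChildrenOnly=1": {"childNodes": [
            {"nodeValue": {"level": 2, "value": "2013"}}]},
        "root/2013?fetchChildrenOnly=1": {"childNodes": [
            {"nodeValue": {"level": 3, "displayValue": "Committee Assignments", "value": "ca"}},
            {"nodeValue": {"level": 3, "displayValue": "Biographies", "value": "bio"}}]},
        "root/2013/ca?fetchChildrenOnly=1": {"childNodes": [
            {"nodeValue": {"level": 4, "title": "ASSIGNMENTS OF SENATORS TO COMMITTEES",
                           "packageid": "PKG-1", "textfile": "pkg-1.txt"}}]},
    }
    print(list(find_assignment_files("root", pages.__getitem__)))
```

To run it against the live endpoint, pass a `fetch_json` that does the `urlopen`/`json.loads` call from the original snippet. Note that the skipped "Biographies" branch is never fetched, which is what keeps the real walk from crawling the whole directory.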
Great work here, such an excellent source. I was curious about the possibility of keeping historical committee assignments for legislators from their previous terms. As I understand it, only current committee assignments are housed here. Anyone have thoughts on this?