schmod closed this 11 years ago
I have a similar dislike for how hodgepodge it feels; it's confusing. I believe the reason for having multiple IDs is that different systems that mention committees (House.gov, Senate.gov, THOMAS.gov) don't all use the LIS ID. I think @tauberer knows more about this than I do.
I imagine there's a way to make this simpler, and maybe it's just by renaming `thomas_id` to `lis_id` or `committee_id`?
Diving into this further, it does appear that the House often refers to committees without the first two characters of the LIS ID (so, WM00 instead of HSWM00).
The LIS code system has been in use since the 93rd Congress (1973), and everybody now seems to use those codes in one format or another: the Senate usually uses the full LIS ID, but occasionally omits the numeric subcommittee identifier at the end; Thomas uses the numeric identifier by itself for subcommittees; and the House sometimes uses just the two-character abbreviation, occasionally adding the numbers.
Given that the LIS ID seems to be the most "harmonized" of all of these (i.e., you can very easily derive any of the others from it), it seems to be the most logical one for us to provide.
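To make the "easily derive any of the others" point concrete, here's a rough Python sketch. The function name and the dictionary keys are made up for illustration, and it assumes every LIS ID is exactly a 4-character committee prefix followed by a 2-digit number, with '00' meaning the full committee.

```python
def lis_variants(lis_id: str) -> dict[str, str]:
    """Derive the other ID forms described above from a full LIS ID.

    Assumes a 6-character ID: a 4-character committee prefix (e.g. 'HSWM')
    followed by a 2-digit number, where '00' means the full committee.
    """
    lis_id = lis_id.strip().upper()
    if len(lis_id) != 6 or not lis_id[4:].isdigit():
        raise ValueError(f"not a 6-character LIS ID: {lis_id!r}")
    prefix, number = lis_id[:4], lis_id[4:]
    return {
        "lis": lis_id,                 # full form, as the Senate usually writes it
        "senate_short": prefix,        # Senate form with the numeric suffix omitted
        "thomas_sub": number,          # Thomas: the numeric identifier by itself
        "house_short": prefix[2:],     # House: bare two-character abbreviation
        "house": prefix[2:] + number,  # House: abbreviation plus number, e.g. 'WM00'
    }


print(lis_variants("HSWM00")["house"])       # WM00
print(lis_variants("SSAP22")["thomas_sub"])  # 22
```

Going the other direction (House form back to LIS) is the lossy part, since the first two characters are dropped.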
Unless there are inconsistencies...
The inconsistencies question is the real issue. There used to be inconsistencies, especially for the joint committees.
So what are the codes that the House uses now?
Picking the IDs apart, the schema as far as I can deduce is:
This schema does not appear to be terribly strict. The Joint Select Committee on Deficit Reduction uses an 'S' instead of an 'L'.
2-character alphanumeric abbreviation. I'm told that these are arbitrarily assigned. Usually alphabetic; the Senate Year 2000 Technology Problem (sp2k00) is one exception.
2-character numeric identifier. I'm also told that these are arbitrarily assigned, and to not infer anything from them.
The full committee is always referred to as 00, with one insane exception: The 'House Select Subcommittee on the United States Role in Iranian Arms Transfer to Croatia and Bosnia' was never actually associated with a full committee, and the Library treats it like a full committee.
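A minimal parser for the layout sketched above might look like this. The field names are my own, and the meanings of the first two characters are assumptions: I'm treating the first as the chamber (H, S, or J for joint) and the second as a loosely applied committee-type letter, since the notes above say it isn't strict.

```python
from typing import NamedTuple


class LisId(NamedTuple):
    chamber: str       # assumed: 'H' House, 'S' Senate, 'J' joint
    kind: str          # committee-type letter, e.g. 'S', 'L', or the 'P' in SP2K00; not strict
    abbreviation: str  # 2-character code, arbitrarily assigned, usually alphabetic
    number: str        # 2-digit code, arbitrarily assigned; '00' = full committee

    @property
    def is_full_committee(self) -> bool:
        # The Library also files the Croatia/Bosnia select subcommittee under '00',
        # so this reflects how LIS treats the entry, not its formal status.
        return self.number == "00"


def parse_lis_id(code: str) -> LisId:
    """Split a 6-character LIS ID into its four fields (meanings are not validated)."""
    code = code.strip().upper()
    if len(code) != 6 or not code[4:].isdigit():
        raise ValueError(f"not a 6-character LIS ID: {code!r}")
    return LisId(code[0], code[1], code[2:4], code[4:])


print(parse_lis_id("HSAG15").abbreviation)       # AG
print(parse_lis_id("SP2K00").is_full_committee)  # True
```

Because the numbers are arbitrarily assigned, the parser deliberately doesn't try to interpret them beyond the '00' convention.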
Lately, there seems to be a trend to establish new subcommittees/assign new IDs rather than rename existing subcommittees. (Senate Agriculture and HSGAC did this a bunch of times recently; there may be a valid legislative rationale for this, but I couldn't figure out what it was...)
This is a great breakdown, thank you. Does this break down the LIS IDs only? How do House and Senate IDs differ?
As far as I can tell, the House, Senate, and Thomas all use the LIS ID, or some truncated version thereof.
The Senate seems to like using 'SPAG' or 'SPAG00' to refer to Agriculture, and always seems to use the full 6-character LIS ID for subcommittees (i.e., 'SPAG16').
The House seems to use 'AG,' 'AG00,' or 'AG16' (the bare two-character abbreviation seems pretty rare). House sources overwhelmingly prefer the 4-character 'AG00' format.
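If we standardized on LIS IDs, incoming codes in these chamber-specific forms could be normalized roughly as follows. This is only a sketch: the function names are hypothetical, and `prefixes` is a hypothetical lookup table, because the House forms drop the first two characters and I don't want to assume every House committee starts with 'HS' given the 'L' variant mentioned above.

```python
import re


def normalize_senate(code: str) -> str:
    """'SSAP' or 'SSAP00' -> 'SSAP00'; a full ID like 'SSAP22' is returned as-is."""
    code = code.strip().upper()
    if re.fullmatch(r"[A-Z0-9]{4}", code):
        return code + "00"  # suffix omitted, so this refers to the full committee
    if re.fullmatch(r"[A-Z0-9]{4}\d{2}", code):
        return code
    raise ValueError(f"unrecognized Senate committee code: {code!r}")


def normalize_house(code: str, prefixes: dict[str, str]) -> str:
    """'AG', 'AG00', 'AG16' -> 'HSAG00', 'HSAG00', 'HSAG16' (given {'AG': 'HSAG'}).

    `prefixes` is a hypothetical lookup from the 2-character abbreviation to the
    4-character LIS prefix; the House forms don't carry that information.
    """
    code = code.strip().upper()
    match = re.fullmatch(r"([A-Z0-9]{2})(\d{2})?", code)
    if match is None:
        raise ValueError(f"unrecognized House committee code: {code!r}")
    abbreviation, number = match.group(1), match.group(2) or "00"
    return prefixes[abbreviation] + number
```

For example, `normalize_house("AG16", {"AG": "HSAG"})` gives `'HSAG16'`, the same form the Senate already uses for subcommittees.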
I'm still trying to figure out if there are any inconsistencies. It appears as though the scraping scripts actually use `house_committee_id` and `thomas_id` interchangeably, although the list of IDs and committees seems to originate from the NYT rather than from any direct source on House.gov.
Which House sources?
If we can confirm that the IDs the House is currently using match up perfectly, then I'm OK with dropping house_committee_id and senate_committee_id, and renaming thomas_id to just id.
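Here's a rough sketch of what that migration could look like for one committee record, treated as a plain Python dict. The record shape and function name are hypothetical, and the two checks encode the relationships described in this thread (`house_committee_id` equal to the last two characters of `thomas_id`, `senate_committee_id` equal to `thomas_id`); they're assumptions to verify against the data, not guarantees.

```python
def migrate_committee(record: dict) -> dict:
    """Hypothetical migration: drop house_committee_id/senate_committee_id and
    rename thomas_id to id, refusing to proceed if the redundant IDs disagree."""
    out = dict(record)  # shallow copy; subcommittees are left untouched
    thomas_id = out.pop("thomas_id")
    house_id = out.pop("house_committee_id", None)
    senate_id = out.pop("senate_committee_id", None)
    # Assumed relationships, as described in this thread:
    #   house_committee_id == last two characters of thomas_id
    #   senate_committee_id == thomas_id
    if house_id is not None and house_id != thomas_id[2:]:
        raise ValueError(f"{thomas_id}: house_committee_id {house_id!r} disagrees")
    if senate_id is not None and senate_id != thomas_id:
        raise ValueError(f"{thomas_id}: senate_committee_id {senate_id!r} disagrees")
    out["id"] = thomas_id
    return out
```

Running this over the whole list and collecting the errors would double as the "confirm the IDs match up" step: if nothing raises, the extra fields really are redundant.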
It'd be real nice to be able to do that; I admit to not knowing the House sources well enough to answer without re-doing all the research @schmod is kindly doing.
Closing due to inactivity. :)
Given that the information is now readily available, could we discuss the prospect of referring to committees via their LIS IDs instead of the current hodgepodge of Thomas IDs and partial LIS IDs?
At least on the Senate side, LIS IDs seem to be the de-facto standard for referring to committees, and this repository's way of providing committees (and especially subcommittees) with unique IDs seems unnecessarily convoluted.
Right now, standard committees look like
In our list (as far as I can tell), the `thomas_id` always matches the `senate_committee_id`, which (I think) is supposed to be the LIS ID's prefix. (However, LIS always refers to full committees as XXXX00.) On the House side, `house_committee_id` always seems to match the last two letters of the `thomas_id`, which, again, is the prefix of the LIS ID.
Similarly, for subcommittees, the `thomas_id` always seems to be equal to the LIS ID's numeric suffix. (In the above examples, Energy & Water's LIS ID is SSAP22, while Conservation, Energy, and Forestry's is HSAG15.)
I'm not sure if there are any inconsistencies in here (thus justifying the duplicative specification of IDs), but it sure would seem easiest to just specify each committee and subcommittee by its LIS ID.
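Assuming those relationships hold, deriving the LIS IDs from the current data is mechanical. A minimal sketch, where the dict layout only loosely mirrors our records and the function name is hypothetical:

```python
def lis_ids(committee: dict) -> dict[str, str]:
    """Map a committee's name and each subcommittee's name to a derived LIS ID.

    Assumes the committee's thomas_id is the 4-character LIS prefix and each
    subcommittee's thomas_id is the 2-digit numeric suffix.
    """
    prefix = committee["thomas_id"]
    ids = {committee["name"]: prefix + "00"}  # LIS writes full committees as XXXX00
    for sub in committee.get("subcommittees", []):
        # str()/zfill() in case the suffix was loaded as an int (e.g. 5 instead of '05')
        ids[sub["name"]] = prefix + str(sub["thomas_id"]).zfill(2)
    return ids


appropriations = {
    "name": "Appropriations",
    "thomas_id": "SSAP",
    "subcommittees": [{"name": "Energy & Water", "thomas_id": "22"}],
}
print(lis_ids(appropriations))  # {'Appropriations': 'SSAP00', 'Energy & Water': 'SSAP22'}
```

If any committee breaks the assumption, the derived ID simply won't match the one LIS publishes, which is an easy thing to diff.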
Thoughts?