Closed JoshData closed 11 years ago
That's so obnoxious. You're right, it looks like House Rules amendments throw the numbering alignment off. I don't see an easy way of dealing with this at the scraper level - there's no other number on THOMAS or the Clerk that would bring it into alignment, so you basically need to look holistically at the amendments on a particular bill after they're all in, and if there's a House Rules amendment up front, knock the numbers back by one. That's much easier at the Sunlight or GovTrack level than in the scraper itself.
I think the best thing to do here is for the scrapers to faithfully report the numbers they see on the Clerk and on THOMAS, and then to document the behavior for users of the data. But that's not awesome; definitely open to other ideas.
Or, maybe this is a bug, and they can fix it. Congress.gov actually has just a teensy bit more prose about the amendment code:
http://beta.congress.gov/amendment/113th-congress/house-amendment/60 http://beta.congress.gov/amendment/113th-congress/house-amendment/61
It's called a "House Amendment Code", and it says "House Tally Clerks use this code to manage amendment information." So, if the House Clerk's vote data is off from this number, it's likely the Clerk is misnumbering things.
I emailed Andrew. We'll see.
Sources say the underlying assumption is correct that the A___ numbering on THOMAS should match the amendment numbering by the House Clerk, this particular case notwithstanding. (So, no resolution yet.)
If that's just LOC sources, probably worth asking a Clerk source as well.
Okay here's what I've learned:
Eric, your diagnosis was right. The numbering got out of alignment because the House Rules amendment is not considered numbered (for purposes of House votes) when a bill is considered under a closed rule, wherein the numbering that appears in vote XML files is the numbering assigned to Member amendments by House Rules itself. But, this is not a bug. This is just how it works.
There are five+ ways House amendments are being numbered/identified:
1) In vote XML: For bills considered under a rule, the number recorded in vote XML is the number assigned to the amendment by House Rules in their report accompanying their resolution providing for the consideration of the bill. It is entered by floor staff. If you look at corresponding activities in the House floor bulk XML, you'll see descriptive text explaining the number like "numbered X in the House Report XXX-YYY." (By the way, amendments here have statuses like submitted, withdrawn, and "made in order." I don't know what "made in order" means yet.) Amendments that are not made in order by House Rules will not get one of these numbers.
2) In vote XML: If the bill is instead considered under an "open rule," the numbering in vote XML is assigned by House floor staff according to the order in which the amendments are printed in the Congressional Record. In this case, amendments are pre-printed in the CR possibly over the course of several days. (The numbering is also consecutive starting with 1, separately numbered for different bills.)
2b) In vote XML, if the bill is not being considered under a rule, we are not sure what the numbering system is.
Whether the number in the vote XML is a (1)-style number or a (2)/(2b)-style number is not recorded, and I don't think we have any data to clarify this (i.e. whether the bill is being considered under a closed rule).
3) The "A_" (e.g. A001) number, which appears on THOMAS. This is a consecutive integer just like the previous two. This numbering follows the order in which amendments are offered on the floor. The numbering is assigned in real time on the floor. This number appears to be used for all amendments, whether "made in order" by House Rules or not. An amendment in a House Rules report will get an A number but not a House Rules number.
4) In various places, including vote XML [update: this was wrong], by the name of the Member offering the amendment, i.e. "the Holt (NJ) amendment." (As you'd expect, that string is idiosyncratic by Member, including a state only if it is ambiguous otherwise.) If a Member offers multiple amendments, it would look something roughly like "the Holt (NJ) amendment # 1" and so on. In the floor summary bulk XML, this is the way amendments are referred to. The integer portion of this number follows the convention of (1)/(2). It might be possible to scrape this from THOMAS.
5) On THOMAS, the H.Amdt number, which is assigned by Library staff (LIS staff?). These numbers are consecutive through a Congress. (Update: I don't know who assigns S.Amdt and S.Up.Amdt numbering.)
In the particular instance that started this thread, we were comparing numbering types (1) and (3). The House Rules substitute amendment was numbered A001 because it was essentially offered first. It was "adopted" pursuant to the rule itself. But it was not an amendment made in order by the House Rules committee and thus House Rules did not number it. The first such number went to the next amendment.
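To make the divergence concrete, here is a minimal sketch of the two counters running side by side. It assumes, for simplicity, that made-in-order amendments reach the floor in the same order as they appear in the House Rules report; the `OfferedAmendment` class and `number_amendments` function are hypothetical names for illustration, not anything in the scraper.

```python
from dataclasses import dataclass


@dataclass
class OfferedAmendment:
    sponsor: str
    made_in_order: bool  # True if House Rules numbered it in its report


def number_amendments(offered):
    """Return (sponsor, a_number, rules_number) tuples in floor order.

    a_number counts every amendment offered on the floor.
    rules_number counts only amendments made in order by House Rules
    (assumption: they are offered in report order).
    """
    rows = []
    rules_counter = 0
    for position, amdt in enumerate(offered, start=1):
        a_number = f"A{position:03d}"
        if amdt.made_in_order:
            rules_counter += 1
            rules_number = rules_counter
        else:
            rules_number = None
        rows.append((amdt.sponsor, a_number, rules_number))
    return rows


# The HR 807 case from this thread: the House Rules substitute comes
# first, then the Camp amendment.
hr807 = [
    OfferedAmendment("House Rules", made_in_order=False),
    OfferedAmendment("Camp", made_in_order=True),
]
rows = number_amendments(hr807)
for row in rows:
    print(row)
```

The Camp amendment comes out as A002 with a House Rules number of 1, which is the off-by-one that started this thread.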
THOMAS lists A___ and H.Amdt numbering together, so there's a mapping between these two. It might be possible to infer/scrape the "Member Name Amendment"-style number type from THOMAS as well. That may be the only way to get over to the House vote XML. For purposes of cross-walks, the amendment-num field in vote XML should really be ignored because there's no way to know what it means. [update: this does not actually help]
Correction: House vote XML does not list (4)-style numbering. I.e. if a Member offers two amendments, both of which get a vote, they will not be numbered 1 and 2 in the House vote XML. They will appear with (1)/(2)-style numbering. See the top of this page: http://clerk.house.gov/evs/2012/ROLL_300.asp "Connolly of Virginia Amendment No. 8" and "Connolly of Virginia Amendment No. 13"
So, there is no possible cross-walk from THOMAS to House vote XML as far as I know, and as far as I have inferred about what records exist internally.
Actually the (1)/(2)-style number is mostly (but not always) scrapable from the amendment's purpose. See commit referenced above.
Where it's not scrapable, we just don't know the number and are missing the crosswalk.
I think that's enough to close this issue. What do you all think?
Sorry for all the emails.
Man: thank you for doing all this research and summarizing it.
Do you have a sense for how often the (1)/(2)-style number (what is now house_number) isn't scrapable from the purpose? If not, I'll re-load them and check in my system, since I've been doing all this analytics...
I casually reviewed the changes before committing and my impression was around 70% coverage (that's out of amendments that should have a house_number, i.e. excluding the House Rules substitute amendments). But I didn't count. The Camp amendment that started this all off is one that doesn't have the number in the purpose.
I'm sorry to be of no use on this one. Is there value in having a crosswalk? If so, maybe this is an instance to do text comparison to see if they're identical?
Daniel
There isn't really any text to compare. Unless we're back to complicated parsing of the CR. But that's way beyond what I want to get into.
I must have misunderstood. I thought amendments were showing up in both vote XML and Thomas, but upon reflection I imagine it's only the results of the vote that's showing up in vote XML and not the actual text b
Correct. (See the first link at the top of the thread.)
Dead on, @tauberer - after loading in amendments for the 111th Congress to the present day, 69.27% of the amendments with a house_number have an offered_order.
What strategy did you put into practice on GovTrack for displaying related amendments on votes?
Wow, my predictions can only be downhill from here! (I think you mean that proportion backwards though? Every HAmdt should have an offered_order. Also note that not all amendments should have a house_number. The House Rules amendment doesn't get a number.)
On GovTrack, I load amendments first and remember both the HAmdt number and the house_number. When loading House votes on amendments (category=amendment) I look up by the house_number, and if possible rewrite the vote title using the amendment's number and purpose. Nothing too interesting.
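A rough sketch of that lookup, not GovTrack's actual code: the dictionary keys (`congress`, `bill`, `hamdt`, `house_number`, `purpose`, `amendment_num`, `title`) and both function names are assumptions for illustration, and the sample records are placeholders rather than real data. The index is keyed per bill because the House numbers amendments relative to the bill.

```python
def build_index(amendments):
    """Index amendments by (congress, bill, house_number).

    Amendments without a house_number (e.g. a House Rules substitute)
    are skipped, since there is nothing to look them up by.
    """
    index = {}
    for amdt in amendments:
        if amdt.get("house_number") is None:
            continue
        key = (amdt["congress"], amdt["bill"], amdt["house_number"])
        index[key] = amdt
    return index


def title_for_vote(vote, index):
    """Rewrite the vote title from the matched amendment, if any."""
    key = (vote["congress"], vote["bill"], vote["amendment_num"])
    amdt = index.get(key)
    if amdt is None:
        return vote["title"]  # no match: keep the title as published
    return f"H.Amdt. {amdt['hamdt']}: {amdt['purpose']}"


# Placeholder records, not real amendments or votes.
amendments = [
    {"congress": 113, "bill": "hr0", "hamdt": 999,
     "house_number": 2, "purpose": "Example purpose text"},
    {"congress": 113, "bill": "hr0", "hamdt": 998,
     "house_number": None, "purpose": "Substitute with no number"},
]
index = build_index(amendments)
matched = {"congress": 113, "bill": "hr0", "amendment_num": 2,
           "title": "On Agreeing to the Amendment"}
unmatched = {"congress": 113, "bill": "hr0", "amendment_num": 7,
             "title": "On Agreeing to the Amendment"}
print(title_for_vote(matched, index))
print(title_for_vote(unmatched, index))
```

Falling back to the published title keeps a wrong or missing number from producing a misleading link.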
Oh, you're right - I actually wasn't running that query on fully fresh data, I had leftover fields from old code. But it still holds up: 68.27% of House amendments are able to extract a house_number from the amendment description or purpose. (100% of them have an offered_order.)
A little under half of the extracted house_number's don't agree with the offered_order, and many are wildly different.
This one is baffling: H.Amdt. 82 of the 112th Congress has an hamdt number of 82, an offered_order of 72 ("A072"), and a correctly extracted house_number of 413.
Here's a better example, because it has a recorded vote. hamdt131-112 has an hamdt number of 131, an offered_order of 121, and a house_number of 498.
It was voted on in roll #119. That vote's <amendment-num> field is 121, and the 498 can be found in the <amendment-author> field of "Johnson of Ohio Amendment No. 498".
Note that this example is different from the original one you brought up - in that example, the <amendment-num> field in the roll XML did not match the offered_order ("A___") number -- in this case, the vote's <amendment-num> field matches the offered_order exactly, but the extracted house_number can only be matched to the <amendment-author> field.
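For anyone who wants to pull both values out of a roll call file, a minimal sketch follows. The `<amendment-num>` and `<amendment-author>` element names and their values are the ones quoted above for roll #119; the `<vote>` wrapper in the sample is a stand-in, not the Clerk's real document structure, and `parse_vote_amendment` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in fragment: only the two elements discussed in this thread.
SAMPLE = """
<vote>
  <amendment-num>121</amendment-num>
  <amendment-author>Johnson of Ohio Amendment No. 498</amendment-author>
</vote>
"""

# Trailing "Amendment No. N" in the author string.
AUTHOR_NUMBER_RE = re.compile(r"Amendment No\. (\d+)\s*$")


def parse_vote_amendment(xml_text):
    """Return both candidate amendment numbers found in a roll call."""
    root = ET.fromstring(xml_text)
    # ".//" searches all descendants, so the wrapper layout does not matter.
    num_text = (root.findtext(".//amendment-num") or "").strip()
    author = (root.findtext(".//amendment-author") or "").strip()
    match = AUTHOR_NUMBER_RE.search(author)
    return {
        "amendment_num": int(num_text) if num_text.isdigit() else None,
        "author": author or None,
        "author_number": int(match.group(1)) if match else None,
    }


result = parse_vote_amendment(SAMPLE)
print(result)
```

Keeping both numbers around, rather than picking one at parse time, leaves the decision about which one means what to a later step.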
Three possibilities: House floor staff began recording amendments differently starting this year, or I jumped the gun and typically amendment-num matches the offered_order but some particular data made it look otherwise, or amendment-num is inconsistent. Ugh.
I don't see any great way of doing this short of a post-loading reconciliation step that takes into account all the votes for a congress and all the amendments for a congress.
I guess the next step is for me to do that reconciliation and see the best strategies at matching this stuff up...
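One way such a reconciliation pass could start, as a sketch only: given vote/amendment pairs that are already known to belong together from some other source, tally which vote value agrees with which amendment field. The field names `amendment_num`, `author_number`, `offered_order` and `house_number` follow the usage in this thread; `classify_pair` and `tally` are hypothetical helpers.

```python
from collections import Counter

AMENDMENT_FIELDS = ("offered_order", "house_number")
VOTE_FIELDS = ("amendment_num", "author_number")


def classify_pair(vote, amendment):
    """List which vote value equals which amendment field."""
    matches = []
    for field in AMENDMENT_FIELDS:
        expected = amendment.get(field)
        if expected is None:
            continue
        for source in VOTE_FIELDS:
            if vote.get(source) == expected:
                matches.append(f"{source}=={field}")
    return matches


def tally(pairs):
    """Count agreement patterns across known (vote, amendment) pairs."""
    counts = Counter()
    for vote, amendment in pairs:
        counts.update(classify_pair(vote, amendment) or ["no match"])
    return counts


# Roll #119 and hamdt131-112, using the numbers quoted in this thread.
pairs = [
    ({"amendment_num": 121, "author_number": 498},
     {"offered_order": 121, "house_number": 498}),
]
counts = tally(pairs)
print(counts)
```

Run over a whole congress, the tally would show whether amendment-num tracks one field consistently or flips between them.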
I came in at the end of this, and so am not clear on what the use case actually is. Is the problem simply matching THOMAS numbers with those in the XML, for correlation-linkage approaches?
For starters, I realize this won't be a popular approach, but I'm in favor of recording all of the various datapoints relevant to the amendment and then constructing crosswalks after. That will offend everyone's sense of order, no doubt, but...
a) while the system that is in place is crazy, and highly dependent on manual adjustments needing fairly detailed knowledge of process, it is the system that is in place, and it's unlikely to change because computer programmers think it is disorderly.
b) I recognize the temptation to try to rationalize all of this and discover underlying schemes and systems, because it's always nice to be able to parse identifiers and make calculations from them that generate other identifiers or clues. But sometimes you just can't. Section numbers in the US Code are a good example of this; our APIs are built on top of a database that is rebuilt every time we ingest a new version of a Title. There's just no other way to do it.
c) Why not just record all of it? I mean, it's not like we're using magnetic drums any more, or hand-magnetizing paper clips.
I think that's sort of where Eric ended up, but every so often I have to run out the front door and do my "get off my lawn" old-guy act.
For what it's worth, I think the best-practice reform I'd like to see on this is this: issuers of numbers under any system should also assign a completely opaque unique identifier to the object in question, and all downstream handlers of the object should preserve it. The alternative would be, as Daniel suggested, to do some kind of text-matching to construct a crosswalk. It might not be all that hard given that you could probably restrict the number of candidates to be matched fairly cleanly given that all participate in some sequential numbering system.
Apologies for leaping into this -- I really should be following more closely.
t
Recording is not the issue. We're trying to understand what the field means, and if it has a meaning, to use it. If it turns out not to have a meaning, so be it. The use case is matching the vote result details with other metadata on THOMAS.
The use case here is: when displaying a vote that was on an amendment, be able to link to that amendment and/or display information about it. It's a super basic, useful thing to do, that both @tauberer and I have been doing in our products for years. We're discovering that some of our assumptions about the House, which numbers amendments relative to the bill rather than relative to the session of Congress, are flawed.
FWIW, it's easy to go from amendment to vote - the amendment page's actions link reliably to any roll call vote it was involved in. So the mapping can be easily done, after a day's lead time or so. But the House publishes votes within 20 minutes of it happening, and it's connecting those to any already-introduced amendments that we'd like to do.
There should be a way to work around it and deal with the House's process as it is, but it may not be possible to do at parse-time per-vote with 100% accuracy. More research to be done on this, at some point soon.
This one is also causing problems: http://clerk.house.gov/evs/2013/roll198.xml
Might be an open rule.
I've deleted all amendment info from GovTrack till I can figure this out....
It was an open rule on this bill.
I give up.
On GovTrack I'm ignoring this field and now just using THOMAS vote action lines to make the association (for House amendments).
Closing the issue!
Here's another interesting case for fun. This is one considered under a rule:
http://clerk.house.gov/evs/2013/roll222.xml amendment-num: 2 amendment-author: Blumenauer of Oregon Part B Amendment No. 2
If "Part B" scopes the amendment number, then the amendment-num field is even less understandable.
It's not on THOMAS yet.
The amendment for House Roll 222 is now on THOMAS: http://hdl.loc.gov/loc.uscongress/legislation.113hamdt142
It's A002 on THOMAS and an amendment-num of 2 on the Clerk's side. The amendment's purpose states:
An amendment numbered 2 printed in Part B of House Report 113-108
So that's what the "part B" refers to. But you know, I'm starting to hate the house_number field and prefer the offered_order (what we used to use). It seems to be right way more often - look at hamdt131-112 for an "amendment numbered 498 printed in the Congressional Record", with a THOMAS-assigned number of A121. In that one, its associated roll call vote correctly lists an amendment-num of 121.
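For reference, a sketch of pulling the number out of purpose strings like the two quoted above. This is not the scraper's actual pattern; the regex and the `extract_house_number` name are assumptions, and they cover only these two phrasings. Since roughly 30% of purposes do not carry a number at all, the function returns None when nothing matches.

```python
import re

# Covers "amendment numbered N printed in [Part X of] House Report C-R"
# and "amendment numbered N printed in the Congressional Record".
PURPOSE_RE = re.compile(
    r"amendment numbered (?P<number>\d+) printed in "
    r"(?:Part (?P<part>[A-Z]) of )?"
    r"(?P<source>House Report \d+-\d+|the Congressional Record)",
    re.IGNORECASE,
)


def extract_house_number(purpose):
    """Return number, part and source from a purpose string, or None."""
    match = PURPOSE_RE.search(purpose or "")
    if match is None:
        return None
    return {
        "number": int(match.group("number")),
        "part": match.group("part"),  # None when no "Part X" is given
        "source": match.group("source"),
    }


report = extract_house_number(
    "An amendment numbered 2 printed in Part B of House Report 113-108"
)
record = extract_house_number(
    "amendment numbered 498 printed in the Congressional Record"
)
print(report)
print(record)
```

Keeping the part and the source alongside the number preserves the scoping that a bare integer loses.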
I want to return to the good old days. I suggest we do that, using the THOMAS-assigned number (what's now the offered_order) and just accept some errors.
Yeah. Well, I wouldn't recommend anyone use either field in production, but I'd be down with renaming offered_order back to house_number and scrubbing the current house_number (since it's still being stored in the purpose field, and we can always extract it later).
All right, let's start with that then.
(I'll do that now.)
Closed in 12612f9.
Thanks!
According to the House, Amendment 1 to HR 807 is a Camp amendment. According to the LOC it is a House Rules amendment and the Camp amendment is amendment A002:
http://clerk.house.gov/evs/2013/roll140.xml http://thomas.loc.gov/cgi-bin/bdquery/z?d113:HR00807: http://www.govtrack.us/congress/votes/113-2013/h140
Maybe I was wrong to assume the A___ numbering corresponded to the numbering in House votes.