unitedstates / congress

Public domain data collectors for the work of Congress, including legislation, amendments, and votes.
https://github.com/unitedstates/congress/wiki
Creative Commons Zero v1.0 Universal

Downloading House votes in 2001 and 1991 raises exception #275

Open Andrew-Chen-Wang opened 3 years ago

Andrew-Chen-Wang commented 3 years ago

Note: I ran the ./votes command for 2001 and 1991

In each of 2001 and 1991, two House members share the same first, middle, and last name. This is the 2001 data point:

{
    'C000488': {'type': 'rep', 'start': '1999-01-06', 'end': '2001-01-03', 'state': 'MO', 'district': 1, 'party': 'Democrat'},
    'C001049': {'type': 'rep', 'start': '2001-01-03', 'end': '2003-01-03', 'state': 'MO', 'district': 1, 'party': 'Democrat'},
}

Note that one term ends on the same date the other starts (2001-01-03). The exception is raised when you run:

from .utils import lookup_legislator
from datetime import datetime

lookup_legislator(107, "rep", "Clay", "MO", "D", datetime(year=2001, month=1, day=3), "bioguide")

One solution is to check, when there are multiple matches, whether one member's start date equals the other member's end date. If so, choose the member with the later term, since we can compare the date string of the when argument with each member's start and end.
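A minimal sketch of that tie-break, using only the two term records from the data point above. The helper names matching_ids and pick_later_on_handoff are hypothetical and are not part of the repo. It assumes a vote on the handoff date belongs to the incoming member, which is not guaranteed in general.

```python
from datetime import date

# Term records copied from the 2001 data point above (only the fields needed here).
terms = {
    "C000488": {"start": "1999-01-06", "end": "2001-01-03"},
    "C001049": {"start": "2001-01-03", "end": "2003-01-03"},
}

def matching_ids(terms, when):
    """Return bioguide IDs whose term contains `when`, inclusive on both ends."""
    day = when.isoformat()
    # ISO 8601 date strings sort the same way as the dates they represent.
    return [bid for bid, t in terms.items() if t["start"] <= day <= t["end"]]

def pick_later_on_handoff(terms, when):
    """Hypothetical tie-break: if two terms match and one ends the day
    the other starts, return the member whose term starts that day."""
    ids = matching_ids(terms, when)
    if len(ids) == 2:
        a, b = ids
        if terms[a]["end"] == terms[b]["start"]:
            return b
        if terms[b]["end"] == terms[a]["start"]:
            return a
    # Unambiguous match, or no safe way to choose.
    return ids[0] if len(ids) == 1 else None

handoff = date(2001, 1, 3)
both = matching_ids(terms, handoff)            # both members match on Jan 3
chosen = pick_later_on_handoff(terms, handoff)  # the incoming member
```

With an inclusive date check, both IDs match on 2001-01-03, which is the ambiguity behind the exception. The tie-break only resolves it by assuming the vote was cast by the incoming member.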

The only thing that worries me is this comment:

# This is a possible match. Remember which term matched, but because of term overlaps
# on Jan 3's, don't key on the term uniquely, only on the moc.

Does that mean a representative going out can vote on the same day one comes in?

JoshData commented 3 years ago

> Does that mean a representative going out can vote on the same day one comes in?

Of course. In the general case, a member might resign after a vote on the same day another member elected by special election is sworn in. In the more specific Jan 3 case, there can be a vote on the morning of Jan 3 and a vote on the afternoon of Jan 3, and those would be in different Congresses with an (overlapping but) totally different set of legislators serving.

In this particular case, it's a father-son pair.

To help debugging, the issue you found can be reproduced by running one of:

./run votes --chamber=house --congress=107 --session=2001
./run votes --vote_id=h2-107.200

This was all working at some point because this is how I got the vote data into GovTrack in the first place, but something must have broken.

The way to properly resolve this is for us to compare the congress number of the vote to the congress numbers that the matched terms are for, but the latter needs to be computed (there is a function named get_term_congresses but I can't say if it is correct).
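A rough sketch of that comparison follows. It is not the repo's get_term_congresses; the names congress_of and term_congresses_sketch are hypothetical. It assumes the modern schedule in which each Congress begins on Jan 3 of an odd year (so it would be wrong for terms before the mid-1930s), and it treats a term's end date as exclusive.

```python
from datetime import date, timedelta

def congress_of(day):
    """Congress number for `day`, assuming every Congress starts on Jan 3
    of an odd year (an assumption that only holds for modern Congresses)."""
    year = day.year
    if (day.month, day.day) < (1, 3):
        year -= 1  # Jan 1-2 still fall in the Congress of the previous year
    return (year - 1789) // 2 + 1

def term_congresses_sketch(term):
    """Set of Congress numbers a term covers, treating `end` as exclusive
    so that a term ending on Jan 3 does not spill into the next Congress."""
    start = date.fromisoformat(term["start"])
    last_day = date.fromisoformat(term["end"]) - timedelta(days=1)
    return set(range(congress_of(start), congress_of(last_day) + 1))

# The two terms that both matched the Jan 3, 2001 vote.
matched = {
    "C000488": {"start": "1999-01-06", "end": "2001-01-03"},
    "C001049": {"start": "2001-01-03", "end": "2003-01-03"},
}

vote_congress = 107
# Keep only the matches whose term covers the Congress the vote belongs to.
kept = [bid for bid, t in matched.items()
        if vote_congress in term_congresses_sketch(t)]
```

Under these assumptions the first term maps to the 106th Congress only and the second to the 107th only, so a vote recorded in the 107th Congress keeps a single match even though both terms include the date 2001-01-03.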

Andrew-Chen-Wang commented 3 years ago

Thanks for responding quickly. IIRC, from the congress-legislators repo, there was an XML file that included a tag <congress id="Congress number">. I just can't recall where I saw this or which link provides the data for all congressmen. Which link are we getting all the historical congressmen from?

In that case, we could then update the files with that new data point, congress.

JoshData commented 3 years ago

It reads the YAML files at https://github.com/unitedstates/congress-legislators/. (I don't think the XML file you are describing comes from these repos.)

Andrew-Chen-Wang commented 3 years ago

This repo reads the files stored in that repo. But I was wondering which sources that repository collects its data from, not this one.

JoshData commented 3 years ago

We scrape several sources in that repository. I don't remember offhand what all of the URLs are. But you can scan through the scripts at https://github.com/unitedstates/congress-legislators/tree/main/scripts to see.

For this issue, we can also take an easier route and just hard-code the right bioguide ID to use for each of these votes.
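One way the hard-coded route could look, as a sketch only. The OVERRIDES table, its key shape, and the resolve wrapper are all hypothetical. It keys on the Congress rather than on individual vote IDs, and the single entry shown is inferred from the term dates in the data point above. The fallback argument stands in for whatever lookup would normally run.

```python
# Hypothetical override table: (congress, role type, last name, state) -> bioguide ID.
# The one entry is illustrative, inferred from the term dates quoted in this issue.
OVERRIDES = {
    (107, "rep", "Clay", "MO"): "C001049",
}

def resolve(congress, role_type, name, state, fallback):
    """Return a hard-coded bioguide ID when one is listed for this key,
    otherwise defer to the normal lookup passed in as `fallback`."""
    key = (congress, role_type, name, state)
    if key in OVERRIDES:
        return OVERRIDES[key]
    return fallback(congress, role_type, name, state)

def stub_lookup(congress, role_type, name, state):
    """Stand-in for the real lookup, used only to exercise the sketch."""
    return "X000000"

forced = resolve(107, "rep", "Clay", "MO", stub_lookup)    # uses the override
normal = resolve(107, "rep", "Smith", "TX", stub_lookup)   # falls through
```

Checking the override before the normal lookup means the ambiguous name never reaches the code path that raises, at the cost of maintaining the table by hand.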