unitedstates / congress-legislators

Members of the United States Congress, 1789-Present, in YAML/JSON/CSV, as well as committees, presidents, and vice presidents.
Creative Commons Zero v1.0 Universal

congress-legislators

Members of the United States Congress (1789-Present), congressional committees (1973-Present), committee membership (current only), and presidents and vice presidents of the United States in YAML, JSON, and CSV format.


Overview

This project provides the following data files:

File                          Download       Description
legislators-current           YAML JSON CSV  Currently serving Members of Congress.
legislators-historical        YAML JSON CSV  Historical Members of Congress (i.e. all Members of Congress except those in the current file).
legislators-social-media      YAML JSON      Current social media accounts for Members of Congress. Official accounts only (no campaign or personal accounts).
committees-current            YAML JSON      Current committees of the Congress, with subcommittees.
committee-membership-current  YAML JSON      Current committee/subcommittee assignments.
committees-historical         YAML JSON      Current and historical committees of the Congress, with subcommittees, from the 93rd Congress (1973) and on.
legislators-district-offices  YAML JSON CSV  District offices for current Members of Congress.
executive                     YAML JSON      Presidents and vice presidents.

The data formats are documented below.

The files are maintained in YAML format in the main branch of this project. YAML is a serialization format similar in structure to JSON but typically written with one field per line. Like JSON, it allows for nested structure. Each level of nesting is indicated by indentation or a dash. CSV- and JSON-formatted files are also provided in the gh-pages branch; they are linked above.

This database is maintained through a combination of manual edits by volunteers (from GovTrack, ProPublica, MapLight, FiveThirtyEight, and others) and automated imports from a variety of sources including:

Data Format Documentation

Legislators file structure overview

legislators-current.yaml and legislators-historical.yaml contain biographical information on every Member of Congress who has served since 1789, as well as cross-walks into other databases.

Each legislator record is grouped into four guaranteed parts: IDs which relate the record to other databases, name information (first, last, etc.), biographical information (birthday, gender), and terms served in Congress. A typical record looks something like this:

- id:
    bioguide: R000570
    thomas: '01560'
    govtrack: 400351
    opensecrets: N00004357
    votesmart: 26344
    fec:
      - H8WI01024
    cspan: 57970
    wikipedia: Paul Ryan
    ballotpedia: Paul Ryan
    maplight: 445
    house_history: 20785
    icpsr: 29939
  name:
    first: Paul
    middle: D.
    last: Ryan
  bio:
    birthday: '1970-01-29'
    gender: M
  terms:
  ...
  - type: rep
    start: '2011-01-03'
    end: '2013-01-03'
  ...
  - type: rep
    start: '2013-01-03'
    end: '2015-01-03'
    state: WI
    party: Republican
    district: 1
    url: http://paulryan.house.gov
    address: 1233 Longworth HOB; Washington DC 20515-4901
    phone: 202-225-3031
    fax: 202-225-3393
    contact_form: http://www.house.gov/ryan/email.htm
    office: 1233 Longworth House Office Building

Terms correspond to elections and are listed in chronological order. If a legislator is currently serving, the current term information will always be the last one. To check if a legislator is currently serving, check that the end date on the last term is in the future.
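The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the project's scripts: the function name is_currently_serving is made up here, and the record is a trimmed copy of the sample shown earlier, already parsed into a dict.

```python
from datetime import date


def is_currently_serving(legislator, today=None):
    """Return True if the end date of the last term is in the future."""
    today = today or date.today()
    terms = legislator.get("terms", [])
    if not terms:
        return False
    # Terms are listed chronologically, so the last one is the most recent.
    end = terms[-1].get("end")
    if end is None:
        return False  # any field may be unknown
    return date.fromisoformat(end) > today


# Trimmed copy of the sample record above.
ryan = {
    "id": {"bioguide": "R000570"},
    "terms": [
        {"type": "rep", "start": "2011-01-03", "end": "2013-01-03"},
        {"type": "rep", "start": "2013-01-03", "end": "2015-01-03"},
    ],
}

print(is_currently_serving(ryan, today=date(2014, 6, 1)))  # True
print(is_currently_serving(ryan, today=date(2016, 6, 1)))  # False
```

Passing today explicitly keeps the function easy to test; leaving it out falls back to the current date.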

The split between legislators-current.yaml and legislators-historical.yaml is somewhat arbitrary because these files may not be updated immediately when a legislator leaves office. If it matters to you, just load both files.
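If you do load both files, one possible way to combine them is to index every record by its bioguide ID. The sketch below assumes both files have already been parsed into lists of dicts (for example with a YAML or JSON loader); index_by_bioguide is a hypothetical helper name, and the second record is a made-up placeholder rather than real data.

```python
def index_by_bioguide(*record_lists):
    """Merge any number of parsed legislator lists into one dict keyed by bioguide ID."""
    index = {}
    for records in record_lists:
        for record in records:
            bioguide = record.get("id", {}).get("bioguide")
            if bioguide is not None:
                # Keep the first record seen for an ID (lists passed first win).
                index.setdefault(bioguide, record)
    return index


# Stand-ins for the parsed contents of the two files.
current = [{"id": {"bioguide": "R000570"}, "name": {"first": "Paul", "last": "Ryan"}}]
historical = [{"id": {"bioguide": "Z999999"}, "name": {"first": "Example", "last": "Person"}}]  # placeholder

legislators = index_by_bioguide(current, historical)
print(sorted(legislators))
```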

A separate file legislators-social-media.yaml stores social media account information. Its structure is similar but includes different fields.

Data Dictionary

The following fields are available in legislators-current.yaml and legislators-historical.yaml:

Leadership roles:

leadership_roles:
  - title: Minority Leader
    chamber: senate
    start: '2007-01-04'
    end: '2009-01-06'

For members with top formal positions of leadership in each party in each chamber, a leadership_roles field will include an array of start/end dates and titles documenting when they held this role.

Leadership terms are not identical to legislative terms, so their start and end dates will differ from legislative term dates. However, leaders do need to be re-elected each legislative term, so their leadership terms should all be subsets of their legislative terms.

Except where noted, fields are omitted when their value is empty or unknown. Any field may be unknown.

Notes: In most cases, a legislator has a single term on any given date. In some cases a legislator resigned from one chamber and was sworn in in the other chamber on the same day. Terms for senators list each six-year term, so the terms span three Congresses. For representatives and delegates, each two-year term is listed, each corresponding to a single Congress. But Puerto Rico's Resident Commissioner serves four-year terms, and so the Resident Commissioner will have a single term covering two Congresses (this has not been updated in historical data).

Historically, some states sending at-large representatives actually sent multiple at-large representatives. Thus, state and district may not be a unique key.
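To find the term or terms a legislator held on a given date, something like the following could work. It is a sketch under one stated assumption: the start date is treated as inclusive and the end date as exclusive, so that back-to-back terms do not both match on their shared boundary date. The name terms_on is illustrative only.

```python
from datetime import date


def terms_on(legislator, day):
    """Return every term active on `day`, treating each term as [start, end)."""
    active = []
    for term in legislator.get("terms", []):
        start, end = term.get("start"), term.get("end")
        if start is None or end is None:
            continue  # dates may be unknown, so skip rather than guess
        if date.fromisoformat(start) <= day < date.fromisoformat(end):
            active.append(term)
    return active


# Trimmed copy of the sample record shown earlier.
ryan = {
    "terms": [
        {"type": "rep", "start": "2011-01-03", "end": "2013-01-03"},
        {"type": "rep", "start": "2013-01-03", "end": "2015-01-03"},
    ],
}

print([t["start"] for t in terms_on(ryan, date(2012, 5, 1))])  # ['2011-01-03']
print([t["start"] for t in terms_on(ryan, date(2013, 1, 3))])  # ['2013-01-03']
```

The function returns a list rather than a single term because, as noted above, a date does not always map to exactly one term.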

Data on Official Social Media Accounts

This dataset is designed to include accounts that are paid for with public funds and which represent official communications of their office. We rely on reasonable verification from the legislative office about the status of their accounts.

Offices are supposed to maintain strict separation of official funds and campaign funds, and official funds are not supposed to be used to further things like re-election efforts.

In practice, a campaign account may often look similar to an official account in terms of content, especially when expressing views on issues and legislation. However, there will be differences in what's appropriate for each account, and they will likely be maintained by different staff employed by different organizations.

The social media file legislators-social-media.yaml stores current social media account information.

Each record has two sections: id and social. The id section identifies the legislator using bioguide, thomas, and govtrack IDs (where available). The social section has social media account identifiers:

Several legislators do not have an assigned YouTube username. In these cases, only the youtube_id field is populated.

All values can be turned into URLs by preceding them with the domain name of the service in question (and in the case of YouTube channels, the path /channel):

Legislators are only present when they have one or more social media accounts known. Fields are omitted when the account is unknown.
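As a rough illustration of turning social values into URLs, the sketch below prefixes each value with its service's domain. Only youtube_id is named in the text above; the twitter and facebook field names, the exact URL prefixes, and the sample values are assumptions made for the example, so check them against the actual file before relying on them.

```python
# Assumed field names and URL prefixes; only youtube_id is confirmed by the text above.
SERVICE_PREFIXES = {
    "twitter": "https://twitter.com/",
    "facebook": "https://facebook.com/",
    "youtube_id": "https://youtube.com/channel/",  # channels use the /channel path
}


def social_urls(record):
    """Build a URL for each known account in a record's social section."""
    social = record.get("social", {})
    return {
        field: prefix + str(social[field])
        for field, prefix in SERVICE_PREFIXES.items()
        if field in social  # fields are omitted when the account is unknown
    }


# Placeholder record; the handles are invented for illustration.
sample = {
    "id": {"bioguide": "R000570"},
    "social": {"twitter": "ExampleHandle", "youtube_id": "UC_example_channel"},
}

for field, url in social_urls(sample).items():
    print(field, url)
```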

Updating social media accounts

Available tasks with scripts/social_media.py:

Options used with the above tasks:

Committees Data Dictionary

The committees-current.yaml file lists all current House, Senate, and Joint committees of the United States Congress. It includes metadata and cross-walks into other databases of committee information. It is based on data scraped from House.gov and Senate.gov.

The committees-historical.yaml file is a possibly partial list of current and historical committees and subcommittees referred to in the unitedstates/congress project bill data, as scraped from THOMAS.gov. Only committees/subcommittees that have had bills referred to them are included.

The basic structure of a committee entry looks like the following:

- type: house
  name: House Committee on Agriculture
  url: http://agriculture.house.gov/
  thomas_id: HSAG
  house_committee_id: AG
  jurisdiction: The U.S. House Committee on Agriculture, or Agriculture Committee,
    is a standing committee of the ...
  jurisdiction_source: http://en.wikipedia.org/wiki/House_Committee_on_Agriculture
  subcommittees:
     (... subcommittee list ...)

Both files are structured as a list of committees; each entry is an associative array of key/value pairs of committee metadata.

The fields available in both files are as follows:

Additional fields are present on current committee entries (that is, in committees-current.yaml):

Two additional fields are present on committees and subcommittees in the committees-historical.yaml file:

Committee Membership Data Dictionary

The committee-membership-current.yaml file contains current committee assignments, as of the date of the last update of this file. The file is structured as a mapping from committee IDs to a list of committee members. The basic structure looks like this:

HSAG:
- name: Frank D. Lucas
  party: majority
  rank: 1
  title: Chair
  bioguide: L000491
- name: Bob Goodlatte
  party: majority
  rank: 2
(...snip...)
HSAG03:
- name: Jean Schmidt
  party: majority
  rank: 1
  title: Chair

The committee IDs in this file are the thomas_id values from the committees-current.yaml file, or for subcommittees the concatenation of the thomas_id of the parent committee and the thomas_id of the subcommittee.
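The ID scheme can be illustrated by deriving the expected membership keys from a parsed committees-current.yaml list. This is a minimal sketch: membership_keys is a hypothetical helper, and the subcommittee name below is a placeholder, since the subcommittee list is elided in the sample entry above.

```python
def membership_keys(committees):
    """Map each membership-file key to a readable committee or subcommittee label."""
    keys = {}
    for committee in committees:
        parent_id = committee["thomas_id"]
        keys[parent_id] = committee["name"]
        for sub in committee.get("subcommittees", []):
            # Subcommittee key = parent thomas_id + subcommittee thomas_id.
            keys[parent_id + sub["thomas_id"]] = committee["name"] + ": " + sub["name"]
    return keys


# Parsed stand-in for one entry; the subcommittee name is a placeholder.
committees = [
    {
        "type": "house",
        "name": "House Committee on Agriculture",
        "thomas_id": "HSAG",
        "subcommittees": [{"name": "Example Subcommittee", "thomas_id": "03"}],
    }
]

print(sorted(membership_keys(committees)))  # ['HSAG', 'HSAG03']
```

Building the keys from the committee list, rather than splitting a key such as HSAG03 by position, avoids assuming a fixed length for committee IDs.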

Each committee/subcommittee entry is a list containing the members of the committee. Each member has the following fields:

District Offices Data Dictionary

The legislators-district-offices.yaml file lists district offices for all currently serving Members of Congress. This data is crowdsourced from members' official websites. It does not include Congressional offices in Washington, D.C.; these are listed in the legislators-current.yaml file.

Each current Member of Congress has a listing in the file, consisting of two parts: id and offices.

The id section contains the fields bioguide, thomas, and govtrack, which correspond to fields with the same names in legislators-current.yaml as described above. The bioguide field is required, and used as the primary key for this file.

The offices section is a list of the Member's district offices. Each listing contains the following fields:

To qualify for inclusion in this file, an office must have at least an address or a phone number.
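The inclusion rule can be expressed as a small check. This is not the project's scripts/office_validator.py, only a sketch of the rule stated above; the field names address and phone are assumptions, and the sample offices are invented placeholders.

```python
def qualifies_for_inclusion(office):
    """An office needs at least a non-empty address or a non-empty phone number."""
    # "address" and "phone" are assumed field names, not confirmed by the text above.
    return bool(office.get("address") or office.get("phone"))


# Invented placeholder offices.
offices = [
    {"city": "Exampleville", "phone": "555-555-0100"},
    {"city": "Sampletown", "address": "1 Example Street"},
    {"city": "Nowhere"},
]

for office in offices:
    print(office["city"], qualifies_for_inclusion(office))
```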

The Executive Branch

Because of their role in the legislative process, we also include a file executive.yaml which contains terms served by U.S. presidents (who signed legislation) and U.S. vice presidents (who are nominally the president of the Senate and occasionally cast tie-breaking votes there).

This file has a structure similar to the legislator files. The file contains a list, where each entry is a person. Each entry is a dict with id, name, bio, and terms fields.

The id, bio, and name fields are the same as those listed above, except:

Each term has the following fields:

Presidents and vice presidents that previously served in Congress will also be listed in one of the legislator files, but their Congressional terms will only appear in the legislator files and their executive-branch terms will only appear in executive.yaml.

State Abbreviations

Although you can find the USPS abbreviations for the 50 states anywhere, this database also includes non-voting delegates from territories, including historical territories that no longer exist. Here is a complete list of abbreviations:

The 50 States:

AK Alaska
AL Alabama
AR Arkansas
AZ Arizona
CA California
CO Colorado
CT Connecticut
DE Delaware
FL Florida
GA Georgia
HI Hawaii
IA Iowa
ID Idaho
IL Illinois
IN Indiana
KS Kansas
KY Kentucky
LA Louisiana
MA Massachusetts
MD Maryland
ME Maine
MI Michigan
MN Minnesota
MO Missouri
MS Mississippi
MT Montana
NC North Carolina
ND North Dakota
NE Nebraska
NH New Hampshire
NJ New Jersey
NM New Mexico
NV Nevada
NY New York
OH Ohio
OK Oklahoma
OR Oregon
PA Pennsylvania
RI Rhode Island
SC South Carolina
SD South Dakota
TN Tennessee
TX Texas
UT Utah
VA Virginia
VT Vermont
WA Washington
WI Wisconsin
WV West Virginia
WY Wyoming

Current Territories:

Legislators serving in the House from these territories are called delegates, except for the so-called "Resident Commissioner" from Puerto Rico.

AS American Samoa
DC District of Columbia
GU Guam
MP Northern Mariana Islands
PR Puerto Rico
VI Virgin Islands

Historical Territories:

These territories no longer exist.

DK Dakota Territory
OL Territory of Orleans
PI Philippines Territory/Commonwealth
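When processing term records, it can help to tell states apart from territories. The sketch below encodes the three lists above as sets; the function name classify_jurisdiction and the returned labels are choices made for this example only.

```python
# Abbreviations copied from the lists above.
STATES = set(
    "AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS "
    "MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY".split()
)
CURRENT_TERRITORIES = {"AS", "DC", "GU", "MP", "PR", "VI"}
HISTORICAL_TERRITORIES = {"DK", "OL", "PI"}


def classify_jurisdiction(code):
    """Return which of the three lists a state abbreviation belongs to."""
    code = code.upper()
    if code in STATES:
        return "state"
    if code in CURRENT_TERRITORIES:
        return "current territory"
    if code in HISTORICAL_TERRITORIES:
        return "historical territory"
    return "unknown"


print(classify_jurisdiction("WI"))  # state
print(classify_jurisdiction("PR"))  # current territory
print(classify_jurisdiction("DK"))  # historical territory
```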

Helping us maintain the data

You can just use the data directly without running any scripts. If you want to develop on and help maintain the data, our scripts are tested and developed on Python 3.6.

(Recommended) First, create a virtualenv in the scripts directory:

cd scripts
virtualenv virt
source virt/bin/activate

Install the requirements:

pip install -r requirements.txt

Try updating the House members contact information (mailing address, etc.):

python house_contacts.py

Check whether and how the data has changed:

git diff ../*.yaml

We run the following scripts periodically to scrape for new information and keep the data files up to date. The scripts do not take any command-line arguments.

The following script takes one required command-line argument:

The following script is run to create alternately formatted data files for the gh-pages branch. It takes no command-line arguments.

Two scripts help maintain and validate district office data:

Every script in scripts/ should be safely import-able without executing code, beyond imports themselves. We typically do this with a def run(): declaration after the imports, and putting this at the bottom of the script:

if __name__ == '__main__':
  run()

Every pull request will pass submitted scripts through an import, to catch exceptions, and through pyflakes, to catch unused imports or local vars.

To contribute updates for district offices, edit the legislators-district-offices.yaml file by hand and submit a pull request. Updates should pass validation as defined by scripts/office_validator.py.

Other Scripts

The ballotpedia field has been created using code from James Michael DuPont, using the code in git@github.com:h4ck3rm1k3/rootstrikers-wikipedia.git in the branch ballotpedia.

Related libraries

Who's Using This Data

Ongoing projects making use of this data:

Stories written with this data:

Other projects:

Public domain

This project is dedicated to the public domain. As spelled out in CONTRIBUTING:

The project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.