unitedstates / congress-legislators

Members of the United States Congress, 1789-Present, in YAML/JSON/CSV, as well as committees, presidents, and vice presidents.
Creative Commons Zero v1.0 Universal

Missing 57 full_name values in legislators-current.csv #749

Closed: imjonathan closed this issue 3 years ago.

imjonathan commented 3 years ago

I haven't looked in detail, but this may be related to the new Congress. It seems to be a missing-data issue rather than a CSV output issue: the rows missing full_name in the CSV correspond to the records missing official_full in the JSON.
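One way to confirm that the CSV gaps line up with the JSON gaps is to match them by bioguide ID. This is a minimal sketch, assuming the JSON export lives next to the CSV at `legislators-current.json`, that each JSON record has `id.bioguide` and `name.official_full` fields, and that the CSV has a `bioguide_id` column. The helper names (`load_json`, `bioguide_ids_missing_official_full`, `compare_missing`) are hypothetical.

```python
import json
import urllib.request

import pandas as pd

# Assumed location of the JSON export (same directory as the CSV in this issue).
JSON_URL = "https://theunitedstates.io/congress-legislators/legislators-current.json"


def load_json(url: str = JSON_URL) -> list:
    """Download and parse the legislators JSON (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def bioguide_ids_missing_official_full(records: list) -> set:
    """Bioguide IDs of records that have a last name but no name.official_full."""
    missing = set()
    for rec in records:
        name = rec.get("name", {})
        if name.get("last") and not name.get("official_full"):
            missing.add(rec.get("id", {}).get("bioguide"))
    return missing


def compare_missing(csv_df: pd.DataFrame, records: list) -> tuple:
    """Return (only_in_csv, only_in_json) sets of bioguide IDs.

    Two empty sets mean every CSV gap has a matching JSON gap, which points at
    the source data rather than the CSV export.
    """
    mask = csv_df["last_name"].notna() & csv_df["full_name"].isna()
    csv_missing = set(csv_df.loc[mask, "bioguide_id"])
    json_missing = bioguide_ids_missing_official_full(records)
    return csv_missing - json_missing, json_missing - csv_missing
```

With the CSV loaded into a DataFrame, `compare_missing(df, load_json())` returning two empty sets would support the missing-data explanation.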

Here is how to find the affected rows:

import pandas as pd

legislators_cur = pd.read_csv('https://theunitedstates.io/congress-legislators/legislators-current.csv')
# Rows that have a last name but no full name
missing_fullname = legislators_cur[legislators_cur['last_name'].notna() & legislators_cur['full_name'].isna()].copy()
print(len(missing_fullname))  # 57 people with last names are missing full names

My temporary fix, in case it is useful to others:

# I'll define full name as the concatenation of:
name_cols = ['first_name', 'middle_name', 'last_name', 'suffix']

# Make NaN an empty string
missing_fullname[name_cols] = missing_fullname[name_cols].fillna('')

# Concatenate the name columns (joined with a space)
missing_fullname['full_name'] = missing_fullname[name_cols].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
# Collapse the repeated spaces left by empty parts and trim leading/trailing spaces
# (raw-string regex, and assignment instead of inplace=True on a column selection)
missing_fullname['full_name'] = missing_fullname['full_name'].str.replace(r'\s+', ' ', regex=True).str.strip()

# Write the rebuilt names back into the original rows (aligned on the index)
legislators_cur.loc[(legislators_cur['last_name'].notna() & legislators_cur['full_name'].isna()), 'full_name'] = missing_fullname['full_name']
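If the same repair is needed more than once, it could be wrapped in a reusable function. This is a sketch with hypothetical helper names (`build_full_name`, `fill_missing_full_name`); it swaps the row-wise `apply` for the vectorized `Series.str.cat`, then uses `str.split()` and `str.join()` to drop empty parts and collapse spaces in one step. Like the fix above, the result only approximates `official_full`, which the project may format differently.

```python
import pandas as pd

# Same name parts as the fix above.
NAME_COLS = ("first_name", "middle_name", "last_name", "suffix")


def build_full_name(df: pd.DataFrame, cols=NAME_COLS) -> pd.Series:
    """Join each row's name parts with single spaces, skipping blank parts."""
    cols = list(cols)
    parts = df[cols].fillna("").astype(str)
    # Vectorized concatenation of all parts, separated by one space
    joined = parts[cols[0]].str.cat([parts[c] for c in cols[1:]], sep=" ")
    # Splitting on whitespace drops the empty parts; rejoining collapses spaces
    return joined.str.split().str.join(" ")


def fill_missing_full_name(df: pd.DataFrame, cols=NAME_COLS) -> pd.DataFrame:
    """Return a copy of df with full_name filled where last_name is present but full_name is missing."""
    out = df.copy()
    mask = out["last_name"].notna() & out["full_name"].isna()
    if mask.any():  # skip the work entirely when no names are missing
        out.loc[mask, "full_name"] = build_full_name(out.loc[mask], cols)
    return out
```

Returning a copy leaves the downloaded DataFrame untouched, so the original gaps can still be inspected after the fix.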
imjonathan commented 3 years ago

Doh! The same issue affects legislators-historical.
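To check both files at once, a small helper could count the rows that have a last name but no full name in each. This is a sketch: the historical CSV URL is assumed to follow the same pattern as the current one, and `count_missing_full_name` and `report_missing` are hypothetical helpers.

```python
import pandas as pd

# The current URL is from this issue; the historical one is assumed to follow the same pattern.
SOURCES = {
    "current": "https://theunitedstates.io/congress-legislators/legislators-current.csv",
    "historical": "https://theunitedstates.io/congress-legislators/legislators-historical.csv",
}


def count_missing_full_name(df: pd.DataFrame) -> int:
    """Number of rows that have a last_name but no full_name."""
    return int((df["last_name"].notna() & df["full_name"].isna()).sum())


def report_missing(sources: dict = SOURCES) -> dict:
    """Download each CSV and return {label: missing full_name count} (requires network access)."""
    return {label: count_missing_full_name(pd.read_csv(url)) for label, url in sources.items()}
```

Calling `report_missing()` downloads both files and returns a count per file; the fix above can then be applied to each DataFrame in turn.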