jtleider / censusdata

Download data from Census API
MIT License

Extracting FIPS and name from censusgeo index? #39

Closed: johnziebro closed this issue 2 years ago

johnziebro commented 2 years ago

I am looking to do a join between the censusdata dataframe and a geodataframe containing associated geometry indexed by the FIPS code.

The following extracts the FIPS and NAME from the result's censusgeo index and reindexes by the FIPS. I've reviewed https://jtleider.github.io/censusdata/api.html#module-censusdata.censusgeo, but did not see an easier way to do this. Can you provide any insights on a simpler way than below, or am I missing how to use the censusgeo index altogether? Thank you.

import censusdata as cd

# query for census variables
df = cd.download('acs5', 2019, cd.censusgeo([('state', '*')]), ['C17002_001E', 'B17001_002E'])

# display censusdata result
display(df.head(2))

# convert index to FIPS and Name
df['STATEFP'] = [item.params()[0][1] for item in df.index.to_list()]
df['NAME'] = [str(item).split(':')[0] for item in df.index.to_list()]

# rearrange columns and set FIPS as index
df.set_index('STATEFP', inplace=True)
columns = df.columns.to_list()
columns.remove("NAME")
df = df[["NAME"] + columns]

# display reindexed df
df.head(2)

Result

                                       C17002_001E  B17001_002E
Alabama: Summary level: 040, state:01      4754288       795989
Alaska: Summary level: 040, state:02        719376        76933

Reindexed to FIPS

STATEFP  NAME     C17002_001E  B17001_002E
01       Alabama      4754288       795989
02       Alaska        719376        76933

jtleider commented 2 years ago

To do exactly what you did, I don't think there is a much easier way. However, if you merge instead of join, you just need to create the FIPS column, which is one line of code. I don't see that the issue here is a lack of functionality in the censusdata package; it is just that several steps are required if you want to change the index to no longer use censusgeo objects. Please let me know though if you had other thoughts.

johnziebro commented 2 years ago

An example of your suggested method of merging instead of joining would be helpful. Other than that, thank you for the feedback and this ticket may be considered closed.

jtleider commented 2 years ago

Here you go:

import pandas as pd
import censusdata as cd

df = cd.download('acs5', 2019, cd.censusgeo([('state', '*')]), ['C17002_001E', 'B17001_002E'])
df['STATEFP'] = [item.params()[0][1] for item in df.index.to_list()]
df2 = pd.DataFrame({'STATEFP': ['01', '02', '04', '06'], 'test_data': [10, 11, 12, 13]})
pd.merge(df, df2, on='STATEFP')
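
For geographies below the state level, `item.params()[0][1]` only returns the state code. The sketch below shows one possible way to build a full FIPS/GEOID key by concatenating the code of every level. It is not part of censusdata: the helper name `geoid_from_params` is hypothetical. The sample tuples are hand-written and assume `params()` returns `(level, code)` pairs, as the indexing used in this thread implies. Plain concatenation is assumed to match the key used by the geometry table, which holds for nested levels such as state plus county but should be checked for other geographies.

```python
def geoid_from_params(params):
    """Join the code of every geographic level into one key string.

    `params` is expected to look like the value of censusgeo.params(),
    i.e. a sequence of (level, code) pairs such as (('state', '01'),).
    """
    return "".join(code for _level, code in params)


# Hand-written sample values (assumption: same shape as censusgeo.params()).
state_params = (("state", "01"),)
county_params = (("state", "01"), ("county", "001"))

print(geoid_from_params(state_params))   # 01
print(geoid_from_params(county_params))  # 01001
```

With a downloaded frame, the key column could then be created with `df['GEOID'] = [geoid_from_params(item.params()) for item in df.index]` and merged against the geometry table in the same way as `STATEFP` above.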