jtleider / censusdata

Download data from Census API
MIT License

issues pulling zip code #23

Closed briang-rdm closed 3 years ago

briang-rdm commented 3 years ago

I'm having issues pulling data for zip code tabulation areas. I want to get data for all zip codes in the U.S. I'm sure it's simple, but I can't seem to figure it out. Thanks.

import censusdata as acs
acs.download('acs5', 2019, acs.censusgeo([('zip code tabulation area','*')]), ['B25007_014E','B25007_012E'], key=key)

KeyError: 'state> zip code tabulation area'

I have also tried the call below and received the same error:

acs.download('acs5', 2019, acs.censusgeo([('state','*'),('zip code tabulation area','*')]), ['B25007_014E','B25007_012E'],key=key)

KeyError: 'state> zip code tabulation area'

However, this code works. I receive a dictionary of all census zips (33,120):

acs.geographies(acs.censusgeo([('zip code tabulation area', '*')]), 'acs5', 2019)

adunmore commented 3 years ago

It looks like this results from an inconsistency between how the data.census.gov API represents this geography type (as state> zip code tabulation area) and how it's represented in the library (as state> zip code tabulation area (or part)):

  1. When a user queries data at this summary level, the Census API includes metadata for the state and zip code of each row, and it refers to the zip code column as zip code tabulation area. Here's a sample of the returned data for this query:
    [["NAME","B25007_014E","B25007_012E","state","zip code tabulation area"],
    ["ZCTA5 25245","0","28","54","25245"],
    ["ZCTA5 25268","0","46","54","25268"],
    ...]
  2. The library generates a censusgeo object for each row (used as index in the final dataframe). Each object takes the hierarchy state> zip code tabulation area.
  3. However, this summary level doesn't exist in censusdata.censusgeo.sumleveldict. Rather, the state> zip code level is represented as state> zip code tabulation area (or part). A KeyError occurs when the library fails to find the summary level in sumleveldict.
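
The three steps above can be reproduced without the library. This is a minimal sketch, not censusdata's actual code: `SUMLEVELS` is a hypothetical stand-in for `censusdata.censusgeo.sumleveldict` that holds only the entry discussed here, and `hierarchy_from_header` is an illustrative helper that assumes the geography columns follow NAME and the requested variables.

```python
# Sample response rows copied from the API output quoted above.
response = [
    ["NAME", "B25007_014E", "B25007_012E", "state", "zip code tabulation area"],
    ["ZCTA5 25245", "0", "28", "54", "25245"],
    ["ZCTA5 25268", "0", "46", "54", "25268"],
]

# Hypothetical stand-in for sumleveldict, holding only the relevant entry.
SUMLEVELS = {"state> zip code tabulation area (or part)": "871"}


def hierarchy_from_header(header, n_vars):
    """Join the geography columns (everything after NAME and the variables)."""
    geo_columns = header[1 + n_vars:]
    return "> ".join(geo_columns)


hierarchy = hierarchy_from_header(response[0], n_vars=2)
print(hierarchy)  # state> zip code tabulation area

try:
    SUMLEVELS[hierarchy]
except KeyError as err:
    # The key built from the API header has no "(or part)" suffix,
    # so the lookup fails.
    print("KeyError:", err)
```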

@jtleider why does this summary level name exist? Would it be possible to rename level 871 from state> zip code tabulation area (or part) to state> zip code tabulation area to make it consistent with the data.census.gov representation?

Alternatively, we could add a second entry to the dict to support both representations (if there's a backwards-compatibility concern):

...
"state> zip code tabulation area (or part)": "871",
"state> zip code tabulation area": "871",
...

miaojingang commented 3 years ago

I ran into the same problem: KeyError: 'state> zip code tabulation area'.

censusdata.download(
    src="acs5",
    year=2019,
    geo=censusdata.censusgeo([("zip code tabulation area", "*")]),
    var=["GEO_ID"],
    tabletype="profile"
)

It works if I use year=2018.

miaojingang commented 3 years ago

One temporary workaround is to cast to numpy and then back to pd.DataFrame.
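
The comment does not include code, so the following is only one possible reading of it. It assumes the downloaded values are intact and that the trouble comes from the censusgeo objects used as the index, so rebuilding the frame from its raw values replaces that index with a default integer one. `GeoLabel` is a hypothetical stand-in for those index objects, and carrying the names over through a `name` attribute is likewise an assumption.

```python
import pandas as pd


class GeoLabel:
    """Hypothetical stand-in for the censusgeo objects used as the index."""

    def __init__(self, name):
        self.name = name


# Stand-in for the frame returned by the download call.
df = pd.DataFrame(
    {"B25007_014E": [0, 0], "B25007_012E": [28, 46]},
    index=[GeoLabel("ZCTA5 25245"), GeoLabel("ZCTA5 25268")],
)

# Cast to NumPy and back: the values and column names survive,
# the problematic index does not.
plain = pd.DataFrame(df.to_numpy(), columns=df.columns)

# Keep a plain-string label for each row (assumes a `name` attribute).
plain["name"] = [geo.name for geo in df.index]

print(plain)
```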

jtleider commented 3 years ago

It seems there are inconsistencies across years in the summary levels. As a workaround, I have modified the code so if the summary level is unknown it will be shown as such rather than raising an error. Thank you for bringing this issue to my attention.
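
The thread does not show how the library implements this change. As a rough sketch of the idea only, a lookup with a default value returns a placeholder instead of raising. `SUMLEVELS` and `sumlevel` are hypothetical names used for illustration.

```python
# Hypothetical stand-in for the library's summary-level table.
SUMLEVELS = {"state> zip code tabulation area (or part)": "871"}


def sumlevel(hierarchy):
    """Return the summary level code, or 'unknown' if the hierarchy is not listed."""
    return SUMLEVELS.get(hierarchy, "unknown")


print(sumlevel("state> zip code tabulation area (or part)"))  # 871
print(sumlevel("state> zip code tabulation area"))            # unknown
```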