walkerke / pygris

Use US Census shapefiles in Python (port of the R tigris package)
https://walker-data.com/pygris
MIT License

counties() - State Column Name Discrepancy #15

Open apsocarras opened 8 months ago

apsocarras commented 8 months ago

counties() fails at line 93 for year=2010, because the 2010 shapefile has no STATEFP column:

  ctys = _load_tiger(url, cache = cache, subset_by = subset_by)

  if state is not None:
      if type(state) is not list:
          state = [state]
      valid_state = [validate_state(x) for x in state]
      ctys = ctys.query('STATEFP in @valid_state')  # STATEFP is not among the 2010 shapefile's columns

To reproduce:

from pygris import counties
import us
import random

# Build a lookup of state name, FIPS code, and USPS abbreviation
state_list = [{"name": s.name, "fips": s.fips, "usps": s.abbr} for s in us.states.STATES]
# random.choice avoids an out-of-range index if the list has fewer than 51 entries
rand_state = random.choice(state_list)  # e.g. {'name': 'Nebraska', 'fips': '31', 'usps': 'NE'}
counties(state=rand_state['fips'], year=2010)

UndefinedVariableError: name 'STATEFP' is not defined

Link to TIGER file used in above example: https://www2.census.gov/geo/tiger/TIGER2010/COUNTY/2010/tl_2010_us_county10.zip


from pygris.helpers import _load_tiger
gdf = _load_tiger("https://www2.census.gov/geo/tiger/TIGER2010/COUNTY/2010/tl_2010_us_county10.zip")

print(gdf.head())

  STATEFP10 COUNTYFP10 COUNTYNS10 GEOID10          NAME10  \
0        02        013   01419964   02013  Aleutians East
1        02        016   01419965   02016  Aleutians West

apsocarras commented 8 months ago

I made a quick check of which alternative schemas exist (https://github.com/apsocarras/pygris/blob/issue-4/reprex/reprex.ipynb):

schema_str, count
"AREA, PERIMETER, CO99D00, CO99_D00_I, STATE, COUNTY, NAME, LSAD, LSAD_TRANS, geometry", 3
"AREA, PERIMETER, CO99D90, CO99_D90_I, ST, CO, NAME, geometry", 3
"GEO_ID, STATE, COUNTY, NAME, LSAD, CENSUSAREA, geometry", 3
"STATEFP00, COUNTYFP00, CNTYIDFP00, NAME00, NAMELSAD00, LSAD00, CLASSFP00, MTFCC00, UR00, FUNCSTAT00, ALAND00, AWATER00, INTPTLAT00, INTPTLON00, geometry", 3
"STATEFP10, COUNTYFP10, COUNTYNS10, GEOID10, NAME10, NAMELSAD10, LSAD10, CLASSFP10, MTFCC10, CSAFP10, CBSAFP10, METDIVFP10, FUNCSTAT10, ALAND10, AWATER10, INTPTLAT10, INTPTLON10, geometry", 3

I propose changing the state filter in counties() to look for any of the listed variants of the 'STATEFP' column (STATE, ST, STATEFP00, STATEFP10) rather than STATEFP alone.
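
A rough sketch of what that lookup could look like. The helper name `find_state_column` and the constant `STATE_COLUMN_VARIANTS` are hypothetical and not part of pygris; the candidate order is my own choice. It also assumes every variant column holds two-digit FIPS codes, which I have only confirmed for STATEFP10 in the output above.

```python
# Hypothetical helper, not part of pygris: pick whichever state column
# a given county shapefile actually has.
STATE_COLUMN_VARIANTS = ("STATEFP", "STATEFP10", "STATEFP00", "STATE", "ST")


def find_state_column(columns, candidates=STATE_COLUMN_VARIANTS):
    """Return the first name in `candidates` that appears in `columns`.

    `columns` can be any iterable of column names, e.g. `ctys.columns`.
    Raises KeyError if none of the candidates is present.
    """
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    raise KeyError(
        f"No state column found; looked for {list(candidates)} "
        f"in {sorted(available)}"
    )


# Column names taken from the 2010 schema listed above
cols_2010 = ["STATEFP10", "COUNTYFP10", "COUNTYNS10", "GEOID10", "NAME10", "geometry"]
state_col = find_state_column(cols_2010)
print(state_col)
```

Inside counties(), the filter could then become something like `ctys = ctys[ctys[state_col].isin(valid_state)]` with `state_col = find_state_column(ctys.columns)`, instead of the hard-coded `ctys.query('STATEFP in @valid_state')`. Whether the ST and STATE columns in the older files can be compared directly against the output of validate_state() would still need checking.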