oturns / geosnap

The Geospatial Neighborhood Analysis Package
https://oturns.github.io/geosnap-guide
BSD 3-Clause "New" or "Revised" License

error when pulling blocks for Puerto Rico MSA #291

Closed (AnGWar26 closed this issue 3 years ago)

AnGWar26 commented 3 years ago

This error occurs when instantiating a community with the from_lodes constructor for any Puerto Rico MSA. The cause is that geosnap's block data does not include Puerto Rico (state FIPS 72), so the lookup for 72.parquet fails.

This code:

blocks = Community.from_lodes(msa_fips = '10380', years = [2017])

Produces the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/harmonization/lib/python3.7/site-packages/geosnap/_data.py in blocks_2010(self, states, convert, fips)
    269             try:
--> 270                 blks[state] = pd.read_parquet(blocks_2010[f"{state}.parquet"].get_cached_path())
    271             except:

~/anaconda3/envs/harmonization/lib/python3.7/site-packages/quilt3/packages.py in __getitem__(self, logical_key)
    646         for key_fragment in self._split_key(logical_key):
--> 647             pkg = pkg._children[key_fragment]
    648         return pkg

KeyError: '72.parquet'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-30-3ccfc182068c> in <module>
----> 1 blocks = Community.from_lodes(msa_fips = '10380', years = [2017])

~/anaconda3/envs/harmonization/lib/python3.7/site-packages/geosnap/_community.py in from_lodes(cls, state_fips, county_fips, msa_fips, fips, boundary, years, dataset)
   1800                 ]
   1801 
-> 1802         gdf = datasets.blocks_2010(states=states, fips=(tuple(allfips)))
   1803         gdf = gdf.drop(columns=["year"])
   1804         gdf = _fips_filter(

~/anaconda3/envs/harmonization/lib/python3.7/site-packages/geosnap/_data.py in blocks_2010(self, states, convert, fips)
    270                 blks[state] = pd.read_parquet(blocks_2010[f"{state}.parquet"].get_cached_path())
    271             except:
--> 272                 blks[state] = blocks_2010[f"{state}.parquet"]()
    273             if fips:
    274                 blks[state] = blks[state][blks[state]["geoid"].str.startswith(fips)]

~/anaconda3/envs/harmonization/lib/python3.7/site-packages/quilt3/packages.py in __getitem__(self, logical_key)
    645         pkg = self
    646         for key_fragment in self._split_key(logical_key):
--> 647             pkg = pkg._children[key_fragment]
    648         return pkg
    649 

KeyError: '72.parquet'

knaaptime commented 3 years ago

we need to make it clear that we don't have PR geodata, so these queries will always fail. Can you provide the actual code you're using?
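
For readers following the traceback: the fallback in the `except` branch looks up the same `72.parquet` key that just failed, which is why a second `KeyError` is raised while handling the first. A minimal sketch of that failure mode is below. It uses a plain dict as a stand-in for the remote package, and `available`, `load_blocks`, and `try_load` are hypothetical names for illustration only, not part of geosnap or quilt3.

```python
# Stand-in for the remote package index: a plain dict keyed by file name.
# Assumption: this only illustrates the lookup; the real object is a quilt3 package.
available = {"06.parquet": "California blocks", "11.parquet": "DC blocks"}


def load_blocks(state):
    """Mimic the try/except in the traceback: both branches index the same key."""
    key = f"{state}.parquet"
    try:
        return available[key]  # first attempt (the cached path in the real code)
    except KeyError:
        return available[key]  # fallback repeats the lookup, so it fails again


def try_load(state):
    """Return the data, or a note naming the missing key if the lookup fails."""
    try:
        return load_blocks(state)
    except KeyError as err:
        return f"missing {err.args[0]}"


print(try_load("06"))  # a state that is present
print(try_load("72"))  # Puerto Rico: the key is absent in both branches
```

Because no retry can succeed when the file is not in the package at all, the only real fix on the library side is to report the missing state clearly.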

AnGWar26 commented 3 years ago

I've modified the issue with your requested changes.

I've also tried fetching Puerto Rico data from the census, like this:

blocks = Community.from_census(msa_fips = '10380')

and this just results in an empty GeoDataFrame.

knaaptime commented 3 years ago

Right, we don't have PR data, so these will always fail, but I'll add an informative warning.
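
One way such a warning could look is sketched below. This is not the change that was actually made in geosnap; `UNSUPPORTED_STATES` and `check_states` are hypothetical names, and the sketch assumes that state FIPS 72 is the only code without block data.

```python
import warnings

# Assumption: "72" (Puerto Rico) is the only state FIPS code lacking block data.
UNSUPPORTED_STATES = {"72": "Puerto Rico"}


def check_states(states):
    """Warn about unsupported states and return only the supported ones."""
    supported = []
    for state in states:
        if state in UNSUPPORTED_STATES:
            warnings.warn(
                f"No block data is available for {UNSUPPORTED_STATES[state]} "
                f"(state FIPS {state}); skipping it.",
                UserWarning,
            )
        else:
            supported.append(state)
    return supported
```

Called before the per-state download loop, a check like this would turn the opaque `KeyError: '72.parquet'` into a message that names the unsupported territory.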