oturns / geosnap

The Geospatial Neighborhood Analysis Package
https://oturns.github.io/geosnap-guide
BSD 3-Clause "New" or "Revised" License

KeyError: 'wkb' #311

Closed: suhanmappingideas closed this issue 2 years ago

suhanmappingideas commented 2 years ago

I am still using geosnap 0.5.0 because I was never able to install 0.6.0 properly due to version conflicts among some Python packages. Now the code that used to work before does not work anymore; I added an example below. I expect the error would be resolved if I installed version 0.6.0, so I tried to install it in several different environments (Linux, Windows, Google Colab, and Binder), but I was never able to install geosnap 0.6.0 successfully. Any help will be greatly appreciated.


from geosnap import datasets
from geosnap import Community
from geosnap.io import store_ltdb

sample = "downloads/LTDB_Std_All_Sample.zip"
full = "downloads/LTDB_Std_All_fullcount.zip"
store_ltdb(sample=sample, fullcount=full)

# this will create a new community using data from Washington DC (which is fips code 11)
dc = Community.from_census(state_fips="11")
dc.gdf.head()

After I run the code above, I get the error below:

KeyError                                  Traceback (most recent call last)
c:\users\suzie\.conda\envs\geosnap\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'wkb'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # this will create a new community using data from Washington DC (which is fips code 11)
----> 2 dc = Community.from_census(state_fips="11")
      3 dc.gdf.head()

c:\users\suzie\.conda\envs\geosnap\lib\site-packages\geosnap\_community.py in from_census(cls, state_fips, county_fips, msa_fips, fips, boundary, years)
   1668
   1669         df_dict = {
-> 1670             1990: datasets.tracts_1990(states=states),
   1671             2000: datasets.tracts_2000(states=states),
   1672             2010: datasets.tracts_2010(states=states),

c:\users\suzie\.conda\envs\geosnap\lib\site-packages\geosnap\_data.py in tracts_1990(self, states, convert)
    308         t["year"] = 1990
    309         if convert:
--> 310             return _convert_gdf(t)
    311         else:
    312             return t

c:\users\suzie\.conda\envs\geosnap\lib\site-packages\geosnap\_data.py in _convert_gdf(df)
     95     else:
     96         with multiprocessing.Pool() as P:
---> 97             df["geometry"] = P.map(_deserialize_wkb, df["wkb"])
     98     df = df.drop(columns=["wkb"])
     99

c:\users\suzie\.conda\envs\geosnap\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798         if self.columns.nlevels > 1:
   2799             return self._getitem_multilevel(key)
-> 2800         indexer = self.columns.get_loc(key)
   2801         if is_integer(indexer):
   2802             indexer = [indexer]

c:\users\suzie\.conda\envs\geosnap\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'wkb'
knaaptime commented 2 years ago

this error happens because geosnap's census database now uses a different storage convention, and the old version of the software doesn't understand the new implementation. Instead of manually converting shapely objects to WKB and back, we use the built-in parquet functionality from geopandas.
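For context, a minimal sketch of the difference between the two conventions (illustrative only, using toy data rather than geosnap's actual internals):

import geopandas as gpd
import pandas as pd
from shapely import wkb
from shapely.geometry import Point

# toy stand-in for a census tract table
pts = [Point(0, 0), Point(1, 1)]

# old convention (geosnap 0.5.x): geometries serialized into a 'wkb' bytes column...
df = pd.DataFrame({"geoid": ["001", "002"], "wkb": [p.wkb for p in pts]})
# ...then manually deserialized back into shapely objects on read
df["geometry"] = df["wkb"].apply(wkb.loads)
gdf = gpd.GeoDataFrame(df.drop(columns=["wkb"]), geometry="geometry")

# new convention (geosnap 0.6+): geopandas round-trips geometries natively
# through parquet, so the stored table has no 'wkb' column at all
gdf.to_parquet("tracts_demo.parquet")
gdf_roundtrip = gpd.read_parquet("tracts_demo.parquet")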

So the error happens because you're using an old version of geosnap that expects a wkb column, which no longer exists in the current data. The best way around this is to install the current version of geosnap using anaconda. If you install using any method other than anaconda, there's no way to ensure you'll avoid version conflicts in the underlying C dependencies like GDAL. We test the package on all three platforms (and the tests are all passing), so you should be able to use the latest version fine as long as you stick with conda.

If for some reason you can't use anaconda and you're stuck on version 0.5.0, you can still use all of geosnap's functionality; you just won't have access to the built-in datasets. The census data are there for convenience because we use them so often, but the package operates just fine if you bring your own data (e.g. using something like the from_geodataframes method; see the sketch below).
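To illustrate that bring-your-own-data route, here is a minimal sketch. The file paths are hypothetical, and it assumes from_geodataframes accepts a list of GeoDataFrames carrying a year column, in line with its documented purpose:

import geopandas as gpd
from geosnap import Community

# hypothetical local tract files; any tract-level GeoDataFrames will do
t90 = gpd.read_file("my_tracts_1990.shp")
t00 = gpd.read_file("my_tracts_2000.shp")
t90["year"] = 1990
t00["year"] = 2000

# build a Community directly from your own data instead of the built-in datasets
dc = Community.from_geodataframes([t90, t00])
dc.gdf.head()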

closing as duplicate of #296

suhanmappingideas commented 2 years ago

Um.. I tried to install using conda install -c conda-forge geosnap, but I was never able to install it successfully. Is there anything else I need to do? It just hangs like this:

(geosnap_test) PS C:\Users\Suzie> conda install -c conda-forge geosnap
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: \

knaaptime commented 2 years ago

try creating a new empty environment then installing geosnap there

conda create -n geosnap
conda activate geosnap
conda install geosnap -c conda-forge

you will also probably have better luck with the mamba package manager

so you might want to do conda install mamba -y, then swap in the mamba command anywhere you would normally use conda (e.g. mamba install geosnap)
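For example, the full sequence with the swap applied might look like this (a sketch; it assumes mamba installs cleanly into your base environment):

conda install mamba -y
conda create -n geosnap
conda activate geosnap
mamba install geosnap -c conda-forge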

suhanmappingideas commented 2 years ago

No luck so far. I was just wondering how I can still use the LTDB data while staying on geosnap 0.5.0. Having the data access break is frustrating because I built, and have been using, an application that pulls in LTDB data.

suhanmappingideas commented 2 years ago

The reason I got stuck last time was that I was not able to install geosnap 0.6.0 using either mamba or conda. This time I installed Miniconda, then installed mamba, and I was able to install geosnap 0.6.0 using mamba on my Windows 10 machine. I was still not able to install geosnap 0.6.0 using conda on my Ubuntu machine. It looks like Anaconda has been having trouble recently?!

Anyways, your mamba solution worked out great for me with Miniconda. But it looks like something invisible in the LTDB dataset is different under geosnap 0.6.0. I just updated from geosnap 0.5.0 to 0.6.0, and the code that used to work with the LTDB data in the previous version no longer works in the new version. Were there any changes to the datasets, especially to the geometry data or the data types? Are you aware of anything related to this issue? I still haven't found the difference between the old dataset and the new one, but the same clustering code now errors on the new data. The contents look pretty much the same, but they seem to differ technically, in data types, missing values, or infinities... I'm not sure what the exact difference is.

knaaptime commented 2 years ago

glad to hear you were able to get things working with the newest version. There's no change in the way the data are processed; the function is unchanged since the original implementation, except that we use a different storage format. It's possible that missing values get treated differently somehow during read/write, but there's nothing different in our tests, or when I've used the data in my own work.

It sounds like any lingering issues are probably analysis-specific, so it's probably best to explore the LTDB data a bit until you can identify what's going on.
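One way to start that exploration (a sketch; it assumes the LTDB-backed community is loaded via from_ltdb, and the dtype/NaN/infinity checks are just the usual suspects to rule out):

import numpy as np
from geosnap import Community

# load the LTDB-backed community under the new version
dc = Community.from_ltdb(state_fips="11")
gdf = dc.gdf

# look for dtype changes, missing values, and infinities
print(gdf.dtypes)
print(gdf.isna().sum().sort_values(ascending=False).head(10))
numeric = gdf.select_dtypes("number")
print(np.isinf(numeric).sum().sort_values(ascending=False).head(10))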