holoviz / spatialpandas

Pandas extension arrays for spatial/geometric operations
BSD 2-Clause "Simplified" License

read_parquet_dask load_divisions with bounds #63

Open brl0 opened 3 years ago

brl0 commented 3 years ago

When using read_parquet_dask to read files written with the pack_partitions_to_parquet method, passing both bounds and load_divisions=True raises a KeyError. Reading the same files with either option on its own works.

Example:

from spatialpandas.io import read_parquet_dask

# path: dataset written by pack_partitions_to_parquet; bounds: a bounding-box tuple
sdf = read_parquet_dask(path, bounds=bounds, load_divisions=True)

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: -1

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-22-56d4a7063dd6> in <module>
----> 1 sdf_div = read_parquet_dask(pre_path, bounds=bounds, load_divisions=True)
      2 sdf_div

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/spatialpandas/io/parquet.py in read_parquet_dask(path, columns, filesystem, load_divisions, geometry, bounds, categories)
    231         path, columns, filesystem,
    232         load_divisions=load_divisions, geometry=geometry, bounds=bounds,
--> 233         categories=categories
    234     )
    235 

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/spatialpandas/io/parquet.py in _perform_read_parquet_dask(paths, columns, filesystem, load_divisions, geometry, bounds, categories)
    371 
    372     if load_divisions:
--> 373         divisions = div_mins + [div_maxes[-1]]
    374         if divisions != sorted(divisions):
    375             raise ValueError(

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
    822 
    823         elif key_is_scalar:
--> 824             return self._get_value(key)
    825 
    826         if is_hashable(key):

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    930 
    931         # Similar to Index.get_value, but we do not fall back to positional
--> 932         loc = self.index.get_loc(label)
    933         return self.index._get_values_for_loc(self, loc, label)
    934 

/home/conda/store/816032bda5816104d47e08949e4ec085fd6b9a98be07c2a55cf29c652743653e-datum/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: -1
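
The traceback shows div_maxes[-1] going through pandas Series.__getitem__ and failing in Index.get_loc with KeyError: -1. One possible reading, which this report does not confirm, is that div_maxes is a pandas Series with an integer index when bounds is given, so -1 is looked up as a label rather than as a position. Below is a minimal pandas-only sketch of that behaviour. The data and the boolean mask are hypothetical stand-ins, not spatialpandas internals.

```python
import pandas as pd

# Hypothetical stand-in for the per-partition division maxima.
div_maxes = pd.Series([10, 20, 30, 40])

# Hypothetical mask standing in for a bounds filter; the result keeps
# the original integer labels 0 and 2.
keep = pd.Series([True, False, True, False])
filtered = div_maxes[keep]

# On an integer-indexed Series, [-1] is a label lookup, not "last item".
try:
    filtered[-1]
    raised = False
except KeyError:
    raised = True

# Positional access works regardless of the index labels.
last_value = filtered.iloc[-1]

# Converting to a plain list first also restores Python's negative indexing.
last_from_list = filtered.tolist()[-1]

print(raised, last_value, last_from_list)
```

If that reading is right, using .iloc[-1] or converting to a list before indexing would avoid the error, but this has not been checked against the spatialpandas source.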