holoviz / spatialpandas

Pandas extension arrays for spatial/geometric operations
BSD 2-Clause "Simplified" License
308 stars, 25 forks

Ensure that pandas dtype matches dask when loading data from parquet #156

Open hoxbro opened 3 weeks ago

hoxbro commented 3 weeks ago

This change makes the following snippet report the same dtype for the lazy Dask column and for the computed pandas column. Before this change it returned (string[pyarrow], object).

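```python
import dask
import spatialpandas.io as sio

# Ask Dask to convert object (string) columns to string[pyarrow]
dask.config.set({"dataframe.convert-string": True})

# Data (unzip into ./data/): http://s3.amazonaws.com/datashader-data/nyc_buildings.parq.zip
ddf = sio.read_parquet_dask("./data/nyc_buildings.parq")

# dtype reported by the Dask meta vs. dtype of the computed pandas Series
print(ddf["type"].dtype, ddf["type"].compute().dtype)
```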

Together with https://github.com/holoviz/holoviews/pull/6362, this should make it possible to run the NYC Buildings example.
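The mismatch the snippet exposes can be modelled without Dask or pandas. The sketch below is a hypothetical, simplified illustration, not the code in this PR: dtypes are plain strings, `expected_partition_dtypes` and `dtype_mismatches` are invented helper names, the column names and dtypes are illustrative, and every `object` column is assumed to hold strings. It shows the invariant the PR title describes: whatever string conversion the Dask meta reflects must also be applied to the computed partitions, otherwise `ddf[col].dtype` and `ddf[col].compute().dtype` disagree.

```python
# Hypothetical sketch: these helpers are NOT part of spatialpandas or Dask.
# Dtypes are plain strings so the example runs without pandas or pyarrow.


def expected_partition_dtypes(file_dtypes, convert_string):
    """Dtypes a computed partition should report so it agrees with the Dask meta.

    Simplifying assumption: every "object" column holds strings.
    """
    if not convert_string:
        return dict(file_dtypes)
    return {
        name: "string[pyarrow]" if dtype == "object" else dtype
        for name, dtype in file_dtypes.items()
    }


def dtype_mismatches(meta_dtypes, computed_dtypes):
    """Return (column, meta dtype, computed dtype) for every disagreeing column."""
    return [
        (name, meta_dtypes[name], computed_dtypes.get(name))
        for name in meta_dtypes
        if meta_dtypes[name] != computed_dtypes.get(name)
    ]


# Illustrative column dtypes as read from the file (not the real schema).
raw = {"height": "float64", "type": "object"}
meta = expected_partition_dtypes(raw, convert_string=True)

# Before: partitions keep the raw "object" dtype, so "type" disagrees with the meta.
print(dtype_mismatches(meta, raw))
# After: partitions get the same conversion as the meta, so nothing disagrees.
print(dtype_mismatches(meta, expected_partition_dtypes(raw, convert_string=True)))
```

The first call lists the `type` column as a mismatch, mirroring the (string[pyarrow], object) pair above; the second returns an empty list.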

codecov[bot] commented 3 weeks ago

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 77.72%. Comparing base (af931c9) to head (aec415c).

| Files | Patch % | Lines |
|---|---|---|
| spatialpandas/io/parquet.py | 75.00% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #156      +/-   ##
==========================================
+ Coverage   77.47%   77.72%   +0.24%
==========================================
  Files          50       50
  Lines        4843     4870      +27
==========================================
+ Hits         3752     3785      +33
+ Misses       1091     1085       -6
```
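The percentages in this report follow from the raw line counts. The sketch below recomputes them under two assumptions: line coverage is hits divided by tracked lines (no partial lines are reported here), and values are truncated to two decimals, which is Codecov's default `round: down` behaviour and explains why the delta reads +0.24% rather than +0.25%. The per-file and patch line totals (8 and 29) are inferred from the reported percentages and missing-line counts, not stated in the report; `coverage_pct` and `round_down` are illustrative helpers.

```python
import math

# Sketch: recompute the Codecov figures above from the raw line counts.
# Assumptions: coverage = hits / lines (no partial lines reported), and
# percentages are truncated to 2 decimals (Codecov's default `round: down`).


def coverage_pct(hits, lines):
    """Line coverage as a percentage."""
    return 100 * hits / lines


def round_down(value, digits=2):
    """Round toward negative infinity at the given number of decimals."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


base = coverage_pct(3752, 4843)  # main (af931c9)
head = coverage_pct(3785, 4870)  # head of #156 (aec415c)
print(round_down(base), round_down(head), round_down(head - base))

# Patch coverage: 93.10345% with 2 missing lines implies 27 of 29 changed lines hit;
# parquet.py at 75.00% with 2 missing implies 6 of 8 (inferred, not reported).
print(f"{coverage_pct(27, 29):.5f}", f"{coverage_pct(6, 8):.2f}")
```

This prints `77.47 77.72 0.24` and then `93.10345 75.00`, matching the report. Rounding to nearest instead would turn the 0.248-point delta into +0.25%.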
