holoviz / spatialpandas

Pandas extension arrays for spatial/geometric operations
BSD 2-Clause "Simplified" License

spatial join Dask updates #23

Closed jonmmease closed 4 years ago

jonmmease commented 4 years ago

This PR improves the efficiency of the spatial join algorithm for the case where the left frame is a DaskGeoDataFrame and the right frame is a GeoDataFrame.

Now, we iterate over the partitions of the left frame and use the partition bounding box to extract the candidate rows from the right frame (using the right frame's spatial index). If no rows are selected while performing an inner join, the partition is skipped altogether.
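For readers who want to see the shape of this approach, here is a minimal pure-Python sketch of the idea, not the code in this PR. It assumes the left frame is a list of point partitions and the right frame is a list of axis-aligned boxes. All names (`bbox_of`, `query_index`, `partitioned_inner_join`) are hypothetical, and a linear scan stands in for the right frame's spatial index.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def bbox_of(points: List[Point]) -> BBox:
    """Bounding box of one left partition."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(box: BBox, pt: Point) -> bool:
    """True when the point lies inside or on the edge of the box."""
    return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]


def query_index(right_boxes: List[BBox], query: BBox) -> List[int]:
    """Hypothetical stand-in for a spatial index lookup (linear scan here)."""
    return [i for i, box in enumerate(right_boxes) if intersects(box, query)]


def partitioned_inner_join(left_partitions: List[List[Point]],
                           right_boxes: List[BBox]):
    """Inner join: (partition id, row id, right row id) for each match."""
    pairs = []
    skipped = 0
    for part_id, points in enumerate(left_partitions):
        # Use the partition bounding box to select candidate right rows.
        candidates = query_index(right_boxes, bbox_of(points)) if points else []
        if not candidates:
            # Nothing on the right can match: skip the whole partition.
            skipped += 1
            continue
        # Exact test, restricted to the candidate rows only.
        for row_id, pt in enumerate(points):
            for j in candidates:
                if contains(right_boxes[j], pt):
                    pairs.append((part_id, row_id, j))
    return pairs, skipped


left_partitions = [
    [(0.5, 0.5), (1.5, 1.5)],
    [(10.0, 10.0), (11.0, 11.0)],  # far from every right box
    [(4.5, 4.5), (8.0, 8.0)],
]
right_boxes = [(0.0, 0.0, 1.0, 1.0), (4.0, 4.0, 5.0, 5.0)]

pairs, skipped = partitioned_inner_join(left_partitions, right_boxes)
print(pairs, skipped)
```

In this sketch the second partition is never compared row by row, because its bounding box overlaps no right-hand box. The skip is only valid for an inner join; a left join would still have to emit the unmatched left rows.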

722ca87 also adds a fix to make sure that pyarrow parquet parts are sorted properly when loading a parquet file.