holoviz / spatialpandas

Pandas extension arrays for spatial/geometric operations
BSD 2-Clause "Simplified" License

Implement R-tree accelerated cx spatial indexer #4

Closed jonmmease closed 4 years ago

jonmmease commented 4 years ago

This PR adds vectorized numba implementations of rectangle intersection for all geometry types.

See https://anaconda.org/jonmmease/cx_benchmarking_pr for benchmarking results comparing cx runtime on a dataset of ~30k polygons.
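As a rough illustration of the rectangle-intersection test that a `cx`-style query relies on, the sketch below filters bounding boxes against a query rectangle and returns the indices of the hits. It is not the code from this PR: the function name `bbox_intersect_indices` and the `(x0, y0, x1, y1)` tuple layout are assumptions made for the example, and it uses a plain linear scan rather than numba compilation or an R-tree.

```python
def bbox_intersect_indices(bounds, query):
    """Return indices of boxes in `bounds` that intersect `query`.

    Hypothetical helper for illustration; boxes are (x0, y0, x1, y1) tuples
    with x0 <= x1 and y0 <= y1. Boxes that only touch count as intersecting.
    """
    qx0, qy0, qx1, qy1 = query
    hits = []
    for i, (x0, y0, x1, y1) in enumerate(bounds):
        # Two axis-aligned rectangles overlap unless one lies entirely
        # to the left/right of, or above/below, the other.
        if x0 <= qx1 and x1 >= qx0 and y0 <= qy1 and y1 >= qy0:
            hits.append(i)
    return hits


bounds = [
    (0.0, 0.0, 1.0, 1.0),   # overlaps the query
    (2.0, 2.0, 3.0, 3.0),   # entirely outside the query
    (0.5, 0.5, 2.5, 2.5),   # overlaps the query
]
hits = bbox_intersect_indices(bounds, query=(0.25, 0.25, 1.5, 1.5))
```

A bounding-box hit is only a candidate: a geometry whose box overlaps the query rectangle may still miss it, so an exact per-geometry test would normally follow this filter.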

[Image: newplot]

jonmmease commented 4 years ago

> I re-ran your notebook with my geopandas branch that starts using pygeos, and then I get this picture:

Cool! Most of the overhead toward the left for spatialpandas is in reconstructing the array (rather than the actual R-tree+intersection calculation). Maybe using pyarrow's take will help some there. But this may be a case where the PyGEOS data model will have an edge since it doesn't have to allocate the long contiguous buffers when indexing. Are GEOS shape objects immutable so that you can reuse them across arrays?

For Datashader's use, I won't actually call cx. Instead, I'll have it compute the indices of the intersecting shapes and iterate over those when rendering.

jonmmease commented 4 years ago

In 648c66a over in https://github.com/jonmmease/spatialpandas/pull/5 I converted to using pyarrow's take method. It makes cx a little slower than my custom code, but I think the standardization is worth it.

[Image: newplot-2]

jorisvandenbossche commented 4 years ago

> In 648c66a over in #5 I converted to using pyarrow's take method. It makes cx a little slower than my custom code, but I think the standardization is worth it.

Hmm, the take implementations for ListArrays could potentially be optimized in arrow, then. But at least it gives a nice code simplification!

jorisvandenbossche commented 4 years ago

> But this may be a case where the PyGEOS data model will have an edge since it doesn't have to allocate the long contiguous buffers when indexing. Are GEOS shape objects immutable so that you can reuse them across arrays?

Yes, indeed, shapely GEOS objects are immutable, and the array itself holds pointers to those objects. So "take" is basically only rearranging those pointers.
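The difference between the two data models can be sketched in plain Python. This is a toy model, not the spatialpandas or PyGEOS implementation: `take_ragged` and `take_pointers` are hypothetical helpers, with a flat value list plus offsets standing in for contiguous Arrow-style buffers and a list of object references standing in for an array of GEOS pointers.

```python
def take_ragged(values, offsets, indices):
    """Take rows from a ragged array stored as flat values plus offsets.

    Row i occupies values[offsets[i]:offsets[i + 1]]. The result needs
    freshly built value and offset buffers, so every selected coordinate
    is copied.
    """
    new_values = []
    new_offsets = [0]
    for i in indices:
        new_values.extend(values[offsets[i]:offsets[i + 1]])
        new_offsets.append(len(new_values))
    return new_values, new_offsets


def take_pointers(objects, indices):
    """Take from a list of immutable objects: only references are copied."""
    return [objects[i] for i in indices]


# Three "geometries" with 2, 1 and 3 coordinates respectively.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
offsets = [0, 2, 3, 6]
new_values, new_offsets = take_ragged(values, offsets, [2, 0])

# The same data as separate immutable objects (tuples stand in for shapes).
shapes = [(1.0, 2.0), (3.0,), (4.0, 5.0, 6.0)]
taken = take_pointers(shapes, [2, 0])
```

In this toy model `taken[0] is shapes[2]` holds, because no geometry data was copied, whereas `take_ragged` had to copy five values into a new buffer and rebuild the offsets.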

I also tried the STRtree of pygeos for this. It gives a small overhead for the bigger areas (around 10 ms) and also achieves a nice speed-up for the small area (down to 1 ms). Should port this to geopandas!