Closed jonmmease closed 4 years ago
I re-ran your notebook with my geopandas branch that starts using pygeos, and then I get this picture:
Cool! Most of the overhead toward the left for spatialpandas is in reconstructing the array (rather than the actual R-tree + intersection calculation). Maybe using pyarrow's `take` will help some there. But this may be a case where the PyGEOS data model will have an edge, since it doesn't have to allocate the long contiguous buffers when indexing. Are GEOS shape objects immutable so that you can reuse them across arrays?
For Datashader's use I won't actually call `cx`; instead, I'll have it compute the indices of the intersecting shapes and iterate over those when rendering.
In 648c66a over in https://github.com/jonmmease/spatialpandas/pull/5 I converted to using pyarrow's `take` method. It makes `cx` a little slower than my custom code, but I think the standardization is worth it.
> In 648c66a over in #5 I converted to using pyarrow's `take` method. It makes `cx` a little slower than my custom code, but I think the standardization is worth it.
Hmm, the take implementations for ListArrays could potentially be optimized in arrow, then. But at least it gives a nice code simplification!
> But this may be a case where the PyGEOS data model will have an edge since it doesn't have to allocate the long contiguous buffers when indexing. Are GEOS shape objects immutable so that you can reuse them across arrays?
Yes, indeed, shapely GEOS objects are immutable, and the array itself just holds pointers to those objects. So "take" is basically only rearranging those pointers.
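The pointer-rearranging behaviour can be illustrated with a plain NumPy object array. This is only a sketch: the `Geom` class is a hypothetical stand-in for an immutable geometry object, not a pygeos or shapely type.

```python
import numpy as np


class Geom:
    """Hypothetical stand-in for an immutable geometry object."""

    def __init__(self, name):
        self.name = name


# An object array stores references to the Python objects, not their contents.
geoms = np.empty(3, dtype=object)
geoms[:] = [Geom("a"), Geom("b"), Geom("c")]

# take() builds a new array of references; the geometries are not copied.
subset = geoms.take([2, 0])

# Identity checks confirm both arrays point at the very same objects.
same_objects = subset[0] is geoms[2] and subset[1] is geoms[0]
print(same_objects)
```

Because the objects are shared rather than copied, this only stays safe as long as they are never mutated, which is why immutability matters here.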
I also tried the pygeos STRtree for this. It adds a small overhead for the bigger areas (around 10 ms) and achieves a nice speed-up for the small area (down to 1 ms). Should port this to geopandas!
This PR adds vectorized numba implementations of rectangle intersection for all geometry types.

- An `intersect_bounds` method on each geometry array type that returns a boolean mask indicating which elements of the array intersect with the supplied bounds.
- A `cx` indexer to filter geometry arrays spatially. Unlike Geopandas, the `cx` indexer uses an R-tree (the `HilbertRtree` added previously) to first look up elements based on bounding box before executing the fine-grained intersection logic. Because of this, the `cx` operation gets significantly faster as the selected region gets smaller.

See https://anaconda.org/jonmmease/cx_benchmarking_pr for benchmarking results comparing `cx` runtime on a dataset of ~30k polygons.
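To make the boolean-mask idea concrete, here is a rough sketch of the coarse bounding-box stage only, written with NumPy. It is not the PR's numba code and does not use an R-tree; the function name `intersects_bounds_mask` and the sample boxes are hypothetical.

```python
import numpy as np


def intersects_bounds_mask(bounds, query):
    """Return a boolean mask of the boxes that overlap the query rectangle.

    bounds: (n, 4) array of (x0, y0, x1, y1) per element (assumed layout).
    query:  (x0, y0, x1, y1) rectangle.
    """
    bounds = np.asarray(bounds, dtype=float)
    qx0, qy0, qx1, qy1 = query
    # Two rectangles overlap when their extents overlap on both axes.
    return (
        (bounds[:, 0] <= qx1)
        & (bounds[:, 2] >= qx0)
        & (bounds[:, 1] <= qy1)
        & (bounds[:, 3] >= qy0)
    )


boxes = np.array(
    [[0, 0, 1, 1], [2, 2, 3, 3], [0.5, 0.5, 2.5, 2.5], [5, 5, 6, 6]]
)
mask = intersects_bounds_mask(boxes, (0.8, 0.8, 2.2, 2.2))

# Indices of candidate elements; a fine-grained geometry test would follow.
candidates = np.flatnonzero(mask)
print(mask, candidates)
```

A bounding-box hit only marks a candidate. The fine-grained intersection logic described above is still needed to reject shapes whose box overlaps the rectangle but whose geometry does not.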