rapidsai / cuspatial

CUDA-accelerated GIS and spatiotemporal algorithms
https://docs.rapids.ai/api/cuspatial/stable/
Apache License 2.0

API to extract a subset of polygons from existing set #507

Closed harrism closed 1 year ago

harrism commented 2 years ago

It would be great to have a Python API like subset_polygon_geodataframe(geo_df, subset_ids or subset_range) that returns the same four arrays as read_polygon_shapefile(shp_file). This would leave the C++ API untouched and could be more efficient than saving a subset of polygons to a shapefile and then reading it back (a round trip to disk), especially for large polygon datasets.

Originally posted by @zhangjianting in https://github.com/rapidsai/cuspatial/issues/492#issuecomment-1082543191

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

thomcom commented 2 years ago

Hi Mark! I never saw this issue until the inactive-90d flag.

You can do this now:

geo_series._column.polygons._column.base_children[0] returns read_polygon_shapefile(...)[0]
geo_series._column.polygons._column.base_children[1].base_children[0] returns read_polygon_shapefile(...)[1]
geo_series._column.polygons.x returns read_polygon_shapefile(...)[2].x
geo_series._column.polygons.y returns read_polygon_shapefile(...)[2].y

These conversions are somewhat cumbersome, but I think it would be better to refactor the Python API (for example, turning point_in_polygon into points_series.sjoin(polygons_series)) than to make the underlying GeoArrow buffers more accessible while keeping the API surface as it is. Don't you agree?

thomcom commented 1 year ago

This was closed in https://github.com/rapidsai/cuspatial/pull/660.

Mark, you can access the underlying offset members of polygons with:

geoseries.polygons.geometry_offset # multipolygon offsets (index into the parts)
geoseries.polygons.part_offset # polygon offsets (index into the rings)
geoseries.polygons.ring_offset # ring offsets (index into the points)
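
For anyone following along, here is a minimal pure-Python sketch of how the three offset arrays nest and how a subset of polygons could be rebuilt from them without a round trip to disk. It does not call cuSpatial: subset_polygons is a hypothetical helper, plain lists stand in for the GPU columns, and it assumes Arrow-style offsets with n + 1 entries, where entries i and i + 1 bound item i. Check the actual offset convention of your cuSpatial version before relying on it.

```python
def subset_polygons(geometry_offset, part_offset, ring_offset, x, y, ids):
    """Hypothetical helper: keep only the multipolygons listed in `ids`.

    Assumes each offset list has n + 1 entries (Arrow-style), so that
    item i spans offsets[i]:offsets[i + 1] in the next level down.
    Returns new (geometry_offset, part_offset, ring_offset, x, y) lists,
    with geometries in the order given by `ids`.
    """
    new_geom, new_part, new_ring = [0], [0], [0]
    new_x, new_y = [], []
    for g in ids:
        # Walk the polygons (parts) that belong to multipolygon g.
        for p in range(geometry_offset[g], geometry_offset[g + 1]):
            # Walk the rings that belong to polygon p.
            for r in range(part_offset[p], part_offset[p + 1]):
                start, stop = ring_offset[r], ring_offset[r + 1]
                new_x.extend(x[start:stop])
                new_y.extend(y[start:stop])
                new_ring.append(len(new_x))      # end of this ring's points
            new_part.append(len(new_ring) - 1)   # rings written so far
        new_geom.append(len(new_part) - 1)       # parts written so far
    return new_geom, new_part, new_ring, new_x, new_y


# Toy data: 3 multipolygons, 4 polygons, 5 rings, 21 points.
geometry_offset = [0, 1, 3, 4]       # geometry 1 holds two polygons
part_offset = [0, 1, 2, 3, 5]        # the last polygon has two rings
ring_offset = [0, 4, 8, 12, 17, 21]
# Placeholder coordinates: each value is just the point index,
# which makes the copied slices easy to follow.
x = [float(i) for i in range(21)]
y = [float(i) + 0.5 for i in range(21)]

sub_geom, sub_part, sub_ring, sub_x, sub_y = subset_polygons(
    geometry_offset, part_offset, ring_offset, x, y, ids=[2, 0]
)
print(sub_geom, sub_part, sub_ring)
```

The loops are written for readability, not speed. On real data the same bookkeeping would be done with vectorized gathers on the device rather than Python loops.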