rapidsai / cuspatial

CUDA-accelerated GIS and spatiotemporal algorithms
https://docs.rapids.ai/api/cuspatial/stable/
Apache License 2.0

[BUG] cuSpatial does not correctly implement GeoArrow format for MultiLineString and MultiPolygon #575

Closed thomcom closed 2 years ago

thomcom commented 2 years ago

Describe the bug

My implementation of the named datatypes uses a poorly conceived "brackets" data structure to identify where the Multi features are located within the offsets arrays for LineString and Polygon coordinates. To match the GeoArrow format, the brackets should be dropped: the offset for each LineString or Polygon should simply be 1, and the offset for each Multi should be the number of features of that type that the Multi is made from, in sequence.
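To illustrate the offset scheme described above, here is a minimal pure-Python sketch. It is not cuSpatial's code: the function name `build_offsets`, the variable names, and the sample coordinates are all hypothetical. It assumes that the per-feature "offset" in the description is the step each feature adds, and that the stored offsets arrays are the running totals of those steps, as in Arrow-style nested lists.

```python
from itertools import accumulate


def build_offsets(features):
    """Flatten features into nested offsets plus a flat coordinate list.

    `features` is a list of features; each feature is a list of
    linestrings; each linestring is a list of (x, y) tuples.
    A plain LineString is a feature with exactly one linestring.
    (Hypothetical helper, for illustration only.)
    """
    # Step per feature: 1 for a LineString, N for a MultiLineString of N parts.
    geometry_steps = [len(parts) for parts in features]
    # Step per linestring: its number of coordinates.
    part_steps = [len(line) for parts in features for line in parts]

    # Offsets are the running totals of the steps, starting at 0.
    geometry_offsets = [0, *accumulate(geometry_steps)]
    part_offsets = [0, *accumulate(part_steps)]
    coords = [xy for parts in features for line in parts for xy in line]
    return geometry_offsets, part_offsets, coords


features = [
    [[(0, 0), (1, 1)]],                            # LineString (1 part)
    [[(2, 2), (3, 3), (4, 4)], [(5, 5), (6, 6)]],  # MultiLineString (2 parts)
    [[(7, 7), (8, 8)]],                            # LineString (1 part)
]

geometry_offsets, part_offsets, coords = build_offsets(features)
print(geometry_offsets)  # [0, 1, 3, 4]
print(part_offsets)      # [0, 2, 5, 7, 9]
```

Feature `i` owns the linestrings `geometry_offsets[i]` up to `geometry_offsets[i + 1]`, so the steps of 1, 2, 1 mark the second feature as the only Multi without any separate "brackets" structure.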

Steps/Code to reproduce bug

The issue is documented in precise detail here: https://notebooksharing.space/view/517f3172b12354804179f248247ab5ffd6573214e9f9810d13494533f1aefd8a#displayOptions=

Plan

As part of fixing the bug, I'll convert cuspatial to use pyarrow for all of the Shapely/GeoPandas serialization. Presently I am doing this serialization manually, which requires many lines of code and is hard for other developers to understand. Additionally, that means that all indexing, host/device copies, and deserialization are manual, too. Fixing this bug will have four steps:

As each step begins I'll create an issue and a PR for it and document them here.

harrism commented 2 years ago

Thanks @thomcom. Can you target 22.08 for the fix?

thomcom commented 2 years ago

Yes!

thomcom commented 2 years ago

@harrism this is done in #585 for 22.08