Describe the bug
My implementation of the named datatypes uses a poorly conceived "brackets" data structure to identify where the Multi features are located within the offsets arrays for LineString and Polygon coordinates. To match the GeoArrow format, the brackets should go away: each LineString or Polygon should simply advance the offsets by 1, and each Multi feature should advance them by the number of LineStrings or Polygons it is made from, in sequence.
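One way to read the intended layout, as a minimal sketch in plain Python rather than cuspatial or pyarrow code: the `features` input and the `build_offsets` and `feature_parts` helpers are hypothetical names made up for this illustration. A plain LineString is modeled as a feature with one part, so it advances the geometry offsets by 1, while a MultiLineString advances them by its number of parts.

```python
from itertools import accumulate

# Hypothetical input: each feature is a list of parts, each part a list of (x, y) points.
features = [
    [[(0, 0), (1, 1)]],                            # LineString: 1 part
    [[(2, 2), (3, 3), (4, 4)], [(5, 5), (6, 6)]],  # MultiLineString: 2 parts
    [[(7, 7), (8, 8)]],                            # LineString: 1 part
]


def build_offsets(features):
    # geometry_offsets[i]:geometry_offsets[i + 1] selects the parts of feature i.
    geometry_offsets = [0, *accumulate(len(feature) for feature in features)]
    parts = [part for feature in features for part in feature]
    # part_offsets[j]:part_offsets[j + 1] selects the points of part j.
    part_offsets = [0, *accumulate(len(part) for part in parts)]
    points = [point for part in parts for point in part]
    return geometry_offsets, part_offsets, points


def feature_parts(i, geometry_offsets, part_offsets, points):
    # Walk both offset levels to recover the parts of feature i.
    part_ids = range(geometry_offsets[i], geometry_offsets[i + 1])
    return [points[part_offsets[j]:part_offsets[j + 1]] for j in part_ids]


geometry_offsets, part_offsets, points = build_offsets(features)
print(geometry_offsets)  # [0, 1, 3, 4]
print(part_offsets)      # [0, 2, 5, 7, 9]
```

With this layout no separate marker is needed to find a Multi feature: any feature whose geometry offsets differ by more than 1 is a Multi.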
Plan
As part of fixing the bug, I'll convert cuspatial to use pyarrow for all of the Shapely/GeoPandas serialization. Presently I do this serialization manually, which takes many lines of code and is hard for other developers to understand. It also means that all indexing, host/device copies, and deserialization are manual, too. Fixing this bug will take the following steps:
[x] Drop manual input serialization of GeoSeries objects and use pyarrow native types for all serialization.
582
583
[x] Replace all the manual indexing into GeoArrow objects with simple pyarrow-based indexing.
[x] Add proper getitem for UnionArray in pyarrow.
[x] Replace any manual device/host data movement with pyarrow native code.
[x] Create all Shapely objects and GeoSeries objects using pyarrow buffers.
[x] Refactor parent class of GeoColumn to be ListColumn instead of NumericalColumn.
As each step begins I'll create an issue and a PR for it and document them here.
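To illustrate what a getitem on a union array has to do, here is a toy model of Arrow's dense union layout in plain Python. It is a sketch under assumptions, not pyarrow's UnionArray and not the cuspatial implementation: the `DenseUnion` class and the sample `column` are hypothetical. In a dense union, element i lives in child `type_ids[i]` at position `offsets[i]`.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class DenseUnion:
    """Toy stand-in for an Arrow dense union (hypothetical, for illustration)."""

    type_ids: List[int]            # which child holds element i
    offsets: List[int]             # position of element i within that child
    children: List[Sequence[Any]]  # one sequence per geometry type

    def __len__(self):
        return len(self.type_ids)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Resolve the slice against our length, then index element-wise.
            return [self[j] for j in range(*i.indices(len(self)))]
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        return self.children[self.type_ids[i]][self.offsets[i]]


# Hypothetical column mixing points (child 0) and linestrings (child 1).
point_child = [(0.0, 0.0), (5.0, 5.0)]
line_child = [[(1.0, 1.0), (2.0, 2.0)]]
column = DenseUnion(
    type_ids=[0, 1, 0],
    offsets=[0, 0, 1],
    children=[point_child, line_child],
)

print(column[1])   # [(1.0, 1.0), (2.0, 2.0)]
print(column[-1])  # (5.0, 5.0)
```

The point of the sketch is that indexing never touches the coordinate data directly; it only follows the type id and offset for the requested row.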
Steps/Code to reproduce bug
The issue is documented in precise detail here: https://notebooksharing.space/view/517f3172b12354804179f248247ab5ffd6573214e9f9810d13494533f1aefd8a#displayOptions=