holoviz / spatialpandas

Pandas extension arrays for spatial/geometric operations
BSD 2-Clause "Simplified" License

ENH: Fixed width types #7

Closed (jonmmease closed 4 years ago)

jonmmease commented 4 years ago

So far, spatialpandas supports "ragged" geometry types, where the representation of the geometry object in each row may differ in length (e.g. polygons with a variable number of vertices). These types are backed by a pyarrow ListArray.

It would also be nice to provide a more efficient representation of fixed-size geometry objects, in particular a single point per row. Another use case would be axis-aligned boxes, each represented by two points.

One way to represent these would be to use pyarrow extension types backed by a fixed width binary storage type.

@jorisvandenbossche does this sound like a reasonable way to handle fixed length geometry types with pyarrow? Or would there be anything more straightforward?

jorisvandenbossche commented 4 years ago

Apache Arrow actually has a "Fixed Size List" type: https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout, which I think could be useful for this. However, this type is not yet exposed in the latest pyarrow release.

Until then, a fixed width binary type is indeed the way to do this yourself with the current pyarrow, I think.

jonmmease commented 4 years ago

Ah, ok, yeah. Fixed size list looks like the right fit for these use cases if/when it becomes available through Python. Thanks!

jonmmease commented 4 years ago

Done in https://github.com/jonmmease/spatialpandas/pull/8

jorisvandenbossche commented 4 years ago

And for some point in the future (normally once you can require pyarrow 1.0), FixedSizeListArray will be available: https://issues.apache.org/jira/browse/ARROW-7261