Open paleolimbot opened 1 year ago
Hi @paleolimbot!
Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can! In the meantime, feel free to add any relevant information to this issue.
Thanks for the feature request. @paleolimbot where is the CRS in the example?
It's a property of the (Arrow) type!
from geoarrow.pyarrow import io
tbl = io.read_pyogrio_table("/vsizip/vsicurl/https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_point.fgb.zip")
tbl["wkb_geometry"].type.crs
#> '{"$schema":"https://proj.org/schemas/v0.7/projjson.schema.json","type":"Projected...
The full serialization of the type is described in the 'extension types' section ( https://github.com/geoarrow/geoarrow/blob/main/extension-types.md ), and you can access it using type.__arrow_ext_serialize__() (e.g., tbl["wkb_geometry"].type.__arrow_ext_serialize__() for the example above). The CRS is the main thing that's in the serialization.
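As a rough illustration of what reading that serialization could look like, here is a minimal standard-library sketch. It assumes the serialized metadata is a JSON object with a "crs" key, as the extension-types document describes. The payload below is a hand-written stand-in rather than real output from the dataset above, and crs_from_serialized is a hypothetical helper name, not part of geoarrow-pyarrow.

```python
import json


def crs_from_serialized(serialized: bytes):
    """Return the CRS stored in GeoArrow extension metadata, or None if absent.

    Hypothetical helper: assumes the metadata is a JSON object with a "crs" key.
    """
    if not serialized:
        return None
    metadata = json.loads(serialized.decode("utf-8"))
    crs = metadata.get("crs")
    # The CRS may itself be a JSON-encoded string (e.g. PROJJSON); decode it if so.
    if isinstance(crs, str) and crs.lstrip().startswith("{"):
        return json.loads(crs)
    # Otherwise it is already an object, a plain string, or missing.
    return crs


# Illustrative payload only; a real one would come from
# tbl["wkb_geometry"].type.__arrow_ext_serialize__()
example = json.dumps(
    {"crs": {"type": "ProjectedCRS", "name": "illustrative CRS"}}
).encode("utf-8")

crs = crs_from_serialized(example)
print(crs["type"])
```

Returning the parsed object rather than the raw string makes it easier to hand the CRS to whatever library sits on the receiving side, but that is a design choice for the sketch, not something the spec requires.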
Hey @paleolimbot ! Thanks for the update. I've been following your geoarrow work for a long while and am pretty excited to integrate it. I wrote a simple wrapper a few months ago, before geoarrow.pyarrow, that pulled the offset buffers and was able to construct cuspatial data from it easily and fast. We will definitely be integrating your work. Is it available as a dependency in pip, yet?
Is it available as a dependency in pip, yet?
Yes! pip install geoarrow-pyarrow should do it. I have the lower-level geoarrow-c on conda-forge and will submit the PR to add geoarrow-pyarrow in the next few days.
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem you would like to solve.
Now that geoarrow-pyarrow ( https://github.com/geoarrow/geoarrow-python ) is available and the GeoArrow specification has an initial 0.1 release, there are potential synergies we may be able to leverage given the common memory layout! Basically, geoarrow-pyarrow implements a pyarrow.DataType subclass for geometry with a type-level place to store the coordinate reference system. It would be very cool if cudf.Series.from_arrow() could handle these (or whatever the best interface is from your end).
I also think it has the potential to significantly speed up IO compared with the current geopandas.read_file() + cuspatial.GeoSeries.from_geopandas() (a rough estimate from the musings below: assembling linestrings from a large-ish FlatGeobuf was about 20x faster).
Happy to implement anything in geoarrow-c or geoarrow-python that makes this easier! We're slowly working on getting both on conda-forge (they're on pip already).
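To make the "common memory layout" idea concrete, here is a small pure-Python sketch of an offsets-based linestring layout: coordinates are stored contiguously, and an offsets array of length n + 1 marks where each linestring starts and ends. It is only an illustration of the general technique. The function names are hypothetical, and the separate x/y arrays are one possible coordinate arrangement chosen for the sketch, not a statement about what either library uses internally.

```python
from array import array


def linestrings_to_buffers(linestrings):
    """Flatten a list of linestrings into coordinate arrays plus an offsets array."""
    xs = array("d")
    ys = array("d")
    offsets = array("i", [0])  # one entry per linestring, plus a leading 0
    for line in linestrings:
        for x, y in line:
            xs.append(x)
            ys.append(y)
        # Each offset records the running coordinate count after this linestring.
        offsets.append(len(xs))
    return xs, ys, offsets


def linestring_at(xs, ys, offsets, i):
    """Recover linestring i by slicing the coordinate arrays between two offsets."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(xs[start:end], ys[start:end]))


lines = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0), (3.0, 3.0), (4.0, 4.0)],
]
xs, ys, offsets = linestrings_to_buffers(lines)
print(list(offsets))
print(linestring_at(xs, ys, offsets, 1))
```

Because every geometry is just a slice of shared buffers, two libraries that agree on this layout can in principle exchange geometry without parsing or copying individual features, which is where a speedup over a per-geometry conversion would come from.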
Describe any alternatives you have considered
The closest thing that currently provides this functionality is from_geopandas(), with Shapely's to_ragged_array and from_ragged_array also providing similar buffer building/parsing capability.
Additional context
Some musings with a large-ish linestring dataset (with apologies if I'm missing some obvious usage I should be aware of):
There are more example datasets at https://geoarrow.org/data as well (although I'm sure you have many internally too).