Add Apache Sedona to known libraries

opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet

https://geoparquet.org

Apache License 2.0

Add Apache Sedona to known libraries #150

Closed jiayuasu closed 1 year ago

jiayuasu commented 1 year ago

Dear maintainers,

Thank you all for the great work on GeoParquet!

This PR adds Apache Sedona to the "known libraries that can read and write GeoParquet files" list.

Apache Sedona 1.3.0 implements basic GeoParquet read and write support:

Read the WKB column in GeoParquet: https://sedona.apache.org/tutorial/sql/#load-geoparquet
Write a table that has a Geometry type column to GeoParquet (with the column written in WKB format): https://sedona.apache.org/tutorial/sql/#save-geoparquet

Please feel free to let me know if there is anything I need to fix :-)

kylebarron commented 1 year ago

If geometry column name is different

var df = sparkSession.read.format("geoparquet").option("fieldGeometry", "new_geometry").load(geoparquetdatalocation1)

I don't know anything about Sedona, but I'm just wondering: are you able to read that from the Parquet file metadata instead of having the user specify it manually?

jiayuasu commented 1 year ago

@kylebarron We read the metadata, but a binary type column in a Parquet file could be something other than WKB. So we leave this to the user:

Unless the column is explicitly named geometry, the user needs to tell us which column is the WKB column.

cholmes commented 1 year ago

Thanks @jiayuasu! Will merge it in.

Following up on Kyle's question - the GeoParquet spec provides the names of all the geometry columns in the file metadata, as JSON. So it seems like you could just look at that instead of having the user specify it? You can see an example of the metadata at https://github.com/opengeospatial/geoparquet/blob/main/examples/example_metadata.json
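
To make the suggestion concrete, here is a minimal Python sketch of how a reader might pick out the geometry columns from that JSON. It is not Sedona's implementation. It assumes the JSON string has already been pulled from the "geo" key of the Parquet file's key-value metadata (that step is omitted), and the sample metadata and the helper name `geometry_columns` are made up for illustration. The sample is trimmed to the fields used here, so it is not a complete metadata document.

```python
import json
from typing import List, Tuple

# Hand-written, trimmed sample of the JSON stored under the "geo" key of the
# Parquet file metadata. Column names are illustrative, not from a real file.
SAMPLE_GEO_METADATA = """
{
  "version": "1.0.0",
  "primary_column": "new_geometry",
  "columns": {
    "new_geometry": {"encoding": "WKB"},
    "centroid": {"encoding": "WKB"}
  }
}
"""


def geometry_columns(geo_json: str) -> Tuple[str, List[str]]:
    """Return (primary geometry column, all geometry column names).

    Hypothetical helper: it only inspects the "primary_column" and
    "columns" fields of the metadata JSON.
    """
    meta = json.loads(geo_json)
    columns = list(meta["columns"])  # keys of "columns" are the column names
    primary = meta["primary_column"]
    if primary not in columns:
        raise ValueError(f"primary column {primary!r} is not listed in 'columns'")
    return primary, columns


primary, columns = geometry_columns(SAMPLE_GEO_METADATA)
print(primary, columns)
```

With this information available, a reader would not need an option such as `fieldGeometry` unless the file lacks the "geo" metadata altogether.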

jiayuasu commented 1 year ago

@cholmes Got it! You made a good point. I believe Sedona should improve the reader/writer to leverage the metadata. We will make it happen. Thanks!