opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0

Store information about planar vs spherical coordinates (eg geodesic=true/false) #9

Closed jorisvandenbossche closed 2 years ago

jorisvandenbossche commented 2 years ago

This has come up in several places (e.g. most recently in https://github.com/opengeospatial/cdw-geo/issues/3#issuecomment-1047795817), and was brought up in a previous meeting.

Geospatial analytical systems can interpret / treat geometries' coordinates as planar or spherical. For example, GEOS considers everything as planar coordinates (and thus so do GeoPandas, R's sf, and PostGIS when using GEOS). Other libraries can handle spherical coordinates, such as R's s2 package (which is now used by default in sf for geographical coordinates) or BigQuery's Geography functionality (I think both are based on Google's s2geometry). PostGIS also differentiates between a geometry and a geography type.

Once you deal with spherical coordinates, you also have to deal with the edges of geometries. A geometry can be valid (i.e. no intersecting edges) when interpreting the coordinates as planar, but invalid when interpreting them as spherical. And that's where the IO aspect comes into the picture. When reading in data (and working with spherical coordinates), you can either 1) assume the edges are already valid as spherical edges, or 2) convert the planar edges to spherical edges.
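
To make the difference between the two edge interpretations concrete, here is a minimal sketch in plain Python (not tied to any of the libraries mentioned above) that compares the midpoint of a single edge under both readings. The helper names and the sample coordinates are made up for illustration, and the Earth is treated as a perfect unit sphere.

```python
import math


def planar_midpoint(a, b):
    """Midpoint of the edge when (lon, lat) are treated as planar (x, y)."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def to_unit_vector(lon, lat):
    """Convert (lon, lat) in degrees to a 3D unit vector on the sphere."""
    lon_r, lat_r = math.radians(lon), math.radians(lat)
    return (
        math.cos(lat_r) * math.cos(lon_r),
        math.cos(lat_r) * math.sin(lon_r),
        math.sin(lat_r),
    )


def spherical_midpoint(a, b):
    """Midpoint of the great-circle edge (assumes non-antipodal endpoints)."""
    va, vb = to_unit_vector(*a), to_unit_vector(*b)
    summed = [p + q for p, q in zip(va, vb)]
    norm = math.sqrt(sum(c * c for c in summed))
    x, y, z = (c / norm for c in summed)
    z = max(-1.0, min(1.0, z))  # guard against rounding just outside [-1, 1]
    return (math.degrees(math.atan2(y, x)), math.degrees(math.asin(z)))


# Hypothetical edge between two points on the same parallel, as (lon, lat).
a = (0.0, 50.0)
b = (90.0, 50.0)

planar = planar_midpoint(a, b)
spherical = spherical_midpoint(a, b)
print("planar midpoint:   ", planar)
print("spherical midpoint:", spherical)
```

Both midpoints share a longitude, but the great-circle one lies noticeably further north. A vertex sitting between the two paths can therefore fall on different sides of the edge depending on the interpretation, which is how a geometry can be valid in one reading and invalid in the other.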

For example, BigQuery assumes spherical edges when reading in from WKT (with the planar=TRUE option in ST_GEOGFROMTEXT to override this default), but planar edges when parsing GeoJSON (see https://cloud.google.com/bigquery/docs/geospatial-data#coordinate_systems_and_edges).

Quoting @paleolimbot:

There is currently no way to communicate in any file "this was exported from BigQuery Geography or S2 so you can import it there again without tessellating all the edges again" (e.g., use planar = true when importing to BigQuery).

So it would be useful to store this information about the edges in the file metadata, instead of requiring the user of the data to know this and specify it as an option while reading the data.

The concrete proposal would be to have an additional column metadata field to indicate this. I think a boolean flag is fine for this, and possible names are "geodesic": true/false or "planar": true/false.
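
As a rough illustration of this proposal, the following Python sketch builds column metadata carrying such a flag and round-trips it through JSON. The surrounding layout (a `columns` mapping keyed by column name, an `encoding` entry), the column name `geometry`, and the helper `make_column_metadata` are assumptions for the example; `geodesic` is only one of the two candidate names mentioned above.

```python
import json


def make_column_metadata(geodesic):
    # Hypothetical column metadata: only the boolean flag comes from the
    # proposal; the "encoding" entry is an assumed neighbour field.
    return {"encoding": "WKB", "geodesic": bool(geodesic)}


# Assumed file-level layout: one entry per geometry column.
file_metadata = {"columns": {"geometry": make_column_metadata(geodesic=True)}}

# Metadata of this kind would be stored as a JSON string.
serialized = json.dumps(file_metadata)
roundtrip = json.loads(serialized)
print(roundtrip["columns"]["geometry"]["geodesic"])
```

With a flag like this, a reader could decide per column whether edges need converting, without the user passing an option.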


Note: I am no expert on this front (GeoPandas is, for now, still only using GEOS and thus planar coordinates, so I don't have much experience with handling spherical coordinates). So please correct me if anything in the above isn't fully correct :)

paleolimbot commented 2 years ago

A good summary of the problems we faced in R when switching to st_use_s2(TRUE) as the default is here: https://github.com/r-spatial/sf/issues/1649 . I wasn't aware that these problems had already been occurring, although given that BigQuery geography has been around for a while, it makes sense. Since the time I started working on s2, BigQuery geography has added the ability to pass planar = true, I imagine in response to similar feedback.

Cloud data warehouses seem like a case where the ability to communicate the intention of the dataset creator is particularly necessary given that more than one of them can create and consume geometries with geodesic edges.

alasarr commented 2 years ago

+1

TomAugspurger commented 2 years ago

Discussed on the call today. The majority favored a new field: ~edge_interpolation~, ~edge_interpretation~, edges. This field would be optional. Readers reading a GeoParquet file without an edges field should assume planar by default. The accepted values are
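
The default rule described here could be sketched on the reader side as follows. The function name `resolve_edges` and the dict-shaped column metadata are hypothetical, and spelling the default as the string "planar" is an assumption. The sketch does not validate values, since the list of accepted values is not reproduced in this comment.

```python
def resolve_edges(column_metadata):
    """Return the stored `edges` value, or "planar" when the field is absent.

    `column_metadata` is assumed to be a dict parsed from the column's
    metadata; the string "planar" as the default spelling is an assumption.
    """
    return column_metadata.get("edges", "planar")


# A column written without the optional field falls back to planar.
print(resolve_edges({"encoding": "WKB"}))
```

Keeping the default in one helper means older files, written before the field existed, are handled the same way as files that simply omit it.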

TomAugspurger commented 2 years ago

@Jesus89 is working on this.

cholmes commented 2 years ago

@Jesus89 any progress on this? This and #17 are our only remaining open issues.

Jesus89 commented 2 years ago

Yes, I'm about to launch the PR. I have a comment about using 'spheroid' instead of 'ellipsoid'.