opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0

WKT support for 3/4D using Z and/or M #168

Closed chris-little closed 1 year ago

chris-little commented 1 year ago

In the draft GeoParquet specification, in the “geometry-types” section, the use of the “Z” suffix to indicate 3D data is mandated, and the “M” suffix for other coordinate systems is not yet supported. As some OGC Standards (e.g. API-EDR v1.0.1, with V1.1 and V1.2 in the pipeline) use both suffixes, perhaps a clear guideline or standard could be established for combinations, such as “ZM”, “MZ”, etc., and specifically whether a preceding space is required or expected.

API-EDR has assumed that no space is expected before a suffix, but does not state this as an explicit requirement. Other software has also assumed that omitting the space is the accepted syntax.

This may not seem very important, but is undoubtedly a barrier to interoperability. Perhaps alignment with this should be explicitly mentioned in the Charter, or in a workplan annex.

jorisvandenbossche commented 1 year ago

> perhaps a clear guideline or standard could be established for combinations, such as “ZM”, “MZ”

Since we currently don't allow M values, I suppose we will only need to add such a guideline when we relax that restriction?

> and specifically whether a preceding space is required or expected.

We can probably call it out more specifically for clarity and to draw attention to it, but I think that the current text is unambiguous in having a space (i.e. "a " Z" suffix gets added"). I would have to look back at the original PR to see if we actually discussed this and decided on it consciously.

Do you know the rationale of other standards like API-EDR to not use any space? The WKT spec itself as referenced in our spec has examples that actually do include a space (and for example GDAL also creates WKT with spaces).

chris-little commented 1 year ago

@jorisvandenbossche The EDR API Standard WG didn't realise having a space was an option. Possibly it was caused by multi-lingual problems, as the original proposal to use the Z and M options came from Chinese speakers. Spaces have less importance in their character sequences compared to alphabetic strings.

When the issue was raised last year in the EDR API SWG, we looked at current practice, and there was no clear guidance. We agreed that no spaces would make for slightly easier unambiguous parsing. We also unconsciously assumed that Z always came before M. And we consciously chose M to always mean a time coordinate.

The original OGC spec contains:

7.2.6 Examples

Examples of textual representations of Geometry are shown in Table 2. The coordinates are shown as integer values; in general they may be any double precision value.

Note: The examples of POINTZ, POINTM, and POINTZM at the bottom of Table 6. This same style for distinguishing 2D points from 3D points and from 2D or 3D points with M value can be applied to LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, and GEOMETRYCOLLECTION types.

But then Table 6, the table of examples, has Point Z, Point ZM and Point M!

So clear guidance is probably needed.

chris-little commented 1 year ago

A concrete reason for preferring no spaces before Z or M is that in the EDR API, the WKT is part of a URL, and we felt that having spaces, which render as %20, was messy.

jorisvandenbossche commented 1 year ago

Can you point to where this parameter is described in the EDR API? (to better understand the use case)

chris-little commented 1 year ago

The latest draft version, V1.1, Sections 8.2.5 and 8.2.6, for the Trajectory and Corridor queries. V1.0 and V1.0.1 are the same, though with some mistaken examples.

jorisvandenbossche commented 1 year ago

Ah, so in the EDR API case that's about actual WKT representations of full geometries, not just geometry type indications. Note that in our metadata, this is about the geometry type (so it's not actually WKT; the geometries are stored as WKB).

Looking at https://github.com/opengeospatial/geoparquet/issues/41#issuecomment-1077546595 and comments below that, it seems we went with the current naming scheme based on the types from GeoJSON, and then decided to add "Z" with a space (and not without a space) somewhat randomly (although using the fact that typical WKT strings are using a space as prior art).

chris-little commented 1 year ago

And as we are embedding the WKT in URL queries (HTTP(S) GET and POST), saving a few spaces (or %20s) is useful.

rouault commented 1 year ago

> saving a few spaces (or %20s) is useful.

You're saving just one. Every following ordinate needs to be space separated.

> The OGC original spec contains:
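The size of the saving can be checked with a short sketch. It percent-encodes the same geometry with and without the space before Z, using `urllib.parse.quote` from the Python standard library. The coordinates and the helper name `encode_wkt` are made up for illustration and are not taken from the EDR API documents.

```python
from urllib.parse import quote


def encode_wkt(wkt: str) -> str:
    """Percent-encode a WKT string for use in a URL query value.

    Parentheses and commas are left readable (an illustrative choice);
    every space becomes %20.
    """
    return quote(wkt, safe="(),")


# Illustrative coordinates only. The two strings differ solely in the
# space between the geometry type and the Z suffix.
with_space = "LINESTRING Z(0 0 10,1 1 20,2 2 30)"
without_space = "LINESTRINGZ(0 0 10,1 1 20,2 2 30)"

for wkt in (with_space, without_space):
    encoded = encode_wkt(wkt)
    print(encoded, "-> %20 count:", encoded.count("%20"))
```

The two encoded strings differ by a single `%20`; the spaces between ordinates remain in both forms.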

"Table 7: Integer codes for geometric types" in "8.2.3 A common list of codes for geometric types" has spaces in the type names: "Point Z", etc. Actually, it seems that the only place in the Simple Features spec where no space is found is Table 6, at least in a WKT context. (When looking at WKB, there are C-style structures and enumerations that have no space, but that's because an identifier is constrained to be a single word.) Looking at the BNF of WKT, it seems to me that omitting the space isn't even allowed, even though a number of WKT parsers (at least the ones in PostGIS and GDAL) accept WKT types both with and without a space when ingesting a full WKT geometry.
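A tolerant reader of this kind can be sketched with a regular expression. This is a hypothetical helper, not how PostGIS or GDAL implement their parsers: it only splits the leading type tag from the dimension suffix, accepts the suffix with or without a preceding space, and assumes Z always comes before M.

```python
import re

# Geometry type keywords from the Simple Features WKT grammar.
BASE_TYPES = (
    "GEOMETRYCOLLECTION",
    "MULTILINESTRING",
    "MULTIPOLYGON",
    "MULTIPOINT",
    "LINESTRING",
    "POLYGON",
    "POINT",
)

# Type keyword, optional whitespace, optional ZM/Z/M suffix, and then
# either the opening parenthesis, the EMPTY keyword, or end of input.
_TAG = re.compile(
    r"\s*(" + "|".join(BASE_TYPES) + r")\s*(ZM|Z|M)?\s*(?=\(|EMPTY|$)",
    re.IGNORECASE,
)


def split_type_tag(wkt: str) -> tuple[str, str]:
    """Return (base type, dimension suffix) in upper case.

    The suffix is "" for 2D geometries. Hypothetical helper for
    illustration; the coordinate list itself is not parsed.
    """
    match = _TAG.match(wkt)
    if match is None:
        raise ValueError(f"unrecognised WKT type tag: {wkt!r}")
    return match.group(1).upper(), (match.group(2) or "").upper()


for text in ("POINT Z (1 2 3)", "POINTZ(1 2 3)", "LineString ZM (0 0 0 0, 1 1 1 1)"):
    print(text, "->", split_type_tag(text))
```

Both spellings of the tag map to the same pair, so a reader built this way is indifferent to the space, while a writer still has to pick one form.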

jorisvandenbossche commented 1 year ago

> And as we are embedding the WKT in URL queries

Yes, but the point I was trying to make is that we don't use WKT at all in the GeoParquet spec. The discussion about what the WKT spec says about spaces is certainly interesting (and relevant for EDR API) and useful to resolve, but is thus not that relevant for GeoParquet (and should probably be held elsewhere so the relevant people see it?).

If the WKT spec were unambiguous, that would be a useful data point for us to decide whether we want to use a space as well or not. But in the end, our spec (and JSON schema) for the geometry type is currently explicit in having a space.
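As a rough illustration of what "explicit in having a space" means for a reader of the metadata, the sketch below checks geometry type strings against a pattern. It assumes the allowed values are the GeoJSON-style type names with an optional " Z" suffix, as described in this thread; the pattern and the function name `is_valid_geometry_type` are illustrative, and the authoritative definition is the JSON schema in the GeoParquet repository.

```python
import re

# Assumption: GeoJSON-style type names, optionally followed by " Z"
# (one space, upper-case Z). No M or ZM suffix is accepted.
_GEOMETRY_TYPE = re.compile(
    r"(GeometryCollection|(Multi)?(Point|LineString|Polygon))( Z)?"
)


def is_valid_geometry_type(value: str) -> bool:
    """Return True if value matches the assumed geometry type pattern."""
    return _GEOMETRY_TYPE.fullmatch(value) is not None


for candidate in ("Point", "Point Z", "PointZ", "Point M", "MultiPolygon Z"):
    print(repr(candidate), is_valid_geometry_type(candidate))
```

Under this assumption, "Point Z" is accepted while "PointZ" and "Point M" are rejected, which is the behaviour the comment above describes.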

chris-little commented 1 year ago

@jorisvandenbossche @rouault So I am content that you have addressed the issue for GeoParquet, and highlighted the ambiguity in the WKT spec.