opengeospatial / geoparquet

Specification for storing geospatial vector data (point, line, polygon) in Parquet
https://geoparquet.org
Apache License 2.0

Geometry types #133

Closed: tschaub closed this issue 1 year ago

tschaub commented 1 year ago

Currently, the geometry_type metadata property can be a string or an array of strings. I'm wondering if this flexibility is really necessary. When parsing this metadata into a struct, it can make things simpler to have a single type for the value.
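To illustrate the extra branching a reader needs today, here is a minimal Python sketch that coerces the string-or-array form into a single list type. The function name `normalize_geometry_type` is hypothetical and not part of the spec or of any library; it only shows the kind of shim each parser ends up writing.

```python
from typing import List, Union


def normalize_geometry_type(value: Union[str, List[str]]) -> List[str]:
    """Coerce the string-or-array form of geometry_type into a list of strings.

    Hypothetical helper for illustration only; not part of the GeoParquet spec.
    """
    if isinstance(value, str):
        # Single-string form, e.g. "Point" or "Unknown".
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        # Array form, e.g. ["Point", "Polygon"]; return a copy.
        return list(value)
    raise TypeError(f"geometry_type must be a string or a list of strings, got {value!r}")
```

With an array-only property, the `isinstance(value, str)` branch disappears and the value can be decoded directly into a list (or a slice, in a typed language).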

In addition, instead of the string "Unknown", an empty array could represent unknown geometry types.

If an array sounds good, I think the name geometry_types also makes sense. Basically, I'm wondering if there would be support for this instead:

{
  "geometry_types": {
    "description": "The geometry types of all geometries, or an empty array if they are not known.",
    "type": "array",
    "items": {
      "type": "string",
      "pattern": "^(GeometryCollection|(Multi)?(Point|LineString|Polygon))( Z)?$"
    }
  }
}
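
As a rough sketch of how a reader might check a value against the proposed schema, the Python below reuses the `pattern` from the snippet above. The names `validate_geometry_types` and `is_unknown` are hypothetical, and treating an empty list as "unknown" reflects the proposal in this issue rather than settled behavior.

```python
import re
from typing import List

# Pattern copied from the proposed schema above.
GEOMETRY_TYPE_PATTERN = re.compile(
    r"^(GeometryCollection|(Multi)?(Point|LineString|Polygon))( Z)?$"
)


def validate_geometry_types(values: object) -> List[str]:
    """Return the list unchanged if it satisfies the proposed schema, else raise."""
    if not isinstance(values, list):
        raise TypeError("geometry_types must be an array")
    for v in values:
        # fullmatch rejects trailing newlines that "$" alone would tolerate.
        if not isinstance(v, str) or GEOMETRY_TYPE_PATTERN.fullmatch(v) is None:
            raise ValueError(f"invalid geometry type: {v!r}")
    return values


def is_unknown(values: List[str]) -> bool:
    """Under the proposal, an empty array means the types are not known."""
    return len(values) == 0
```

Note that the string "Unknown" does not match the pattern, so under this proposal it would be rejected and only `[]` would signal unknown geometry types.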

jorisvandenbossche commented 1 year ago

> In addition, instead of the string "Unknown", an empty array could represent unknown geometry types.

Some discussion about "Unknown" happened in https://github.com/opengeospatial/geoparquet/issues/41. That discussion was mostly about whether we needed the concept at all (versus making the field optional), though, not about how to indicate it (as "Unknown" or []).

Personally, I find "Unknown" a bit more explicit (and it maps to what e.g. GDAL and FlatGeobuf do), but if we go with only allowing an array of values, an empty array is certainly the easier specification.