Open m-mohr opened 4 months ago
I support this statement. JSON Schema is indeed easier to understand than other representations, but that simplicity sometimes becomes a hindrance when mapping or piping the data to another language or implementation, because the schema does not carry enough information to describe the data properly.
I have had similar complex issues when mapping OGC API - Processes inputs (which can be Features) to the various CWL, WPS, JSON schema representations (see details: Weaver - Application Package - Type Correspondence).
Using JSON Schema implies heavy use of `format` to disambiguate types (e.g., `type: number, format: double`). This has been highlighted on multiple occasions for OGC API interoperability (https://github.com/opengeospatial/ogcapi-processes/issues/427, https://github.com/opengeospatial/ogcapi-processes/issues/395, https://github.com/opengeospatial/ogcapi-processes/issues/394, etc.).
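To illustrate why `format` matters when mapping a schema to another type system, here is a minimal Python sketch. The `TYPE_MAP` table, the `DEFAULTS` fallbacks and the `resolve_type` helper are hypothetical names introduced for this example, and the target type names are placeholders, not taken from any OGC, CWL or WPS specification.

```python
# Hypothetical lookup from (JSON Schema type, format) to a concrete target type.
# The target names on the right are illustrative placeholders only.
TYPE_MAP = {
    ("number", "float"): "float32",
    ("number", "double"): "float64",
    ("integer", "int32"): "int32",
    ("integer", "int64"): "int64",
}

# Fallbacks used when no usable format is given; each one is a guess.
DEFAULTS = {"number": "float64", "integer": "int64", "string": "string"}


def resolve_type(schema: dict) -> tuple[str, bool]:
    """Return (target type, exact); exact is False when the type was guessed."""
    json_type = schema.get("type")
    fmt = schema.get("format")
    if not isinstance(json_type, str):
        # Missing type or a list of types: nothing sensible to map to here.
        raise ValueError(f"cannot map schema: {schema!r}")
    if (json_type, fmt) in TYPE_MAP:
        return TYPE_MAP[(json_type, fmt)], True
    if json_type in DEFAULTS:
        return DEFAULTS[json_type], False
    raise ValueError(f"cannot map schema: {schema!r}")


print(resolve_type({"type": "number", "format": "double"}))
print(resolve_type({"type": "number"}))
```

Without `format`, the second call can only fall back to a default, which is the loss of information described above.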
Therefore, the standard tackling the "schema" problem to properly describe data should be more explicit and rigorous regarding the recommendations it provides. A few examples of recommendations could be:
- List explicitly the specific `format` values, based on the OGC https://github.com/opengeospatial/NamingAuthority/, that should be used in certain well-known cases (e.g., `format: geometry-point` is shown in examples, but it is not clear whether it should be interpreted as https://geojson.org/schema/Geometry.json#/oneOf/0 (`type: Point`), https://github.com/opengeospatial/ogcapi-features/blob/master/core/openapi/schemas/pointGeoJSON.yaml, or in some other way).
- Provide best practices such as reusing more explicit references (i.e., prefer a narrowed reference to https://geojson.org/schema/Polygon.json over https://geojson.org/schema/GeoJSON.json if the type is known to be limited to polygons).
- Make use of `contentMediaType`, `contentSchema` and `$ref` to point to relevant entities rather than "reinventing" common structures. The standard should also justify which strategies should be used, and why.
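The last two recommendations could look roughly like the following Python sketch, which builds the schema fragments as plain dictionaries. The variable names are hypothetical, and the media type `application/geo+json` is an assumption taken from the GeoJSON RFC rather than from this thread.

```python
import json

# Broad reference: any GeoJSON object would be accepted.
broad = {"$ref": "https://geojson.org/schema/GeoJSON.json"}

# Narrowed reference: preferred when the value is known to be a polygon.
narrow = {"$ref": "https://geojson.org/schema/Polygon.json"}

# A string value that carries an embedded document. contentMediaType names
# the encoding of the string, and contentSchema points at the existing
# Polygon schema instead of redefining the structure inline.
embedded = {
    "type": "string",
    "contentMediaType": "application/geo+json",  # assumed media type
    "contentSchema": narrow,
}

print(json.dumps(embedded, indent=2))
```

In both cases the schema reuses a published definition through `$ref`, so a consumer that already knows the GeoJSON schemas needs no additional mapping logic.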
SWG Meeting 29-JUL-2024: We discussed this issue and agreed that JSON data types are limited. However, it was pointed out that one can always use a string type with a bespoke `format` value to indicate how to interpret the string. So you can encode a uint64 value as a string and then set the `format` to `uint64` to indicate that it should be interpreted as a uint64. Of course, this works best if a community of interest agrees on what the `format` values should be.
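A minimal Python sketch of this string-based approach follows. The helper names `decode_uint64` and `decode` are hypothetical, and the sketch assumes a plain decimal representation with no sign or whitespace, which the meeting notes do not specify.

```python
UINT64_MAX = 2**64 - 1


def decode_uint64(value: str) -> int:
    """Parse a decimal string tagged with the bespoke format "uint64"."""
    # Assumption: only ASCII digits are allowed (no sign, no whitespace).
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"not an unsigned decimal integer: {value!r}")
    number = int(value)
    if number > UINT64_MAX:
        raise ValueError(f"out of range for uint64: {value!r}")
    return number


def decode(schema: dict, value):
    """Apply the format-specific decoder if the schema asks for one."""
    if schema.get("type") == "string" and schema.get("format") == "uint64":
        return decode_uint64(value)
    return value  # unknown or absent format: pass the value through


schema = {"type": "string", "format": "uint64"}
print(decode(schema, "18446744073709551615"))
```

A consumer that does not know the `uint64` format still receives a valid string, which is why agreement on the `format` values matters.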
The other option, of course, is to use the schema endpoint but negotiate something other than JSON schema that is more suited to the need.
The `format` is the appropriate solution IMO. A `schema` is a nice addition on top, if applicable, for complicated types, but it would probably still need `format` references within it as well...
The biggest issue at this time is that all OGC APIs seem to be tossing the problem between each other, never directly addressing this `format` definition. Therefore, we still lack a well-established list of `format` references that all APIs can refer to and interoperate with. There are parts of this list here and there, in drafts and issue comments, but no single clear listing in a centralized naming authority.
> So you can encode a uint64 value as a string and then set the format to uint64 to indicate that it should be interpreted as a uint64.
Why set `type` to `"string"`? `format` is mostly used with strings, but it is not restricted to strings. I would represent this as `{ "type": "integer", "format": "uint64" }`.
This also follows the approach of OpenAPI 3.1 which defines formats "int32" and "int64" for signed integers. See https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types. We should use these as a starting point and extend this with unsigned and other bit size variants.
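As a rough sketch of what such an extended list could mean for validation, the Python snippet below checks an integer against the range implied by its `format`. Only `int32` and `int64` come from OpenAPI 3.1; the unsigned names and the `check_integer` helper are assumptions made for illustration, not an agreed vocabulary.

```python
# format name -> (minimum, maximum). int32 and int64 are defined by
# OpenAPI 3.1; the unsigned entries are assumed extensions.
INTEGER_FORMATS = {
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
}


def check_integer(schema: dict, value) -> bool:
    """Return True if value satisfies an integer schema and its format range."""
    if schema.get("type") != "integer":
        raise ValueError("this sketch only handles integer schemas")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    bounds = INTEGER_FORMATS.get(schema.get("format"))
    if bounds is None:
        return True  # unknown or absent format: no extra constraint
    low, high = bounds
    return low <= value <= high


schema = {"type": "integer", "format": "uint64"}
print(check_integer(schema, 2**64 - 1), check_integer(schema, -1))
```

Treating an unknown `format` as unconstrained is a design choice of this sketch; a stricter implementation might reject it instead.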
Meeting 2024-08-12: As a general rule, if more fine-grained sub-types of the JSON data types are needed, `format` will be used. In Part 5 we will include the ones from OpenAPI 3.1 and extend them with additional (un)signed integer variants. There should be a format vocabulary. @cportele will create a PR.
There is a higher-level governance question of how to add formats in OGC API standards beyond those that will be in Part 5. This could be discussed with the other OGC API SWGs in the common meetings at the next Member Meeting.
The document says:
I'm working on something very similar in a project called fiboa, see https://github.com/fiboa/schema. My observation from this work, mapping JSON Schema to e.g. GeoParquet, is that it's not as easy as it sounds. The number data type is especially difficult.
`type: ["number", "string", "array"]` may be difficult or even impossible to represent