opengeospatial / ogcapi-features

An open standard for querying geospatial information on the web.
https://ogcapi.ogc.org/features

Part 5: The limitations of JSON data types #946

Open m-mohr opened 3 months ago

m-mohr commented 3 months ago

The document says:

To use a schema for data validation, the schema must be converted into a schema representation suitable for validating data in the specific data format. For example, an XML Schema that is a GML application schema or a JSON Schema for GeoJSON or the draft OGC Features and Geometries JSON (JSON-FG). [...] The main reasons for using JSON Schema are: [...]

  • JSON data types (string, number, boolean, array, object, null) are simple and easy to understand;

I'm working on something very similar in a project called fiboa, see https://github.com/fiboa/schema. My observation from this work, which maps JSON Schema to e.g. GeoParquet, is that it's not as easy as it sounds. The number data type is especially difficult.
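To make the difficulty with the number type concrete, here is a minimal Python sketch. It assumes a consumer that stores every JSON number as an IEEE 754 double, which many parsers do but the JSON grammar itself does not require. The variable names are illustrative only.

```python
import json

UINT64_MAX = 2**64 - 1

# Python's json module keeps integer literals exact (arbitrary-precision int).
exact = json.loads(str(UINT64_MAX))

# Assumption: a consumer that holds all JSON numbers as IEEE 754 doubles.
# A double has a 53-bit significand, so large integers get rounded.
as_double = float(UINT64_MAX)
round_trips = int(as_double) == UINT64_MAX

# The same rounding already happens just above 2**53.
first_unsafe = 2**53 + 1
first_unsafe_round_trips = int(float(first_unsafe)) == first_unsafe

print(exact == UINT64_MAX, round_trips, first_unsafe_round_trips)
```

A schema that only says "number" gives the reader of the data no way to know whether such values are expected, so the choice of a target column type is a guess.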

fmigneault commented 3 months ago

I support this statement. JSON Schema is indeed easier to understand than other representations, but that simplicity becomes a hindrance when mapping or piping the data to another language or implementation, since there is insufficient information to describe the data properly.

I have had similar complex issues when mapping OGC API - Processes inputs (which can be Features) to the various CWL, WPS, JSON schema representations (see details: Weaver - Application Package - Type Correspondence).

Using JSON Schema implies heavy use of format to disambiguate types (e.g. type: number, format: double). This has been highlighted on multiple occasions for OGC API interoperability (https://github.com/opengeospatial/ogcapi-processes/issues/427, https://github.com/opengeospatial/ogcapi-processes/issues/395, https://github.com/opengeospatial/ogcapi-processes/issues/394, etc.).
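As a rough illustration of why format ends up carrying the real type information, the following Python sketch resolves a (type, format) pair to a target column type. The formats "float", "double", "int32" and "int64" are the ones listed by OpenAPI 3.1; the target names on the right and the function name are hypothetical and not taken from any standard.

```python
# (type, format) pairs mapped to target column type names.
# The right-hand names are illustrative placeholders only.
TARGET_TYPES = {
    ("number", "float"): "float32",
    ("number", "double"): "float64",
    ("integer", "int32"): "int32",
    ("integer", "int64"): "int64",
}


def target_type(schema: dict) -> str:
    """Return a column type name, or raise if the schema is ambiguous."""
    key = (schema.get("type"), schema.get("format"))
    if key not in TARGET_TYPES:
        # A bare {"type": "number"} lands here: nothing says which width to use.
        raise ValueError(f"cannot choose a column type for {schema!r}")
    return TARGET_TYPES[key]


print(target_type({"type": "number", "format": "double"}))
```

Raising on a missing format is a design choice for this sketch; a real mapping might instead fall back to the widest type.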

Therefore, the standard tackling the "schema" problem to properly describe data should be more explicit and rigorous regarding the recommendations it provides. A few examples of recommendations could be:

pvretano commented 3 months ago

SWG Meeting 29-JUL-2024: We discussed this issue and agreed that JSON data types are limited. However, it was pointed out that one can always use a string type with a bespoke format value to indicate how to interpret the string. So you can encode a uint64 value as a string and then set the format to uint64 to indicate that it should be interpreted as a uint64. Of course, this works best if a community of interest agrees on what the format values should be.
The other option, of course, is to use the schema endpoint but negotiate something other than JSON schema that is more suited to the need.
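A minimal Python sketch of the string-based option described above might look like this. The function name and the "id" property are hypothetical, and "uint64" as a format value is the bespoke convention from the comment, not a registered format.

```python
import json


def decode_uint64(schema: dict, value: str) -> int:
    """Interpret a string as a uint64 if the schema asks for it."""
    if schema.get("type") != "string" or schema.get("format") != "uint64":
        raise ValueError("schema does not describe a string-encoded uint64")
    # Accept plain ASCII digits only: no sign, no whitespace, no exponent.
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"not an unsigned integer literal: {value!r}")
    number = int(value)
    if number > 2**64 - 1:
        raise ValueError(f"out of range for uint64: {value!r}")
    return number


schema = {"type": "string", "format": "uint64"}
# The value travels as a JSON string, so no parser can round it.
document = json.dumps({"id": str(2**64 - 1)})
decoded = decode_uint64(schema, json.loads(document)["id"])
print(decoded)
```

The trade-off is that a client unaware of the convention sees only an opaque string.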

fmigneault commented 3 months ago

The format is the appropriate solution IMO. A schema is a nice addition on top if applicable for complicated types, but it would ultimately probably need format references within it as well...

The biggest issue at this time is that all OGC APIs seem to be tossing the problem between each other, never directly addressing this format definition. Therefore, we still lack a well-established list of format references that all APIs can refer to and interoperate with. There are parts of this list here and there, in drafts and issue comments, but not one clear listing in a centralized naming authority.

cportele commented 2 months ago

So you can encode a uint64 value as a string and then set the format to uint64 to indicate that it should be interpreted as a uint64.

Why set type to "string"? format is mostly used with strings, but it is not restricted to strings. I would represent this as { "type": "integer", "format": "uint64" }.

This also follows the approach of OpenAPI 3.1 which defines formats "int32" and "int64" for signed integers. See https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types. We should use these as a starting point and extend this with unsigned and other bit size variants.
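A sketch of how such integer formats could be checked, in Python: "int32" and "int64" come from OpenAPI 3.1, while the other names in the table are hypothetical extensions of the kind proposed here. Treating an unknown format as unconstrained is an assumption of this sketch.

```python
# Value ranges per format name. Only int32 and int64 are defined by
# OpenAPI 3.1; the remaining names are hypothetical extensions.
INTEGER_RANGES = {}
for bits in (8, 16, 32, 64):
    INTEGER_RANGES[f"int{bits}"] = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    INTEGER_RANGES[f"uint{bits}"] = (0, 2**bits - 1)


def conforms(schema: dict, value) -> bool:
    """Check a parsed JSON value against {"type": "integer", "format": ...}."""
    if schema.get("type") != "integer":
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    bounds = INTEGER_RANGES.get(schema.get("format"))
    if bounds is None:
        return True  # no format or unknown format: any integer passes
    low, high = bounds
    return low <= value <= high


print(conforms({"type": "integer", "format": "uint64"}, 2**64 - 1))
```

Keeping the type as integer means that a validator which ignores format still rejects strings and fractional numbers.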

cportele commented 2 months ago

Meeting 2024-08-12: As a general rule, if more fine-grained sub-types of the JSON data types are needed, format will be used. In Part 5 we will include the ones from OpenAPI 3.1 and extend them with additional (un)signed integer variants. There should be a format vocabulary. @cportele will create a PR.

There is a higher-level governance issue of how to add formats in OGC API standards beyond those that will be in Part 5. This could be discussed with the other OGC API SWGs in the common meetings at the next Member Meeting.