opengeospatial / ogcapi-records

An open standard for the discovery of geospatial resources on the Web.
https://ogcapi.ogc.org/records

GeoJSON encoding of a catalogue record #59

Closed pvretano closed 1 year ago

pvretano commented 3 years ago

The purpose of this issue is to explore how the core queryables should be mapped into a GeoJSON object.

This issue was motivated by recent discussions in other issues (see issue #58) and my desire to converge on the JSON-encoding for a record. One comment that stuck with me from issue #58 was the comment from @mhogeweg: "I'm sure it is clear to all, but to me the distinction between 'applies to the record' and 'applies to the resource' is not clear." I think I agree with that assessment and so I created this issue.

I believe that the basic guiding principle should be that fields/keys/properties related to the record are placed in the GeoJSON feature object, and fields/keys/properties related to the resource being described by the record are placed inside the properties section/object.

The following table lists the core queryables from issue #42 but I have split it into two tables. One table for queryables that apply to the record and the other table for queryables that apply to the resource being described by the record.

Queryables that apply to the record:

| Queryable | Requirement | Description |
| --- | --- | --- |
| recordId | M | A unique record identifier assigned by the catalogue. |
| recordcreated | O | The date the record was created in the catalogue. |
| recordmodified | O | The most recent date on which the record was changed. |
| links | O | A list of links for navigating the catalogue API (e.g. prev, next, alt, etc.). |

Queryables that apply to the resource:

| Queryable | Requirement | Description |
| --- | --- | --- |
| title | M | A human-readable name given to the resource. |
| description | O | A free-text description of the resource. |
| keywords | O | A list of keywords or tags describing the resource. |
| type | M | The nature or genre of the resource. |
| language | O | The natural language used for textual values (i.e. titles, descriptions, etc.) of a resource. |
| externalId | O | An identifier for the resource assigned by an external entity (i.e. not the catalogue). |
| modified | O | Most recent date on which the resource was changed. |
| publisher | O | The entity responsible for making the resource available. |
| themes | O | A knowledge organization system used to classify the record. |
| formats | O | A list of available distribution formats for the resource. |
| contactPoint | O | An entity to contact about the resource. |
| license | O | A legal document under which the resource is made available. |
| rights | O | A statement that concerns all rights not addressed by the license, such as copyright statements. |
| extent | O | The spatio-temporal coverage of the resource. |
| associations | O | A list of links to resources associated with this resource. |

First, let's start with a skeleton of a GeoJSON feature object:

{
   "id": "...",
   "type": "Feature",
   "geometry": {...},
   "properties": { ... }
}

The "recordId" queryable conveniently maps to the GeoJSON "id" key.

The "type" key is specific to GeoJSON and fixed so we cannot change that.

Originally we had the @type key in the feature object to denote the type of the resource being described but I have removed that and added a "type" key to the properties object following the guiding principle stated above.

Similarly the "recordcreated" and "recordmodified" queryables are mapped to the "created" and "modified" keys in the feature object.

And finally, the "links" queryable, which contains an array of links for API navigation (e.g. prev, next, alt, etc.), is mapped to a "links" array in the feature object.

The "geometry" key is fixed by GeoJSON and is the one exception to the guiding principle since this would typically be something related to the resource being described (e.g. a footprint, a bounding box, etc.).

Putting all this together the basic skeleton for a catalogue record becomes:

{
   "id": "...",
   "type": "Feature",
   "created": "...",
   "modified": "...",
   "geometry": {...},
   "properties": {...},
   "links": [ {...}, {...}, ...]
}

The remaining queryables, the ones related to the resource, are mapped to synonymously named keys in the properties object.

"properties": {
      "externalid": "...",
      "title": "...",
      "description": "...",
      "keywords": [ "...", "...", "...", ... ],
      "keywords-codespace": "...",
      "type": "...",
      "language": "...",
      "created": "...",
      "modified": "...",
      "publisher": "...",
      "themes": [ {...}, {...}, ... ],
      "formats": [ "...", "...", ... ],
      "contact-point": "...",
      "license": "...",
      "rights": "...",
      "extents": {...},
      "associations": [ {...}, {...}, {...}, ... ],
      ... any other additional properties ...
}

The complete OGC API - Records record skeleton is thus:

{
   "id": "...",
   "type": "Feature",
   "created": "...",
   "modified": "...",
   "geometry": {...},
   "extents": {...},
   "properties": {
      "externalid": "...",
      "title": "...",
      "description": "...",
      "keywords": [ "...", "...", "...", ... ],
      "keywords-codespace": "...",
      "type": "...",
      "language": "...",
      "created": "...",
      "modified": "...",
      "publisher": "...",
      "themes": [ {...}, {...}, ... ],
      "formats": [ "...", "...", ... ],
      "contactPoint": "...",
      "license": "...",
      "rights": "...",
      "extents": {...},
      "associations": [ {...}, {...}, {...}, ... ],
      ... any other additional properties ...
   },
   "links": [ {...}, {...}, ...]
}
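
A minimal sketch of how a record could be assembled under this guiding principle, with record-level keys at the root and resource-level keys inside "properties". The function name `build_record`, its arguments, and the sample values are assumptions made for illustration; they are not part of the proposal or of any schema.

```python
import json


def build_record(record_id, geometry, record_meta, resource_meta, links=None):
    """Assemble a GeoJSON Feature for a catalogue record.

    Record-level keys (id, created, modified, links) go at the root;
    everything describing the resource goes into "properties".
    """
    feature = {"id": record_id, "type": "Feature"}
    # Record-level timestamps are optional, so copy only those supplied.
    for key in ("created", "modified"):
        if key in record_meta:
            feature[key] = record_meta[key]
    feature["geometry"] = geometry
    feature["properties"] = dict(resource_meta)
    feature["links"] = list(links or [])
    return feature


# Hypothetical sample values, for illustration only.
record = build_record(
    record_id="rec-001",
    geometry={"type": "Point", "coordinates": [-79.38, 43.65]},
    record_meta={"created": "2020-11-01T00:00:00Z"},
    resource_meta={"title": "Example dataset", "type": "dataset"},
    links=[{"rel": "alternate", "href": "https://example.org/rec-001.html"}],
)
print(json.dumps(record, indent=3))
```

Note that the root "type" stays fixed at "Feature" while "properties.type" carries the type of the resource being described.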

So far, so good, but there are issues related to STAC alignment.

First, STAC does not map the various time stamps into the GeoJSON object in the same way. See issue #58 for a discussion. My proposal in issue #58 was to place all timestamps in the properties object and name them differently to distinguish which timestamps apply to the record and which to the resource. However, I am now having second thoughts about that proposal since I feel it is much clearer and cleaner to separate concerns.

The second alignment friction point is related to the STAC assets section. The assets section is more-or-less equivalent to the associations section. Following the principle that resource-related fields/keys/properties are placed in the properties object, I put the associations array in the properties object. STAC, however, places its assets object in the feature object. Since the assets object is specific to STAC, I am not sure this is an issue for records, but I raise it since the more alignment the better.

Well, that's it for now. We can discuss this at the next SWG meeting, but I encourage and welcome comments in this issue before then.

rob-metalinkage commented 3 years ago

One-size-fits-all schemas seem to be problematic, as shown by the frictions mentioned. GeoJSON is limited to WGS84 geometries, so using GeoJSON currently limits the scope of applicability of the specification in that way at the very least.

There are a number of possible patterns that come up over and over again, particularly in metadata and citation reference schemas (because there is most pressure to interoperate here it seems):

1) Provisions of alternative schemas (supporting multiple options)
2) Profiling generic schemas for particular use
3) "Mix-in" inclusion of properties from multiple schemas

All of them are about having some flexibility in the choice of "interoperability domain".

I suggest you should reflect on the goals here and define whether you are aiming at a single shared interoperability domain, or a spec that can be applied to multiple domains, with clear mechanisms for how such domains can share common elements or mappings to allow emergence of wider domains from application specific ones.

uvoges commented 3 years ago

This looks similar to what we did for OGC EO Dataset Metadata GeoJSON(-LD) Encoding Standard https://docs.opengeospatial.org/is/17-003r2/17-003r2.html#23
Maybe you can have a look...

pvretano commented 3 years ago

16-NOV-2020: Move the record-level created and modified timestamps into the properties object and rename them to record-created and record-modified. This should allow alignment with STAC, or at least not collide with what STAC is doing. I'll ping the STAC folks to review the decision. We will also rename "modified" to "updated" and include in the specification a crosswalk between our core queryables and the properties used in other popular metadata standards like ISO 19115:2003 and DCAT (and we can extend the list if necessary).
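
A rough sketch of the move described in this note, assuming a record that still follows the original layout with root-level "created" and "modified". The function name `move_record_timestamps` and the sample record are hypothetical. The "modified" to "updated" renaming is deliberately left out, because the note does not pin down the final key names.

```python
import copy


def move_record_timestamps(feature):
    """Return a copy of the feature with root-level created/modified
    moved into properties as record-created / record-modified."""
    out = copy.deepcopy(feature)
    props = out.setdefault("properties", {})
    for root_key, prop_key in (
        ("created", "record-created"),
        ("modified", "record-modified"),
    ):
        if root_key in out:
            props[prop_key] = out.pop(root_key)
    return out


# Hypothetical record in the original layout (timestamps at the root).
original = {
    "id": "rec-001",
    "type": "Feature",
    "created": "2020-11-01T00:00:00Z",
    "modified": "2020-11-10T00:00:00Z",
    "geometry": None,
    "properties": {
        "title": "Example dataset",
        "modified": "2019-06-30T00:00:00Z",  # resource-level timestamp
    },
    "links": [],
}

revised = move_record_timestamps(original)
print(sorted(revised["properties"]))
```

Because the record-level keys get a distinct prefix, they no longer collide with the resource-level "created" and "modified" keys already present in properties.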

uvoges commented 3 years ago

I checked what we've done in OGC 17-084r1 (OGC Best Practice which defines a GeoJSON and JSON-LD encoding of Earth Observation (EO) metadata for collections)...

We use "isPrimaryTopicOf" for the meta-metadata within the properties, with a "type": "CatalogRecord" and JSON property names consistent with OGC 14-055r2. A JSON-LD context then maps these to terms aligned with DCAT Version 2.

Example:

"properties": {
   "title": "Agriculture Mask - Multimission - Southern Africa",
   ...
   "isPrimaryTopicOf": {
      "type": "CatalogRecord",
      "updated": "2017-05-15T00:00:00Z",
      "lang": "en"
   },

Our JSON-LD @context does a mapping as follows:

"foaf:isPrimaryTopicOf": {
   "@type": "dcat:CatalogRecord",
   "dct:language": {
      "@id": "http://id.loc.gov/vocabulary/iso639-1/en"
   },
   "dct:modified": "2017-05-15T00:00:00Z"
},

Further, we do not use a "type" key in the properties ("type" in the root is used for the assignment of "Feature"). Instead we use "kind" like this:

"kind": "http://purl.org/dc/dcmitype/Collection",

with the following @context definition:

"kind": { "@id": "dct:type", "@type": "@id" },

...we have in json-ld:

"dct:type": { "@id": "http://purl.org/dc/dcmitype/Collection" },

cholmes commented 3 years ago

Interesting, I'm just seeing this now. record-created and record-updated in properties definitely works for us. I think your original proposal (created/modified) at the root is also interesting, and I suspect might even work for us as well, as we don't really have the record level created / modified clear, so could add it there. That said I support putting as much as possible at the properties level. The big reason we do that is that most all geojson parsers only understand attributes at the property level. They just ignore / don't display others. This is also why we don't do any nested structures in properties, as geojson clients also don't understand those (since most map to flat / simple feature structures). So from that perspective I think it's much, much better to have record-created, etc in the properties, so non-record / non-stac aware geojson clients will be able to easily access that information - users can see it without having to do anything special.

Interestingly your original proposal really highlighted the reason I came to the issue tracker in the first place, which was wondering if you'd all thought about how GeoJSON fits into sort and CQL. We ended up requiring a prefix of properties. on everything in properties so that we could also do filtering on id and collection (the latter being a STAC thing that we did put at the root JSON level). It seems like CQL right now perhaps assumes 'properties'? It's a bit confusing, as GeoJSON in a 'feature model' makes it clear that attributes are all in properties. So you could just treat id and geometry as 'special', and not allow any querying on values outside of properties. Or you could say that anything in the JSON can be filtered, and thus things in properties need the prefix to specify. If you went with the original proposal (created at root level) then you'd need to go with the latter approach, or there'd be no way for clients to filter/sort on the record created date.

So the question is how do you plan to have Sort and CQL interact with GeoJSON, and the fact it has attributes in properties, but id/geometry at root, and people potentially will add things at root as well. I personally think both approaches are reasonable, but it's something that came up for me as I was aligning STAC to CQL. (Happy to add this as its own issue, but it just sorta flowed from this current discussion). Would be good to have some examples that show CQL with GeoJSON objects.
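
To make the two options concrete, here is a small sketch of how a server might resolve a queryable name against a GeoJSON feature under each approach. The function names and the sample feature are hypothetical; this is not how CQL or any existing implementation is specified to behave.

```python
ROOT_SPECIAL = ("id", "geometry")


def resolve_properties_only(feature, name):
    """Approach 1: bare names refer to keys inside "properties";
    only id and geometry are treated as special root-level keys."""
    if name in ROOT_SPECIAL:
        return feature.get(name)
    return feature.get("properties", {}).get(name)


def resolve_whole_document(feature, name):
    """Approach 2: any root-level key can be addressed by its bare name;
    keys inside "properties" need a "properties." prefix."""
    prefix = "properties."
    if name.startswith(prefix):
        return feature.get("properties", {}).get(name[len(prefix):])
    return feature.get(name)


# Hypothetical feature following the original proposal (created at the root).
feature = {
    "id": "rec-001",
    "type": "Feature",
    "created": "2020-11-01T00:00:00Z",  # record-level
    "geometry": None,
    "properties": {
        "title": "Example dataset",
        "created": "2015-03-02T00:00:00Z",  # resource-level
    },
}

# Approach 1 can only reach the resource-level value.
print(resolve_properties_only(feature, "created"))
# Approach 2 can reach both, at the cost of the prefix.
print(resolve_whole_document(feature, "created"))
print(resolve_whole_document(feature, "properties.created"))
```

Under the first approach the root-level record "created" value is unreachable for filtering or sorting, which is the limitation described above.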

pvretano commented 3 years ago

05-MAR-2021: Peter to include some text in this issue about how records (and other OGC APIs) handle this mapping of queryables to CQL and sorting expressions and then close.

m-mohr commented 3 years ago

Isn't it a bit confusing that there are two "type" fields? The GeoJSON type field (set to "Feature") and the type field in properties with different values?

m-mohr commented 3 years ago

Other than that, what are the allowed values in "license"? In STAC we only allow SPDX licenses (e.g. Apache-2.0) + various (then put multiple licenses into links) + proprietary (then put a single license into links), which may conflict. Don't see a big issue here, but want to bring it to attention here anyway.

Edit: I see now in the schema that it's a URI, but I'm wondering whether that shouldn't simply be a link then (rel type license) instead of a property?

pvretano commented 3 years ago

@m-mohr a record in the catalogue describes some resource ... a scene, a style, a feature, a widget ... pretty much anything (geospatial or not) that you want to make "discoverable". The "type" field inside the "properties" object tells you what type of resource is being described by the record ... a scene, a style, a feature, a widget ... whatever.

An OGC API - Records record can be encoded in a number of ways: HTML, ATOM/XML, GeoJSON ... The top-level type field is a requirement of GeoJSON and it has to be fixed to "Feature" ... as you know.

pvretano commented 3 years ago

@m-mohr with regard to license ... issue #99

m-mohr commented 3 years ago

Yes, understood, but my point is more: Is there a better name? One that doesn't duplicate the type name, which may get confusing? Maybe it could just be recordType, itemType, resourceType, resource or so.

pvretano commented 3 years ago

@m-mohr ah ... yes. The name comes from DUBLIN CORE/DCAT but I would not be opposed to changing it to something like resourceType to make it more clear.
Any comments from others?

cnreediii commented 3 years ago

Careful. Consider that in the EU/EC pan-European digital transformation (and related NextGen EU) activity there has been, and continues to be, considerable effort in using DCAT and GeoDCAT to describe resources. One key project using a DCAT profile with some extensions is the EU Open Data Portal.

There is a formal document https://semiceu.github.io/GeoDCAT-AP/drafts/latest/ on GeoDCAT. From that document:

The GeoDCAT-AP specification does not replace the INSPIRE Metadata Regulation [INSPIRE-MD-REG] nor the INSPIRE Metadata technical guidelines [INSPIRE-MD] based on ISO 19115 and ISO 19119. Its purpose is to give owners of geospatial metadata the possibility to achieve more by providing the means of an additional implementation through harmonised RDF syntax bindings. Conversion rules to RDF syntax would allow Member States to maintain their collections of INSPIRE-relevant datasets following the INSPIRE Metadata technical guidelines based on [ISO-19115] and [ISO-19119], while at the same time publishing these collections on [DCAT-AP]-conformant data portals.

What is not stated is that there is a clear move towards using DCAT and, by extension, GeoDCAT in the EU. Just something to consider when determining names etc. Not that we are, but we should be careful not to work at cross purposes with what major communities are doing.

dr-shorthair commented 3 years ago

Terminology is currently very closely aligned to DCAT. I've added a column with the alignment here https://github.com/cholmes/ogc-collection/pull/10

pvretano commented 1 year ago

21-APR-2023: Closing. The schema for a record has been set for quite a while now. If further changes need to be made to the record schema, a new issue should be created.