opengeospatial / ogcapi-records

An open standard for the discovery of geospatial resources on the Web.
https://ogcapi.ogc.org/records

Is this really the right way to go? #209

Closed p1d1d1 closed 1 year ago

p1d1d1 commented 1 year ago

Although I am a fan of the new OGC APIs, I must admit that I am astonished by the specification for "Records". IMHO here OGC is perpetuating the mistake of separating data from metadata. Similar to ".../collectionId/map" or ".../collectionId/tiles", I would have expected something like ".../collectionId/metadata". Indeed, it seems to me that there is not even a need for a new metadata-specific request, as one could have adapted the data model of ".../collectionId" to contain the metadata properties required by "Records".

pvretano commented 1 year ago

The OGC API resource tree (https://www.pvretano.com/Projects/ogcapitree/) defines a bunch of endpoints that are really "catalogues". The .../collections endpoint is one example of such an endpoint. The /processes endpoint is another ... it provides metadata describing each deployed process.

Records lets the individual standards define the endpoints that provide metadata. Features and Common define the /collections endpoint to provide metadata about each collection. Processes defines the /processes endpoint to provide metadata about each process ... and there are others.

What Records does instead is define a set of building blocks (i.e. query parameters and additional metadata properties) that can be implemented at these "metadata" endpoints in the OGC API resource tree to make those endpoints searchable as mini-catalogues of local resources. Doing this allows queries like "Give me the list of collections that lie within a specific bounding box and contain the keyword 'seabed'" to be resolved. The URL for this example would be something like this: .../collections?bbox=10,10,20,20&q=seabed
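For illustration only, a minimal Python sketch of how a client might assemble the query described above. The base URL `https://example.org/api` and the helper name `build_collections_query` are hypothetical; only the `bbox` and `q` parameters come from the comment.

```python
from urllib.parse import urlencode


def build_collections_query(base_url, bbox, q):
    """Build a search URL against a /collections endpoint.

    base_url is a hypothetical API root; bbox is (minx, miny, maxx, maxy).
    """
    params = {
        "bbox": ",".join(str(v) for v in bbox),
        "q": q,
    }
    # safe="," keeps the bbox commas readable instead of percent-encoding them
    return f"{base_url.rstrip('/')}/collections?{urlencode(params, safe=',')}"


url = build_collections_query("https://example.org/api", (10, 10, 20, 20), "seabed")
print(url)
```

Whether a given server honours these parameters at /collections depends on it implementing the relevant Records building blocks.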

There is an entire requirements class in Records called Local Resources Catalogue (https://docs.ogc.org/DRAFTS/20-004.html#clause-local-resources-catalogue) that describes this ...

There is also a presentation here: https://1drv.ms/p/s!Ati-J6Wz5l-biAr4ue9yEWWnsUOc?e=pUS96q

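Records also has a Crawlable Catalogue class (https://docs.ogc.org/DRAFTS/20-004.html#clause-crawlable-catalogue), of which STAC (https://stacspec.org/en/about/stac-spec/) is an example.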

... and yes, Records also supports the "classical" catalogue approach (i.e. a catalogue of metadata usually separate from the data). I agree this is a misguided approach but Records has to support it because there is a huge install base of such catalogues and we want them to play in the OGC API sandbox.

ByronCinNZ commented 1 year ago

Hi Peter,

To label the classical catalogue as a “misguided approach” is, I would argue, a bit strong. There are many reasons one may wish to abstract the metadata into a collection separate from the resources being described. Most of these situations are analogous to a “product catalogue” that supports a shopping experience, like looking for a house or shopping on Amazon. Yes, metadata should reside close to the data whenever possible (but there are many situations where this does not suit, e.g. spatial views, data created on the fly), and it should be able to exist separately as a traditional catalogue for management and other purposes. This is a both/and situation. What is important is that the point-of-truth metadata is clear, and this would most naturally be alongside the resource wherever possible.

Cheers, Byron

pvretano commented 1 year ago

@ByronCinNZ you are right ... misguided is a bit strong ... and in general I agree with you. Bad word choice on my part! One of the main points I was trying to make -- badly it seems -- is that the intent of Records is to accommodate both use cases.

p1d1d1 commented 1 year ago

Wouldn't it have been sufficient for "classical catalogs" to have a link with rel=describedby at the collection level pointing to the "external" metadata record?
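A rough Python sketch of what this suggestion might look like from the client side, assuming a collection document that carries a `links` array. The collection content, URLs, and the helper name `described_by_links` are made up for illustration; `describedby` is the registered link relation named in the question.

```python
def described_by_links(collection):
    """Return the hrefs of all rel=describedby links in a collection document."""
    return [
        link.get("href")
        for link in collection.get("links", [])
        if link.get("rel") == "describedby"
    ]


# Hypothetical collection description pointing at an external metadata record
collection = {
    "id": "seabed-survey",
    "links": [
        {"rel": "self", "href": "https://example.org/api/collections/seabed-survey"},
        {
            "rel": "describedby",
            "type": "application/xml",
            "href": "https://example.org/catalogue/records/seabed-survey",
        },
    ],
}

print(described_by_links(collection))
```

This only covers resources that are already published as collections, which is the limitation raised in the reply that follows.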

tomkralidis commented 1 year ago

There are numerous use cases where datasets are not available/do not fit an OGC API publication/dissemination model (example: providing a record describing a dataset which provides a Pub/Sub endpoint, or a link to a compressed archive download).

p1d1d1 commented 1 year ago

For that you can use STAC, can't you?

tomkralidis commented 1 year ago

The record model in OGC API - Records is designed as a superset of STAC (datasets, styles, services, processes, etc.).

p1d1d1 commented 1 year ago

In the Spatial Data on the Web Best Practices there is a nice chapter (11) explaining why traditional SDIs are not enough. Stuff like https://demo.pygeoapi.io/master/collections/dutch-metadata/items is IMHO reiterating those same issues, or am I wrong?

pvretano commented 1 year ago

@p1d1d1 previous iterations of the OGC catalogues were based on the old "web service" architecture, which exposed an RPC-like interface. To get anything out of those services you needed to know what parameters to add to a base URL (e.g. request=GetRecords). Search engine crawlers did not have that knowledge and so could not index those services. OGC APIs (including OGC API Records), on the other hand, follow standard Web API rules that search engine crawlers are aware of, so crawlers can index these services. OGC API Records responses (and OGC API responses in general) include opaque links to previous and next records, links to different representations of the record, links to the data the record is describing, etc. All a client (including a search engine crawler) needs to do is follow the links.
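To make "just follow the links" concrete, here is a small Python sketch of a client that walks `rel=next` links without knowing anything else about the API. It assumes each page is a JSON object with a `features` array and a `links` array; the URLs, the in-memory `PAGES` stand-in for HTTP, and the function name `crawl_records` are all hypothetical.

```python
def crawl_records(start_url, fetch):
    """Yield every record reachable from start_url by following rel=next links.

    fetch is any callable that maps a URL to a parsed JSON page (a dict).
    """
    url = start_url
    seen = set()  # guard against link cycles
    while url and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield from page.get("features", [])
        # Treat the link as opaque: no URL construction, just follow it
        url = next(
            (l.get("href") for l in page.get("links", []) if l.get("rel") == "next"),
            None,
        )


# In-memory stand-in for HTTP responses (hypothetical content)
PAGES = {
    "https://example.org/items": {
        "features": [{"id": "a"}, {"id": "b"}],
        "links": [{"rel": "next", "href": "https://example.org/items?offset=2"}],
    },
    "https://example.org/items?offset=2": {
        "features": [{"id": "c"}],
        "links": [],
    },
}

ids = [r["id"] for r in crawl_records("https://example.org/items", PAGES.__getitem__)]
print(ids)
```

In a real client, `fetch` would be replaced by an HTTP GET that returns the decoded JSON body.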

As a sidebar, many OGC APIs (like Records and Features) also include a "query" API (e.g. CQL2) that advanced clients can use to query for records/resources that satisfy certain criteria ... but this query API is not necessary to navigate the catalogue and ultimately navigate to the resource (i.e. the data) that a record describes.

Furthermore, the OGC API Records specification defines building blocks that can be used to deploy different catalogue patterns. One of those patterns is the traditional SDI catalogue pattern ... I know you are not a fan of that one but there are still a lot of catalogues out there that use this pattern and we would like them to play in the OGC API space. Another pattern is the static or crawlable catalogue ... in this pattern the metadata can live with the data, and via links in each record you can crawl from one record to the next; having found a record, you can then follow a link to the data that record describes. No special API knowledge is required ... just follow the links. Here is a link to a presentation about this "building blocks" idea.

p1d1d1 commented 1 year ago

@pvretano, again:

Wouldn't it have been sufficient for "classical catalogs" to have a link with rel=describedby at the collection level pointing to the "external" metadata record?

IMHO stuff like https://api.weather.gc.ca/collections/wis2-discovery-metadata doesn't bring anything. But again, just IMHO.

pvretano commented 1 year ago

01-MAY-2023: This does not seem to be an actionable issue so closing. Can be reopened if some action is required.