Catalogue use cases - Githubissues

pvretano commented 4 years ago

The purpose of this issue is to capture use cases for the catalogue. So far we have discussed use cases informally, but I want to capture them in a more persistent way, so I created this issue. Feel free to append your use cases so that we have a record of what people think a catalogue should do.

My list of use cases includes, but is not limited to:

1. Catalogue as a general discovery portal. You harvest information about a wide set of resources for the purpose of making those resources discoverable. Basically, I am thinking that a catalogue can contain information about anything that you deem should be discoverable. For OGC this can include a broad swath of geospatial resources: features, coverages, service end points, processes, sensors, IoT, etc. Once that information is in the catalogue, users can search the catalogue, find the thing(s) they are interested in and then have enough information from the catalogue to access the thing(s) they found (i.e. binding).
2. Catalogue as a local discovery portal. This is a subset of case 1. You are a data provider and you want to publish a catalogue that lets users of your system search your offerings. This use case more-or-less satisfies the "let's make the /collections endpoint searchable" requirement, but it is also a bit more because it is not limited to what is currently findable at a /collections endpoint (i.e. not all OGC resource specifications have collections; processes, for example, do not, but I would still want my customers to be able to discover the super cool geospatial process that I just developed!). I have mixed feelings about how this use case would be realized; on the one hand it makes sense that the /collections endpoint can be turned into a "catalogue", but on the other hand I feel it might be better to have a parallel, dedicated endpoint for catalogue queries. So as a provider, I would stand up my services and other resources and THEN I would also offer a catalogue endpoint (separate from /collections) to allow you to search for what I am offering.
3. Catalogue as a repository/register. You have a bunch of resources (metadata documents, code lists, etc.) that you want to make accessible and searchable. This use case covers the "classic" catalogue use case (i.e. I have a "bazillion" metadata documents describing stuff and I want to make this collection of metadata documents accessible and queryable). Another example is code lists such as a CRS registry.
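
The harvest, search and bind flow described in these use cases can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: the `Record` and `Catalogue` classes, their field names and the example.org URLs are hypothetical and are not taken from OGC API - Records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """One catalogue entry; the field names are illustrative assumptions."""
    id: str
    title: str
    resource_type: str                      # e.g. "features", "process", "codelist"
    keywords: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)  # where the resource can be accessed


class Catalogue:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def harvest(self, record: Record) -> None:
        """Add or update a record describing some discoverable resource."""
        self._records[record.id] = record

    def search(self, text: str = "", resource_type: Optional[str] = None) -> List[Record]:
        """Case-insensitive substring match on title and keywords, optionally filtered by type."""
        needle = text.lower()
        hits = []
        for rec in self._records.values():
            if resource_type is not None and rec.resource_type != resource_type:
                continue
            haystack = " ".join([rec.title, *rec.keywords]).lower()
            if needle in haystack:
                hits.append(rec)
        return hits


catalogue = Catalogue()
catalogue.harvest(Record("r1", "Road network", "features", ["transport"],
                         ["https://example.org/collections/roads"]))
catalogue.harvest(Record("r2", "Flood model", "process", ["hydrology"],
                         ["https://example.org/processes/flood"]))

# A process has no /collections entry, yet it is still discoverable here.
process_hits = catalogue.search(resource_type="process")
```

The `links` on each hit are what would give a client enough information to bind to the resource it found.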

Please feel free to comment on these use cases or, as I mentioned above, add your own.

pvretano commented 4 years ago

Use cases from 16-JUN-2020 SWG Meeting

- Gobe: PubSub use case
  - client app subscribes to catalogue to be notified when new records are added to the catalogue
  - this would be an extension on the core
- Tom K/Chris L: Met use case
  - Use OAPIR as a platform of discovery for weather data
- Gobe:
  - distributed search capability
  - Uwe has interest
  - we envision this as an extension
- Linda:
  - Use cat to store provenance information as a resource that you can link to
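
The PubSub use case above can be illustrated with a minimal in-process sketch. It assumes a plain callback registry; the `NotifyingCatalogue` class and its method names are hypothetical, and a real extension would define a network protocol rather than Python callbacks.

```python
from typing import Callable, Dict, List

RecordDict = Dict[str, str]
Subscriber = Callable[[RecordDict], None]


class NotifyingCatalogue:
    """Hypothetical catalogue that notifies subscribers about newly added records."""

    def __init__(self) -> None:
        self._records: Dict[str, RecordDict] = {}
        self._subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        """Register a callback to be invoked for each new record."""
        self._subscribers.append(callback)

    def add_record(self, record: RecordDict) -> None:
        """Store the record; notify subscribers only if its id was not seen before."""
        is_new = record["id"] not in self._records
        self._records[record["id"]] = record
        if is_new:
            for notify in self._subscribers:
                notify(record)


received: List[str] = []
cat = NotifyingCatalogue()
cat.subscribe(lambda rec: received.append(rec["id"]))
cat.add_record({"id": "obs-1", "title": "Surface observations"})
cat.add_record({"id": "obs-1", "title": "Surface observations (updated)"})  # update, no notification
```

Whether updates should also trigger notifications is a design choice the meeting notes do not settle; this sketch only notifies on new records, as the use case is worded.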

tomkralidis commented 4 years ago

cc @efucile @6a6d74 @chris-little

As discussed at today's SWG meeting, we are looking at OGC API - Records to facilitate search/discovery and actionable links in a proposed WMO Information System (WIS) pilot. A copy of the proposal can be found at https://gist.github.com/tomkralidis/56d2c859477eb69946912b81f4653612.

uvoges commented 4 years ago

Simple and fast navigation through possibly hierarchically organized collection metadata and through hierarchically organized product/dataset/item metadata, executable by humans (browser -> html) and machine clients (-> JSON). Some collections / hierarchies may be searchable, others may not. In my opinion we need catalogue (not searchable), collection (searchable) and item/record.

Example (not real):

MainCatalogue (/rootCatalogue/collections) (not searchable, just browse)
- MSG_Catalogue (/rootCatalogue/collections/MSG_Catalogue/collections)(not searchable & just browse)
  - EO_EUM_DAT_MSG_HRSEVIRI (/rootCatalogue/collections/MSG_Catalogue/collections/EO_EUM_DAT_MSG_HRSEVIRI)(searchable & browse)
    - 2019 (/rootCatalogue/collections/MSG_Catalogue/collections/EO_EUM_DAT_MSG_HRSEVIRI/collections/2019)(searchable & browse)
      - MSG_4711 (record) (/rootCatalogue/collections/MSG_Catalogue/collections/EO_EUM_DAT_MSG_HRSEVIRI/collections/2019/items/MSG_4711)
      - MSG_4712 (record) …
    - 2020 (/rootCatalogue/collections/MSG_Catalogue/collections/EO_EUM_DAT_MSG_HRSEVIRI/collections/2020)(searchable & browse)
      - MSG_4811 (record) …
  - next MSG collection....... (searchable & browse)
- PolarSystem_Catalogue (/rootCatalogue/collections/PolarSystem_Catalogue) (not searchable, just browse)
  - EO_EUM_DAT_METOP_OSI-150-A (/rootCatalogue/collections/PolarSystem_Catalogue/collections/EO_EUM_DAT_METOP_OSI-150-A) (Searchable for EO_EUM_DAT_METOP_OSI-150-A products)
    - EPS_4911 (record) …
  - next Metop collection... (Searchable...)
    - EPS_5012 (record) …

e.g.
- EO_EUM_DAT_MSG_HRSEVIRI stands for: High Rate SEVIRI Level 1.5 Image Data - MSG - 0 degree
- EO_EUM_DAT_METOP_OSI-150-A stands for: ASCAT L2 25 km winds data record release 1 - Metop
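
As a rough sketch of how a server might derive such nested paths from a tree of catalogues, collections and records: the `Node` class and `walk` function below are hypothetical, and the sketch assumes every sub-collection lives under `<parent>/collections/<name>` and every record under `<parent>/items/<id>`, as in the example above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """A catalogue or collection in the hierarchy (illustrative structure only)."""
    name: str
    searchable: bool = False
    children: List["Node"] = field(default_factory=list)
    records: List[str] = field(default_factory=list)


def walk(node: Node, prefix: str) -> Iterator[str]:
    """Yield the path of every sub-collection and record below `node`, depth first."""
    for child in node.children:
        child_path = f"{prefix}/collections/{child.name}"
        yield child_path
        yield from walk(child, child_path)
    for record_id in node.records:
        yield f"{prefix}/items/{record_id}"


root = Node("rootCatalogue", children=[
    Node("MSG_Catalogue", children=[
        Node("EO_EUM_DAT_MSG_HRSEVIRI", searchable=True, children=[
            Node("2019", searchable=True, records=["MSG_4711", "MSG_4712"]),
        ]),
    ]),
])

paths = list(walk(root, "/rootCatalogue"))
```

Because the paths are generated from names alone, a client that knows the naming rule can browse the hierarchy without the server storing explicit links, which is the predictability the example relies on.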

rob-metalinkage commented 4 years ago

If you want navigation you should really just adopt Linked Data best practices rather than re-inventing a means to describe some micro-format and behavioural contract. Which means making JSON-LD canonical in these cases. Having predictable naming so servers can autogenerate the links seems to be the implicit requirement here, but spelling out the implications in terms of functional goals will at least make it clearer to end-users why this is specified.

uvoges commented 4 years ago

Rob, could you provide an example response to make it clearer what it could look like?

rob-metalinkage commented 4 years ago

There are two parts to this: 1) the data - how to embed a link 2) description of what data means (the JSON-LD "context")

links are simple:

{
  "@context": "https://json-ld.org/contexts/person.jsonld",
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
}

but better would be:

{
  "@context": [
    "https://json-ld.org/contexts/person.jsonld",
    { "dbpedia": "http://dbpedia.org/resource/" }
  ],
  "@id": "dbpedia:John_Lennon",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": "dbpedia:Cynthia_Lennon"
}

so for collections - I think you just create a local namespace that points to the naming convention you need:

"@context": { "collections": "/collections/" }

then reference them as links however you wish:

"collection1": { "subcollection": "collections:collection2" }

and I guess

"collection1": { "subsubcollection": "collections:collection1/subCollection1" }

should work.

This will need a bit of experimentation to validate that it all works syntactically - but you don't need to define a link mechanism, and you can make statements about any referred object just by adding it to the payload: as long as the JSON object has the "@id": "collections:collection2" tag (for example), it can contain additional metadata.
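
To make the prefix idea concrete, here is a deliberately simplified sketch of compact IRI expansion. It is not the JSON-LD expansion algorithm (which also handles keywords, blank nodes, `@vocab`, `@base`, type coercion and scoped contexts); the function name `expand_compact_iri` is hypothetical, and the sketch only covers the `prefix:suffix` case for values that are meant to be IRIs.

```python
from typing import Dict


def expand_compact_iri(value: str, prefixes: Dict[str, str]) -> str:
    """Expand 'prefix:suffix' using a prefix map; return other values unchanged."""
    prefix, sep, suffix = value.partition(":")
    if not sep or suffix.startswith("//"):
        # No colon at all, or something like http://... that is already absolute.
        return value
    if prefix in prefixes:
        # Expansion is plain string concatenation of the prefix IRI and the suffix.
        return prefixes[prefix] + suffix
    return value


prefixes = {"dbpedia": "http://dbpedia.org/resource/"}
expanded = expand_compact_iri("dbpedia:Cynthia_Lennon", prefixes)
```

Since expansion is concatenation, the prefix IRI normally needs to end in a separator such as `/` or `#` for the result to be a sensible path.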

mhogeweg commented 4 years ago

I like the 'adopt JSON-LD' view, as there is a body of work around JSON-LD that applies here. Spatial is no longer special, right? However, putting some sort of namespace in the value does not seem right. If you want to distinguish between collections within a single catalog, I'd see something along these lines when using namespaces. Note that in JSON-LD you need to use an absolute URI for the collection. This should be OK, as the catalog itself provides the base URL:

{
  "@context": {
    "col1": "http://gptogc.esri.com/geoportal/collections/1",
    "col2": "http://gptogc.esri.com/geoportal/collections/2"
  },
  "col1:id": "1234567890",
  "col1:title": "the title",
  "col1:image": "http://manu.sporny.org/images/manu.png"
}

advertising the available collections/subcollections could be done by providing a context definition at the catalog level. Something like:

"@context": "http://gptogc.esri.com/geoportal/catalog.jsonld"

whatever this group comes up with, I'd recommend making it pass the JSON-LD checkboxes: https://json-ld.org/playground/

rob-metalinkage commented 4 years ago

+1 - but wondering why wouldn't you factor out the server location to a reusable namespace while you are at it?

mhogeweg commented 4 years ago

if you replace the http://gptogc.esri.com/geoportal with "" to indicate relative namespace for "/collections/1" you'll get an error in the validator:

jsonld.SyntaxError: Invalid JSON-LD syntax; a @context @id value must be an absolute IRI, a blank node identifier, or a keyword.
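
One way around the absolute-IRI requirement, sketched under assumptions: since the catalogue knows its own base URL, the server could generate the context with absolute IRIs instead of relative ones. The helper names `is_absolute_iri` and `build_context` are hypothetical, and the scheme check is only a rough approximation of what a JSON-LD processor validates.

```python
from typing import Dict, Iterable
from urllib.parse import urljoin, urlparse


def is_absolute_iri(value: str) -> bool:
    """Rough check: the value has a scheme such as http or urn. Not a full IRI validator."""
    return bool(urlparse(value).scheme)


def build_context(base_url: str, collection_ids: Iterable[str]) -> Dict[str, str]:
    """Map a term per collection (col1, col2, ...) to an absolute collection IRI."""
    base = base_url.rstrip("/") + "/"   # a trailing slash makes urljoin append, not replace
    return {
        f"col{cid}": urljoin(base, f"collections/{cid}")
        for cid in collection_ids
    }


context = build_context("http://gptogc.esri.com/geoportal", ["1", "2"])
```

With the base URL used earlier in this thread, this produces the same two absolute term definitions as the hand-written context, while `"/collections/1"` on its own fails the `is_absolute_iri` check.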

rob-metalinkage commented 4 years ago

hmm - the handling of base URIs inside contexts appears to be weird. This doesn't work:

"srv": "http://myserver.org/", "col1": "srv:collections/1",

nor does

"@base": "http://myserver.org/", "col1": "collections/1",

but, weirdly,

"col1": "srv:"

will expand srv...

unless it's a bug in the playground...

mhogeweg commented 4 years ago

I'd expect the 'namespace' in JSON-LD would apply to the field name, not the value. It's XML but then with {} instead of <>

rob-metalinkage commented 4 years ago

It does apply to the values - but apparently only in some cases: "srv:" expands but "srv:1" doesn't, even though both definitely conform to the same rules for an IRI. Whether this is an implementation bug, a spec intention, or a spec inconsistency is going to require more investigation. Will try to get to the bottom of this, but a lot of other things to do first ;-(

nmtoken commented 4 years ago

> Catalogue as a general discovery portal. You harvest information about a wide set of resources for the purpose of making those resources discoverable. Basically, I am thinking that a catalogue can contain information about anything that you deem should be discoverable. For OGC this can include a broad swath of geospatial resources: features, coverages, service end points, processes, sensors, IoT, etc. Once that information is in the catalogue, users can search the catalogue, find the thing(s) they are interested in and then have enough information from the catalogue to access the thing(s) they found (i.e. binding).

> contain information about anything...

For us, anything includes nonGeographicDataset datasets. At the moment we publish over 1300 metadata records through CSW, of which 500 are nonGeographicDataset.

pvretano commented 4 years ago

@nmtoken Yes, anything includes datasets (or any resource really) that does not include a geographic component.

chris-little commented 4 years ago

@pvretano The Environmental Data Retrieval API Standard WG envisage (EDR geospatial) queries as being things that might be discoverable. I might keep wanting to request observations along a fixed trajectory, so need to persist the query as a resource and reuse, perhaps with some attributes changed (e.g. time, or desired parameter of interest). The results of each similar query could also be persisted, so that I can accumulate a succession of results to construct, for example, a climatology along that trajectory. In both these cases, these are geospatial resources, but I can easily envisage similar queries and results that are not geospatial (e.g. DNA analyses) but could use API-Records.
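
The idea of persisting a query as a reusable resource can be sketched as follows. Everything here is an assumption for illustration: the `StoredQuery` class, its fields and the parameter names are hypothetical and are not defined by the EDR or Records specifications.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class StoredQuery:
    """A query persisted as a catalogue resource (hypothetical shape)."""
    id: str
    query_type: str                             # e.g. "trajectory"
    coords: Tuple[Tuple[float, float], ...]     # fixed path, as (lon, lat) pairs
    parameters: Dict[str, str] = field(default_factory=dict)


def rerun_with(query: StoredQuery, changes: Dict[str, str]) -> StoredQuery:
    """Return a copy of the stored query with some parameters overridden."""
    return replace(query, parameters={**query.parameters, **changes})


base = StoredQuery(
    id="traj-1",
    query_type="trajectory",
    coords=((0.0, 51.5), (2.3, 48.9)),
    parameters={"datetime": "2020-06-16", "parameter_name": "wind_speed"},
)

# Same trajectory, different time: the stored query itself is left untouched.
next_day = rerun_with(base, {"datetime": "2020-06-17"})
```

Results of each rerun could be catalogued in the same way and linked back to the query `id`, which is how a succession of results along one trajectory might be accumulated.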

pvretano commented 3 years ago

2020-11-02: The todo item is to include the use cases that are relevant to the standard in the document.

pvretano commented 1 year ago

21-APR-2023: Closing. The use cases are already articulated in the standard via the discussion about catalogue deployment patterns using the building blocks (see here).

opengeospatial / ogcapi-records

Catalogue use cases #37