What is the content-model for a 'standard catalog record'?

dr-shorthair commented 4 years ago

Is this carried over from CSW-3, or is a refresh under consideration, perhaps aligned with Schema.org or DCAT-2?

pvretano commented 4 years ago

@dr-shorthair no, this is not carried over from CSW-3. All content models are still on the table and DCAT-2 is a prominent discussion point. There is, however, a desire not to throw CSW-3 under the bus and to follow the same pattern set by OAPIF: an "OGC API - Records" facade can be implemented on top of a CSW-3 catalogue just as an "OGC API - Features" facade can be implemented on top of a WFS 2.X. I should say that my own opinion is that the catalogue should be agnostic of the types of resources being catalogued (datasets, services, widgets, etc.) and should provide a uniform means to (a) provide at least a minimal description of a resource and how to get to it, (b) relate the resource to other resources (in and outside the catalogue) and (c) classify the resource using any number of classification schemes. My feeling is that DCAT is very dataset and services focused ... am I wrong about that?

dr-shorthair commented 4 years ago

DCAT-2 introduces dcat:Resource as the superclass of dcat:Dataset and dcat:DataService but with the explicit intention that DCAT provides a general pattern for catalogues. See https://www.w3.org/TR/vocab-dcat/#dcat-scope :

Member items in a catalog should be members of one of the sub-classes, or of a sub-class of these, or of a sub-class of dcat:Resource defined in a DCAT profile or other DCAT application. dcat:Resource is effectively an extension point for defining a catalog of any kind of resource.

In the normative clause for dcat:Resource there is a usage note

The class of all cataloged resources, the super-class of dcat:Dataset, dcat:DataService, dcat:Catalog and any other member of a dcat:Catalog. This class carries properties common to all cataloged resources, including datasets and data services. It is strongly recommended to use a more specific sub-class. When describing a resource which is not a dcat:Dataset or dcat:DataService, it is recommended to create a suitable sub-class of dcat:Resource, or use dcat:Resource with the dct:type property to indicate the specific type. dcat:Resource is an extension point that enables the definition of any kind of catalog. Additional sub-classes may be defined in a DCAT profile or other DCAT application for catalogs of other kinds of resources.

The hope is that other catalogues (of widgets, specimens, boreholes, mail-boxes, ...) can be designed using additional sub-classes of dcat:Resource, i.e. DCAT-2 provides a generic framework for any kind of catalogue, though only Dataset and DataService are implemented.

mhogeweg commented 4 years ago

Has the EU GeoDCAT application profile been considered? It would help greatly with interoperability between the geo and open data communities to align with what is already being used across Europe.

Marten Hogeweg Esri

cportele commented 4 years ago

Meeting 2020-02-24:

- Records are often information about other things in general, not just about datasets or services, so DCAT would be too specific an information model.
- DCAT is still about datasets, so dcat:Resource in the DCAT namespace would still not be a general resource and we don't want to be tied to DCAT.
- The idea is mainly to agree on key properties like title, description, etc. that can be mapped to many representations. DCAT is one of those, so DCAT will be supported (but not mandated).

dr-shorthair commented 4 years ago

DCAT is still about datasets, so dcat:Resource in the DCAT namespace would still not be a general resource and we don't want to be tied to DCAT.

As I thought I explained above, while DCAT itself is about Datasets and DataServices, dcat:Resource is a starting point for sub-classing definitions for any kind of resource to be catalogued. Are there some properties of dcat:Resource that are incompatible with cataloguing the other kinds of thing that you have in mind?

cportele commented 4 years ago

My personal view (the previous comment was the summary of the discussion in the meeting, not my personal comment):

A key driver for OGC API Records is to have a simple path to sharing all kinds of resources that are related to geospatial data with a uniform approach and specific support for the discovery of these resources. This can be metadata about data or resources referenced from data. Using the Features API as a starting point, but for records instead of features as the resources, makes it straightforward to leverage existing OGC API tooling (clients and servers), too.

This has two relevant aspects for this issue:

a) The scope goes beyond the "classic" geospatial catalog use case and is not only about cataloguing metadata about other resources. This is from the charter:

The proposed Web API will define the methods and apparatus to support:

Discovery of geospatial resources by standardizing the way collections of descriptive information about the resources (metadata) are exposed;

Discovery and sharing of related resources that may be referenced from geospatial resources or their metadata by standardizing the way all kinds of records are exposed and managed.

The first item is more or less the cataloguing use case. At least this is my understanding, but I wouldn't consider myself a geospatial metadata expert.

The second item goes beyond that and is an option to share all kinds of other resources in a uniform way with some descriptive metadata for discovery and links to other resources. Codelists are one example.

So, even if it is the intention of the DCAT 2 revision to position it for cataloguing everything, not just datasets, a dcat:Resource is still too narrow for the scope of Records.

Personally, I would also wait for a wider adoption of dcat:Resource by tools before using it as a basis for anything in OGC, but that is a separate discussion topic.

b) Every collection will define and publish its own queryables (and that could include everything from DCAT), but there will be a few pre-defined queryables like title/description and a general capability to have links to other resources. At least this is my understanding. But it will be a limited set, at least in the Core, and it should be straightforward to map this to all kinds of existing content models that may be used in implementations/deployments. OGC API Records should not be tied to any particular content model or encoding and implementations should be able to use whatever makes it simple to use for their users.

For example, the initial ldproxy implementation currently plans to use GeoJSON or something that is close to it as the standard encoding (in addition to HTML). We are looking into supporting JSON-LD annotations to link the JSON to existing vocabularies, but that is likely a second step.

dr-shorthair commented 4 years ago

Thanks @cportele .

I certainly don't want to suggest that DCAT provides a ready-made solution, or that a single model for 'records' is necessarily desirable or even possible. I'm way past that assumption. However, proliferation without good reason would also be unfortunate.

When designing DCATv2 we aimed to provide

- a generic pattern for a Catalog and relations to records and record-metadata
- dcat:Resource as an extensible base class for resource descriptions (records)
- pre-defined classes for Dataset and DataService records

There was also a design decision to limit the choice of predicates to those already defined in Dublin Core as far as possible, with a relatively small set of classes and new properties in the DCAT namespace to support the Catalog backbone, plus a couple of odds and ends from PROV, ODRL and FOAF.

I'm trying to understand where this fails to meet the requirements that you describe, and in what ways you find it 'too narrow'. DCAT is an RDF vocabulary so JSON-LD serialization comes for free. And the 'few pre-defined queryables like title/description and a general capability to have links to other resources' that you want for the record-core is either already there in DCATv2, else OGC Record could be a very lightweight extension.

Concerning the requirement to support 'links to other resources', it was recognised that while DC and DCAT provide a useful set of predicates, there are already several (many!) independently maintained sets of link relations and types, such as ISO-19115-1 DS_AssociationTypeCodes, the IANA Registry of Link Relations, the DataCite metadata schema and the MARC relators. So two association classes were introduced to support links to (a) agents and (b) other resources, where the link semantics can refer to these external definitions; see https://www.w3.org/TR/vocab-dcat/#qualified-forms .
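To make the qualified forms concrete, here is a minimal Python sketch of a record using dcat:qualifiedRelation and prov:qualifiedAttribution, written as JSON-LD-style dicts. The record, agent and role IRIs under example.org are hypothetical placeholders; in practice the roles would point into one of the external vocabularies listed above.

```python
# Sketch of DCAT 2 qualified forms as JSON-LD-style dicts.
# All example.org IRIs are hypothetical placeholders.
DCAT_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
}


def qualified_relation(target_iri: str, role_iri: str) -> dict:
    """Link to another resource; the link semantics come from role_iri."""
    return {
        "@type": "dcat:Relationship",
        "dcterms:relation": {"@id": target_iri},
        "dcat:hadRole": {"@id": role_iri},
    }


def qualified_attribution(agent_iri: str, role_iri: str) -> dict:
    """Link to an agent; the agent's role comes from role_iri."""
    return {
        "@type": "prov:Attribution",
        "prov:agent": {"@id": agent_iri},
        "dcat:hadRole": {"@id": role_iri},
    }


record = {
    "@context": DCAT_CONTEXT,
    "@id": "https://example.org/records/dataset-1",
    "@type": "dcat:Dataset",
    # Role IRIs would normally come from an external vocabulary, e.g. an
    # ISO 19115-1 association type or an IANA link relation (assumed here).
    "dcat:qualifiedRelation": [
        qualified_relation("https://example.org/records/dataset-0",
                           "https://example.org/roles/revisionOf")
    ],
    "prov:qualifiedAttribution": [
        qualified_attribution("https://example.org/agents/org-1",
                              "https://example.org/roles/custodian")
    ],
}
```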

Of course the elephant in the room is Schema.org, so if you were throwing your lot in with that I would fully understand (and probably concur). And of course the OGC legacy from CSW is very important, though there is already a corresponding DCAT profile (GeoDCAT) mapped to CSW and ISO.

ghobona commented 4 years ago

@fellahst worked on development of a Semantic Registry Information Model (SRIM) in previous testbeds. The model pulled in predicates from DCAT and other specifications.

http://docs.opengeospatial.org/per/16-059.html#_semantic_registry_information_model_srim_2


cportele commented 4 years ago

@dr-shorthair - Thank you. Looking at the DCAT 2 model I guess you are correct that an entry in a codelist could also be seen as a dcat:Resource so forget the "too narrow" comment. However, that fact and the extended scope in DCAT 2 also worries me as what I liked about DCAT 1 was its narrow scope (datasets and distributions). I think this has helped it to achieve broad use (e.g. in schema.org, Google dataset search).

In my view, the DCAT 2 model of Resource is too rich for the Core of OGC API Records. I expect that there could/will be an extension that supports DCAT 2 (if there is enough implementation interest). But any implementation that wants to support the subset of the DCAT 2 vocabulary that I think will become part of the Core should be able to represent that right away in DCAT 2 JSON-LD.

On a similar note, I think that any discovery-oriented activity should consider schema.org, but again I don't see this as part of the Core of OGC API Records, as a mapping of resources to schema.org will depend on the nature of the resource.

PS: Myself I am not worried about CSW compatibility, but others are, so it will happen :)

dr-shorthair commented 4 years ago

OK - I think we are converging now.

In my view, the DCAT 2 model of Resource is too rich for the Core of OGC API Records

Indeed there are a lot of properties shown in the DCAT 2 model. As noted after the caption, normal OWA applies so everything is optional and repeatable, but I understand why it would be distracting. In the diagram we showed all 'recommended' properties, where 'recommended' is merely to indicate the preferred solution for each of these aspects, and not that all of these should be present in every record. But we held back from specifying a minimum core. (consensus ...) It would be quite reasonable to propose such for an application like OGC Records - SHACL might be the right technology?

The dcat:Resource class emerged naturally by taking the common properties from the Dataset and the new DataService classes. We also had in mind the idea that there was a standard core Catalog model lurking underneath, particularly with applications like sample-catalogs in mind, so the hooks were left, but their detailed exploration was obviously out of scope.

cportele commented 4 years ago

@dr-shorthair - My view is that we should not have a dependency on DCAT in the Core, but maybe we could (informatively) document how the "core queryables" would map to DCAT, schema.org and other key vocabularies.

The DCAT profile could be documented using SHACL, but I don't think that should be in the standard, that might be a separate document.

ghobona commented 4 years ago

Experience from Testbed-12 is that if there is no common content model that is required in the Core, then we end up needing to implement 'Shim' services to transform every request, which is not very efficient.

In my opinion, the SWG should identify one common content model as a requirement and then allow for other content models to be supported as an option.

If the SWG agrees with this point, then the next decision to make is "which content model?".

pvretano commented 4 years ago

There is a content model. It is a very simple content model; it has one class -- record. The components of the record class are:

- a record identifier (id)
- a type indicating the type of resource being catalogued (type)
- a title for the resource (title)
- a narrative description for the resource (description)
- the language being used for the narrative text of the record (language)
- an issued date for the date the resource was created (issued)
- a modified date for the last time the resource was modified (modified)
- a primary geometry for the resource (geometry)
- an extents array similar to what is used in features (extents)
- a set of resource-specific properties (properties)
- a set of associations between this resource and other resources (links)
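A minimal Python sketch of that one-class model, using a dataclass as a stand-in for the schema in the repo. The field names follow the list; the Python types (dates as ISO 8601 strings, geometry as a GeoJSON dict) are assumptions, not taken from the draft.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Sketch of the one-class record content model listed above.

    Field names follow the list; the Python types are assumptions.
    """
    id: str                            # record identifier
    type: str                          # type of the catalogued resource
    title: str
    description: str = ""
    language: Optional[str] = None     # language of the narrative text
    issued: Optional[str] = None       # creation date, ISO 8601 string (assumed)
    modified: Optional[str] = None     # last modification date
    geometry: Optional[dict] = None    # primary geometry, GeoJSON object (assumed)
    extents: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)  # resource-specific properties
    links: list = field(default_factory=list)       # associations to other resources


# A dataset record: dataset-specific details go into `properties`.
dataset = Record(
    id="ds-1",
    type="dataset",
    title="Rivers",
    properties={"keywords": ["hydrography"], "license": "CC-BY-4.0"},
)
```

Everything beyond id, type and title is optional here, which mirrors the intent of keeping the core minimal and pushing detail into `properties`.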

There is also a list of "suggested" resource-specific properties (i.e. properties that can be included in the "properties" section) that includes: publisher, keywords, themes, contact point, landing page, license, rights, downloadURL, formats, byte size, etc.

This simple content model currently lives here: https://github.com/opengeospatial/ogcapi-records/blob/master/core/code (read the short README.md).

If you want to catalogue a dataset, you create a record and include whatever dataset-specific properties you want (or that have been decided on by some authority for describing a dataset) in the properties section.

If you want to catalogue a service, you create a record and include whatever service-specific properties you want (or that have been decided on by some authority for describing a service) in the properties section.

If you want to associate the service and the dataset you include a link in each of their respective catalogue records to each other.
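The three cases above can be sketched roughly as follows. The dict layout, the endpoint URL and the use of a 'related' link relation are illustrative assumptions, not something the draft prescribes.

```python
def make_record(record_id: str, resource_type: str, title: str, **props) -> dict:
    """Build a minimal record as a plain dict (layout assumed)."""
    return {"id": record_id, "type": resource_type, "title": title,
            "properties": dict(props), "links": []}


def associate(a: dict, b: dict, base_url: str, rel: str = "related") -> None:
    """Add a link from each record to the other (a reciprocal association).

    base_url and the default 'related' relation type are placeholder assumptions.
    """
    a["links"].append({"rel": rel, "href": f"{base_url}/{b['id']}"})
    b["links"].append({"rel": rel, "href": f"{base_url}/{a['id']}"})


BASE = "https://example.org/collections/catalog/items"  # hypothetical endpoint
dataset = make_record("ds-1", "dataset", "Rivers", format="GeoPackage")
service = make_record("svc-1", "service", "Rivers WMS", serviceType="WMS")
associate(dataset, service, BASE)
```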

This content model is more or less what used to be in the CSW record. This content model currently has a GeoJSON encoding, but other encodings (such as XML) are possible too. This simplicity is intentional. There is a very minimal set of information describing a resource that is extensible as required for a specific purpose.

Of course, this is all still being discussed in the SWG but I agree with @cportele that we minimize dependencies and keep the "core" content model as simple as possible. I would envision things like DCAT and STAC to be extensions on this core.

uvoges commented 4 years ago

Peter, one question: why didn't you align it with GeoJSON? Thanks, Uwe

cportele commented 4 years ago

why didn't you align it with GeoJSON?

I think that is the direction we discussed 2 or 3 meetings ago, but it is not yet reflected in the repo.

pvretano commented 4 years ago

@uvoges, as @cportele mentioned, we agreed to do that a couple of meetings ago; I just haven't had time to update the repo to reflect those changes yet, but it is on my todo list.

dr-shorthair commented 4 years ago

The core content model described by @pvretano maps directly onto DCAT - only geometry and extents are not standard properties of dcat:Resource, though dcterms:spatial is used for datasets.

| Record Core | DCAT Resource |
|---|---|
| a record identifier (id) | dcterms:identifier |
| a type indicating the type of resource being catalogued (type) | dcterms:type |
| a title for the resource (title) | dcterms:title |
| a narrative description for the resource (description) | dcterms:description |
| the language being used for the narrative text of the record (language) | dcterms:language |
| an issued date for the date the resource was created (issued) | dcterms:issued |
| a modified date for the last time the resource was modified (modified) | dcterms:modified |
| a primary geometry for the resource (geometry) | dcterms:spatial (also see spatial properties) |
| an extents array similar to what is used in features (extents) | - |
| a set of resource-specific properties (properties) | (in definitions of `dcat:Resource` sub-classes) |
| a set of associations between this resource and other resources (links) | dcterms:relation, dcat:qualifiedRelation |
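The table can also be read as a lookup. Below is a rough Python sketch of translating the mappable core keys into a dcat:Resource-style dict; dropping extents and ignoring the resource-specific properties are simplifications made for this example, not part of any published mapping.

```python
# Record Core key -> Dublin Core / DCAT term, following the table above.
# 'extents' has no DCAT counterpart; 'properties' would be mapped through
# sub-class definitions, which this sketch does not attempt.
CORE_TO_DCAT = {
    "id": "dcterms:identifier",
    "type": "dcterms:type",
    "title": "dcterms:title",
    "description": "dcterms:description",
    "language": "dcterms:language",
    "issued": "dcterms:issued",
    "modified": "dcterms:modified",
    "geometry": "dcterms:spatial",
}


def record_to_dcat(record: dict) -> dict:
    """Translate the mappable core keys of a record into a dcat:Resource-style dict.

    Unmapped keys (e.g. extents) and None values are dropped; link hrefs
    become plain dcterms:relation values. A simplified sketch only.
    """
    out = {"@type": "dcat:Resource"}
    for key, term in CORE_TO_DCAT.items():
        if record.get(key) is not None:
            out[term] = record[key]
    hrefs = [link["href"] for link in record.get("links", [])]
    if hrefs:
        out["dcterms:relation"] = [{"@id": h} for h in hrefs]
    return out
```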

As I noted, almost all are taken from Dublin Core. What DCAT provides is the class dcat:Resource and standard relations with datasets, data-services and distributions. It is RDF, so easy to extend.

dr-shorthair commented 4 years ago

uvoges commented 4 years ago

I guess dcterms:spatial must be replaced by GeoJSON gj:geometry...

dr-shorthair commented 4 years ago

... else just have a GeoJSON geometry as the value of a dcterms:spatial property. I see no problem with that - the range of dcterms:spatial includes dcterms:Location but

(a) dcterms:Location has no content model defined

(b) if a GeoJSON Geometry was used, the RDF entailment would merely be that a GeoJSON geometry is also a dcterms:Location, which does not appear to introduce any inconsistencies or other problems

rob-metalinkage commented 4 years ago

The bigger question is whether the content model needs to be identifiable by the client?

If the answer is no, I guess: why care?

If the answer is yes - then options are: 1) Enforce a common model and expect clients to know it 2) Provide a means to state what model is being used 3) Enforce self-description of the model in-line, and define a canonical model description language

Experience suggests that if you choose option 1, there will be a need to profile it and discover what profile is being used, because clients will need to know how to interpret either extensions or content stuffed into generic slots.

Option 2 can be achieved by adding a @context to JSON payloads (i.e. making things JSON-LD), with additional structural constraints expressed in JSON Schema. This pushes the problem to communities to define such contexts, but things like DCAT (and GeoDCAT) suggest they are doing this already, so we can leverage such data models. (This makes catalogues consistent with features.)

Consequently, having a core that just requires you to define the model and a profile with a specific model gives you the same initial capability, but a place to go for people who need richer metadata today - just define a profile URI and provide a JSON context document behind that URI.
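A rough sketch of what option 2 could look like on the wire, assuming a hypothetical profile URI with a JSON-LD context document published behind it, and using the RFC 6906 'profile' link relation to point at the profile.

```python
import json

# Hypothetical URIs: a profile identifier and the JSON-LD context behind it.
PROFILE_URI = "https://example.org/profiles/geodcat-record"
CONTEXT_URL = "https://example.org/profiles/geodcat-record/context.jsonld"


def declare_profile(payload: dict, profile_uri: str, context_url: str) -> dict:
    """Return a copy of payload that states which model it follows.

    The @context maps keys to vocabulary terms; a link with rel 'profile'
    names the profile URI. The input payload is left unchanged.
    """
    out = dict(payload)
    out["@context"] = context_url
    out["links"] = list(payload.get("links", [])) + [
        {"rel": "profile", "href": profile_uri}
    ]
    return out


record = {"id": "ds-1", "type": "Feature", "properties": {"title": "Rivers"}}
print(json.dumps(declare_profile(record, PROFILE_URI, CONTEXT_URL), indent=2))
```

A client that does not understand JSON-LD can ignore both additions and still read the record as plain JSON.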

lvdbrink commented 4 years ago

The Spatial data on the web best practices BP 13 specifies that

The description of datasets that have Spatial Things should include explicit metadata about their spatial extent, coverage, and representation

And a little further on, contains this explanation:

The first level of spatial description is the spatial extent of the dataset, the area of the world that the dataset describes. This often suffices for initial discovery, but further levels of description are needed to evaluate a dataset for use. These include the dataset spatial coverage (continuity, resolution, properties) as well as the spatial representation or geometric model (for example, grid coverage, discrete coverage, point cloud, linear network).

Dataset quality measures such as positional accuracy are also important for determining applicability. In the case of datasets whose spatial characteristics vary over their temporal coverage, spatial descriptions must include an explicit temporal aspect.

The spatial extent is covered in the current Records content model, but the other two I think are not. Has this been considered? It might be because only the spatial extent is really important for discovery? The other two are considered to be important for judging the fitness for use of a dataset, once found.

pvretano commented 4 years ago

@lvdbrink No, we have not discussed that, but I am strongly in favor of aligning with the requirements of Spatial Data on the Web. I'll create a new separate issue for this so that we can track it.

dr-shorthair commented 4 years ago

These concerns are partially dealt with in DCAT using

uvoges commented 4 years ago

In case that API Records Spec should also cover the use case of a Registry (including non spatial objects like codeLists) the spatial extent, coverage... must be optional.

pvretano commented 4 years ago

Teleconference Note 23-MAR-2020:

There was one discussion point in today's teleconference that I thought I should add to this issue regarding the content model. The OGC API - Records - Part 1, Core specification does not mandate a content model. This does not mean that we are not going to define a content model. It means that, like OGC API - Features, we are not going to force people to use a specific content model and instead allow interoperability to occur as it does on the Web via content negotiation.

However, like OGC API - Features, the records specification will define a conformance class for a record encoded using GeoJSON. Why? Because, like OGC API - Features and the recommendations of the Spatial Data on the Web paper, we want the core to be simple, developer friendly and widely usable, especially by non-geo-experts. That does not mean we are excluding geo-experts. It means that we are allowing experts to use the API of the catalogue but define their own content model if they like, although we would strongly urge people to implement the GeoJSON encoding if that fits their needs.

The high level view of a GeoJSON record is:

{ "id": "", "type": "Feature", "geometry": { ... }, "properties": { ... } }

The "properties" object is a generic container for any list of properties that one might want to include as part of the record (e.g. Dublin Core). The draft specification defines a list of properties that may generically be used to describe a resource, which are taken from CSW 3.0/Dublin Core/DCAT ... it's a bit of a mix right now and I'm sure it will be discussed further.

In the CubeWerx implementation of the catalogue we have mapped RADARSAT and SENTINEL metadata into the "properties" object to illustrate the point that the properties container can contain whatever is important to describe the resource and make it discoverable.

In the "OGC API - Records - Part 1, Core" draft (still under development) we have also extended the record to include a resource type property, an "extents" object and a "links" object to accommodate associations; so the overall "OGC API - Records - Part 1, Core" record structure looks like this:

{ "id": "", "type": "Feature", "@type": "<resource type>", "geometry": { ... }, "properties": { ... }, "extents": { ... }, "links": {...} }
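As a sketch, a helper that assembles this draft structure might look like the following. Modelling links as a list of link objects (as in OGC API - Features) rather than the object shown in the snippet, and the shape of extents, are assumptions made for this example.

```python
from typing import Optional


def make_core_record(record_id: str, resource_type: str,
                     geometry: Optional[dict], properties: dict,
                     extents: Optional[dict] = None,
                     links: Optional[list] = None) -> dict:
    """Assemble a record with the draft's top-level members.

    "type" stays "Feature" so that plain GeoJSON clients can read the record;
    "@type" carries the type of the catalogued resource.
    """
    return {
        "id": record_id,
        "type": "Feature",
        "@type": resource_type,
        "geometry": geometry,
        "properties": properties,
        "extents": extents if extents is not None else {},
        "links": links if links is not None else [],  # list of link objects (assumed)
    }


record = make_core_record(
    "ds-1",
    "dataset",
    {"type": "Polygon",
     "coordinates": [[[5.0, 50.0], [6.0, 50.0], [6.0, 51.0], [5.0, 51.0], [5.0, 50.0]]]},
    {"title": "Rivers", "description": "River centrelines"},
    extents={"temporal": {"interval": [["2019-01-01", None]]}},  # shape assumed
)
```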

In the teleconference there was also some discussion about the relationship between the queryables and the record properties. The relationship is this: the only property names that can be used in a filter expression are the property names advertised by the catalogue service via the queryables resource (i.e. /collections/{catalogueId}/queryables). That list can be a direct 1-to-1 mapping with the properties in the record, but it does not have to be; it is up to the implementations to decide what the queryables should be and how they map to the properties of the record and/or their internal representation of the record. Some of the queryables, such as "id" and "@type", will be mandatory, but that exact list is still up for discussion.
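A minimal sketch of that separation, assuming a hypothetical catalogue whose queryables map to paths inside the record. The queryable `keyword` is deliberately named differently from the record property `keywords`, and any name that is not advertised is rejected.

```python
# Hypothetical queryables: advertised name -> key path into the record.
QUERYABLES = {
    "id": ("id",),
    "@type": ("@type",),
    "title": ("properties", "title"),
    "keyword": ("properties", "keywords"),  # differs from the record property name
}


class InvalidQueryable(ValueError):
    """Raised when a filter uses a name not advertised as a queryable."""


def resolve(record: dict, path: tuple):
    """Follow a key path into a record, returning None if any step is missing."""
    value = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(record: dict, filters: dict) -> bool:
    """Return True if the record satisfies all equality filters.

    List-valued properties match when they contain the requested value.
    """
    for name, wanted in filters.items():
        if name not in QUERYABLES:
            raise InvalidQueryable(f"'{name}' is not an advertised queryable")
        value = resolve(record, QUERYABLES[name])
        if isinstance(value, list):
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True
```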

dr-shorthair commented 4 years ago

However, like OGC API - Features, the records specification will define a conformance class for a record encoded using GeoJSON. Why? Because, like OGC API - Features and the recommendations of the Spatial Data on the Web paper, we want the core to be simple, developer friendly and widely usable, especially by non-geo-experts.

Good. This needs to happen.

My knowledge of GeoJSON and JSON itself is fading. Is it possible to wrap a GeoJSON record with a JSON-LD context so that it could also be interpreted as a DCAT or Schema.org resource? (perhaps only partially complete in the DCAT case.)

pvretano commented 4 years ago

@dr-shorthair I believe you can. The @type key is taken from JSON-LD so I assume that other JSON-LD keys/markup would be valid too. I'll let the more JSON-LD savvy confirm that though.

cportele commented 4 years ago

@dr-shorthair

I did a quick experiment. https://tinyurl.com/umehshu is a JSON-LD playground link to a hand-crafted GeoJSON feature representing a dataset with just a few properties. The context maps some of the properties to DCAT.
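The playground link does the real JSON-LD processing. As a much-simplified illustration of what the context does, the sketch below expands GeoJSON property names to DCAT/Dublin Core IRIs through a context dict. It is not a conforming JSON-LD processor (no @vocab, @type, nesting or language handling), and the property names are made up for the example.

```python
# A small context mapping GeoJSON property names to DCAT / Dublin Core terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "title": "dcterms:title",
    "description": "dcterms:description",
    "keywords": "dcat:keyword",
}


def expand_iri(value: str, context: dict) -> str:
    """Expand a term, then a compact IRI (prefix:suffix), using the context."""
    value = context.get(value, value)  # term -> IRI or compact IRI
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


def expand_properties(feature: dict, context: dict) -> dict:
    """Map the keys of a feature's properties to full IRIs.

    Keys that do not expand to an IRI are dropped, as JSON-LD expansion
    drops unmapped terms when no @vocab is set.
    """
    out = {}
    for key, val in feature.get("properties", {}).items():
        iri = expand_iri(key, context)
        if ":" in iri:
            out[iri] = val
    return out
```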

For schema.org something else is needed, I think, as I don't see how to map the GeoJSON geometry to schema.org.

uvoges commented 4 years ago

@dr-shorthair You can also have a look into OGC 17-003 http://docs.opengeospatial.org/is/17-003r1/17-003r1.html#74 - Annex B2) where we created a JSON-LD context for the definition of EO Dataset Metadata.

rob-metalinkage commented 4 years ago

I'm currently looking at context documents in IoT environments based on a LD API - currently a huge "bucket of bolts" context doc is being created. I will be publishing a bunch of modular context documents for the OGC definitions server, and am happy to take on board additional contexts as required, or as emerge within specification profiles. Point is I urge a modular approach - contexts can nest.

So GeoJSON, DCAT and schema.org contexts could be combined as needed, and co-exist.

It is not possible to map to different data structures, though: contexts only map elements to identifiers and types, so schema.org and GeoJSON geometries are just different things, but they may co-exist quite happily if you choose.
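A toy illustration of the modular approach: context documents applied in sequence, as a JSON-LD @context array is, with later term definitions overriding earlier ones. The context fragments are abbreviated excerpts written for this example (the GeoJSON-LD namespace is assumed), not the published documents.

```python
# Abbreviated, illustrative fragments of three modular context documents.
GEOJSON_CTX = {"geojson": "https://purl.org/geojson/vocab#",  # namespace assumed
               "Feature": "geojson:Feature", "geometry": "geojson:geometry"}
DCAT_CTX = {"dcat": "http://www.w3.org/ns/dcat#",
            "dcterms": "http://purl.org/dc/terms/",
            "title": "dcterms:title", "keywords": "dcat:keyword"}
SDO_CTX = {"schema": "https://schema.org/", "name": "schema:name"}


def merge_contexts(*contexts: dict) -> dict:
    """Combine contexts the way a JSON-LD @context array is applied: in order,
    with later term definitions overriding earlier ones.

    Simplified: ignores protected terms, null resets, scoped contexts, etc.
    Contexts only map keys to IRIs; they cannot restructure the data.
    """
    active: dict = {}
    for ctx in contexts:
        active.update(ctx)
    return active


active = merge_contexts(GEOJSON_CTX, DCAT_CTX, SDO_CTX)
```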

Coming up with alternative profiles (e.g. a Cat Record with a schema.org profile and a GeoJSON profile as alternative options) might be appropriate.

NB there is a new specification that supports content negotiation by profile (https://www.w3.org/TR/dx-prof-conneg/) that could be used to discover and ask for different profiles from the same API endpoint. OGC API should not reinvent that wheel. (NB all the profiles needed to support OGC resource publishing will be published as resources themselves and will use this to link profile identifiers to the relevant JSON-LD context documents, or JSON-schema, or profile descriptions etc)
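For a sense of how that could be layered onto an existing endpoint, here is a rough server-side sketch that picks a profile from an Accept-Profile request header as defined in that specification. The profile URIs are hypothetical and the header parsing is simplified.

```python
import re
from typing import Optional

# Profiles this hypothetical server can return (placeholder URIs).
AVAILABLE = [
    "https://example.org/profiles/record-core",
    "https://example.org/profiles/geodcat",
]
DEFAULT = AVAILABLE[0]

# One header entry: <uri> optionally followed by ;q=weight
_ENTRY = re.compile(r"<([^>]+)>\s*(?:;\s*q\s*=\s*([0-9.]+))?")


def choose_profile(accept_profile: Optional[str]) -> str:
    """Pick the highest-q available profile named in an Accept-Profile header.

    Falls back to DEFAULT when the header is absent or names nothing offered.
    """
    if not accept_profile:
        return DEFAULT
    ranked = []
    for uri, q in _ENTRY.findall(accept_profile):
        weight = float(q) if q else 1.0
        if uri in AVAILABLE and weight > 0:
            ranked.append((weight, uri))
    if not ranked:
        return DEFAULT
    ranked.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep header order
    return ranked[0][1]
```

The server would then report its choice in a Content-Profile response header.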

uvoges commented 4 years ago

Is this schema: https://github.com/opengeospatial/ogcapi-records/blob/master/core/openapi/schemas/record.yaml thought to be applicable for collection metadata AND record metadata OR just record metadata ?

The schema should not define accessURL and downloadURL but instead use a link relation with the appropriate relation type...
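A rough sketch of that suggestion: moving accessURL/downloadURL out of the properties and into typed links. The relation types used here ('enclosure' for downloads and 'related' for access points, both in the IANA registry) are assumptions for illustration, not choices made by the SWG.

```python
# Assumed relation types for illustration only.
URL_PROPERTY_RELS = {"downloadURL": "enclosure", "accessURL": "related"}


def urls_to_links(record: dict) -> dict:
    """Move accessURL/downloadURL out of properties and into typed links.

    Returns a new record; the input record is not modified.
    """
    props = dict(record.get("properties", {}))
    links = list(record.get("links", []))
    for prop, rel in URL_PROPERTY_RELS.items():
        href = props.pop(prop, None)
        if href is not None:
            links.append({"rel": rel, "href": href})
    return {**record, "properties": props, "links": links}
```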

uvoges commented 4 years ago

...I guess I can answer myself: schemas need to be different, e.g. a collection may have queryables while a record may not have queryables

nmtoken commented 4 years ago

Possibly not the correct issue, but at the moment we are publishing ISO 19139 XML metadata through CSW for datasets that are not spatial (nonGeographicDataset ~ http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/gmxCodelists.xml#MD_ScopeCode).

We would like to use OGC API - Records to provide a JSON encoding (inter alia) but are wondering whether this will be possible using GeoJSON (as has been suggested)?
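One possible approach, sketched in Python: GeoJSON (RFC 7946) allows an unlocated Feature whose "geometry" member is null, so a non-spatial dataset can still be encoded as a GeoJSON record. The "scope" property name is an illustrative assumption.

```python
import json


def nonspatial_record(record_id: str, title: str, scope: str) -> dict:
    """A record for a resource with no spatial footprint.

    RFC 7946 permits an unlocated Feature: 'geometry' is present but null.
    The 'scope' property name is an illustrative assumption.
    """
    return {
        "id": record_id,
        "type": "Feature",
        "geometry": None,  # serialised as JSON null
        "properties": {"title": title, "scope": scope},
    }


doc = json.dumps(nonspatial_record("tbl-1", "Species observations", "nonGeographicDataset"))
```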

rob-metalinkage commented 4 years ago

As @cportele has pointed out, JSON-LD contexts provide a mechanism to map elements of a JSON payload to URI definitions, and if those definitions are backed by a model, such as DCAT, then this is a mapping from a record to the model.

The implication is that such mappings to well-known models should be published in a reusable form. This is easily done by

@context: [ list of URLS that resolve to json context documents that perform the mappings ]

The SWG could publish a set of alignments to well-known models, such as DCAT and ISO 19139 (given an RDF encoding), and can even maintain a register of alignments that can be added to. Making these resolve is something the OGC-NA can undertake, provided URI naming policies are observed. Mappings are less easy for non-JSON payloads, but could be XSLT for XML, for example.

Servers can describe and offer the available alignments in various different ways - https://www.w3.org/TR/dx-prof-conneg/ is an option that can be layered with almost any HTTP based API.

mhogeweg commented 4 years ago

@nmtoken - what we have done is expose our Geoportal Server catalog both with CSW (2.0.2/3.0.0) and Project Open Data DCAT.

We translate the source metadata (19139/FGDC/Gemini/INSPIRE/...) into the DCAT metadata schema and JSON encoding.

pvretano commented 3 years ago

SWG MEETING 08-FEB-2021: Closing. The content model of the records is not specified in the specification. See: https://docs.opengeospatial.org/DRAFTS/20-004.html#query-response

opengeospatial / ogcapi-records

What is the content-model for a 'standard catalog record'? #25