w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Should DCAT 3 provide its own way to indicate the query to pose to the data service endpoints? #1230

Closed: riccardoAlbertoni closed this issue 3 years ago

riccardoAlbertoni commented 4 years ago

As a side effect of the discussion on issue #1197 which arose from a Europeana use case,
I wonder if we want to provide DCAT 3 with its own way to indicate the query to pose to the data service endpoints.

That could improve interoperability among solutions that cannot, or do not want to, use distribution->dcat:accessURL to embed the query in a URL.

smrgeoinfo commented 4 years ago

I'm not entirely sure what you're proposing, but allowing multiple approaches to doing the same thing in an interchange format always complicates interoperability. As I understand DCAT, accessURL is a link to download a dataset as a file-based package. If a service is available to subset/filter/aggregate a view of the dataset, then dcat:Distribution/dcat:accessService would be appropriate.

aisaac commented 4 years ago

@smrgeoinfo, what we were asking is whether one could represent a situation where a dataset's "distribution" doesn't exist "independently" of the distribution (well, the access service) of a larger dataset. The way to get it is to pose a specific query to that wider access service. In a way, the distribution of the dataset is the combination of the wider access service and the query to be fired at it, and this combination cannot be represented at the right granularity by the current dcat:Distribution/dcat:accessService pattern. See #1197 for more details.

smrgeoinfo commented 4 years ago

I think the distribution of interest is via a WebAPI, and the metadata needs to provide a machine-actionable representation of the resolvable URI template, that includes (at least via links) an explanation of the parameters in the template.

Currently, see:

- https://github.com/schemaorg/schemaorg/issues/2340
- https://github.com/schemaorg/schemaorg/issues/2342
- https://github.com/earthcubearchitecture-project418/p419dcatservices
- https://www.w3.org/TR/vocab-dcat-2/#Class:Data_Service (where is the URI template??)
- Machine Readable Web APIs with Schema.org

From https://github.com/schemaorg/schemaorg/issues/1423#issuecomment-583157131, some other efforts along these lines: EarthCube Resource Registry, API resource type.

Discussion of machine-actionable links: somewhat dated, but it covers much of the content information that needs to be accounted for.

Example schema.org JSON-LD: CHORDS, IRIS data; template for service description, using potentialAction.

There's a lot of interest in this in the science data community!

aisaac commented 4 years ago

I'm not sure I understand. Is the idea to represent the query through the "machine-actionable representation of the resolvable URI template"? The option that we've discussed in #1197 does include schema:SearchAction and schema:query, which may be related to what @smrgeoinfo mentions. But I'm not sure, as this is a lot of links to read.

smrgeoinfo commented 4 years ago

Looking at the https://www.w3.org/TR/vocab-dcat-2/#dcat-scope diagram: a dcat:Distribution can have a dcat:accessService link to a dcat:DataService. I think dcat:DataService is analogous to the schema.org (sdo) WebAPI. Describing the API requires a variety of information. The proposal in Schema.org is to use sdo:potentialAction/sdo:Action (which has various subclasses for different actions; it is important to look at the Actions Overview). These would need to be implemented in dcat:DataService.

Here is some example code (it follows the suggestions in schemaorg/schemaorg#2342; the base namespace is sdo):

```json
"potentialAction": [
  {
    "@type": "SearchAction",
    "name": "Query",
    "description": "query service to obtain records of seismic events",
    "result": {
      "@type": "DataDownload",
      "encodingFormat": ["application/xml+QuakeML", "text/csv", "QuakeML", "text/csv+geocsv", "GeoCSV-SeismicEvent"],
      "description": "XML, csv, or csv format for seismic events following EarthCube geoWs conventions."
    },
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "http://service.iris.edu/fdsnws/event/1/query?{geographic-constraints}&{depth-constraints}&{temporal-constraints}&{magnitude-constraints}&{organization-constraints}&{misc-parameters}&{format-option}&{nodata=404}",
      "description": "URL with multiple query parameters: geographic location, event depth, time period of event, event magnitude, source network, miscellaneous parameters, format for returned data, and what flag to use for no data. TBD: how to handle the POST request version; need to specify the format for the POST content",
      "httpMethod": "GET",
      "uriTemplate-input": [
        {
          "@id": "urn:iris:fsdn.starttime",
          "@type": "PropertyValueSpecification",
          "valueName": "start",
          "defaultValue": "Any",
          "description": "allowed: Any valid time. Limit to events on or after the specified start time; use UTC for time zone",
          "valueRequired": true,
          "valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?",
          "xsd:type": "dateTime"
        },
        {
          "@id": "urn:iris:fsdn.endtime",
          "@type": "PropertyValueSpecification",
          "valueName": "end",
          "defaultValue": "Any",
          "description": "allowed: Any valid time. Limit to events on or before the specified end time",
          "valueRequired": true,
          "valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?"
        },
        ...
      ]
    },
    "object": {
      "@type": "DataFeed",
      "description": "list of properties that are included in seismic event description in response documents",
      "variableMeasured": [
        {
          "@type": "PropertyValue",
          "name": "name of the variable",
          "description": "example of documentation for a variable provided in the result object",
          "propertyID": "URI for the property in some ontology",
          "measurementTechnique": "URI for the measurement protocol, or text description of procedure and sensor"
        },
        ...
      ]
    }
  }
]
```

Not all parameters or response variables are shown; omissions are marked with "...". Parameters in the template are enclosed in braces ('{}'); see IETF RFC 6570.

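
As a rough illustration of how a client might expand a brace-style template like the urlTemplate above, here is a minimal Python sketch covering only two RFC 6570 expression types: simple string expansion ({var}) and form-style query expansion ({?var1,var2}). It is a sketch under stated assumptions, not a complete RFC 6570 implementation, and the template and the parameter names starttime, endtime and format in the usage line are hypothetical, not taken from the IRIS service description.

```python
import re
from typing import Mapping
from urllib.parse import quote

# RFC 3986 unreserved characters besides ALPHA/DIGIT (which quote() never
# encodes); every other character is percent-encoded in {var} and {?var}.
_UNRESERVED = "-._~"

# Matches {a,b} and {?a,b}; the other RFC 6570 operators are NOT handled.
_EXPRESSION = re.compile(r"\{(\??)([A-Za-z0-9_.,]+)\}")


def expand(template: str, values: Mapping[str, str]) -> str:
    """Expand a small subset of RFC 6570 URI templates.

    Supports simple string expansion ({var}) and form-style query
    expansion ({?a,b}). Variables missing from `values` are skipped.
    """

    def replace(match: re.Match) -> str:
        operator, names = match.group(1), match.group(2).split(",")
        defined = [(name, values[name]) for name in names if name in values]
        if operator == "?":
            if not defined:
                return ""  # no defined variables: the expression vanishes
            pairs = (f"{name}={quote(value, safe=_UNRESERVED)}" for name, value in defined)
            return "?" + "&".join(pairs)
        # Simple expansion: comma-separated values, without names.
        return ",".join(quote(value, safe=_UNRESERVED) for _, value in defined)

    return _EXPRESSION.sub(replace, template)


# Hypothetical template and values, for illustration only.
template = "http://example.org/events/query{?starttime,endtime,format}"
print(expand(template, {"starttime": "2020-01-01T00:00:00", "endtime": "2020-01-02T00:00:00"}))
```

Undefined variables (format here) simply drop out of the expanded URL, which is how RFC 6570 treats them; a consumer therefore needs the parameter descriptions (such as the PropertyValueSpecification entries) to know which values are required.
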
andrea-perego commented 4 years ago

Just to record that a similar requirement has been brought up in two recent use cases - see https://github.com/w3c/dxwg/issues/1240 and https://github.com/w3c/dxwg/issues/1241

dr-shorthair commented 4 years ago

Going back to the original premise of this issue, I would answer NO. That level of detail is beyond the scope of DCAT. It might be provided in an extension, developed by a community with a specific interest and expertise in documenting RDBMS APIs, but not in 'core' DCAT.

My opinion, of course.

rob-metalinkage commented 4 years ago

There is no way you can predict all the forms of Web-based APIs that will emerge and die over the lifetime of DCAT. The appropriate response is to develop a profile of DCAT to achieve interoperability for a given API and service metadata form; if it becomes very popular, people will just use the profile to the exclusion of others.

In one project I am creating a draft DCAT-QB profile, and in previous work I created a URL-templating approach based on QB to describe the parameters in queries and templates. If some variant of OpenAPI emerges with a canonical means to describe data dimensions and queries, I'd imagine it will become quite popular... for a while.

riccardoAlbertoni commented 4 years ago

I agree with @rob-metalinkage and @dr-shorthair if we are discussing a thorough description of the capabilities of the data service. However, I think the use case that generated this issue (see #1197) was not asking for that level of detail. My impression was that they wanted a quick solution to indicate a predetermined query (no parameters) to pose to the data services.

Before closing this discussion, I wonder if we want to provide something for the more specific case raised by Nuno. And yes, I am playing devil's advocate a little ;)

Consider, for example, queries to a SPARQL endpoint: the query can be embedded in the URL given as the distribution's dcat:downloadURL (or a similar property). However, I suspect there might be value in making the query explicit instead of hiding it under dcat:downloadURL. We already have a way to represent the underlying endpoint (i.e., dcat:accessURL, which matches the property chain dcat:accessService/dcat:endpointURL).
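
To illustrate what embedding the query in the URL means in practice, here is a minimal Python sketch, assuming the SPARQL 1.1 Protocol convention of passing the query text in a "query" parameter of a GET request. It builds such a URL and then parses the query back out, which is the work a consumer has to do when the query is only available inside dcat:downloadURL. The endpoint, the query and the helper names query_url and extract_query are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def query_url(endpoint: str, query: str) -> str:
    """Embed a SPARQL query in a GET URL via the protocol's 'query' parameter."""
    separator = "&" if "?" in endpoint else "?"
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return endpoint + separator + urlencode({"query": query}, quote_via=quote)


def extract_query(url: str) -> Optional[str]:
    """Recover the query text from such a URL, or None if there is none."""
    values = parse_qs(urlsplit(url).query).get("query")
    return values[0] if values else None


# Hypothetical endpoint and query selecting one sub-collection of a catalogue.
endpoint = "http://example.org/sparql"
query = (
    "SELECT ?item WHERE { ?item <http://purl.org/dc/terms/isPartOf> "
    "<http://example.org/collection/europeana> }"
)
download_url = query_url(endpoint, query)
print(download_url)
print(extract_query(download_url) == query)
```

The round trip works, but the query text ends up percent-encoded and opaque to anyone reading the metadata, which is the trade-off behind the question of an explicit property.
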

Would it make sense to mint a new property to express the query explicitly? (I have checked https://lov.linkeddata.es/dataset/lov/terms?q=query for inspiration; there are many terms related to queries, but I haven't found one that suits this case, except perhaps schema:query with the text of the query attached.)

I share @smrgeoinfo's interoperability worries about providing more than one way to express the same thing, but I am not sure we would be saying the same thing twice here.

Could making the query explicit bring value in some maintenance scenarios, or when the same endpoint serves several datasets? Or would it move part of the effort from the providers to the consumers without any considerable advantage?

rob-metalinkage commented 4 years ago

@riccardoAlbertoni, I think the problem is the assumption that a dataset has exactly one meaningful query. Either it's a very unusual case, and perhaps not important for the wider community, or queries are very common, in which case a single query is unlikely to work very well: there would be too much interoperability loss from overloading the intention for related cases.

I could imagine a generic templated query object: a qualified association whereby you can define the model profile the query uses with dct:conformsTo, with examples for things in the Web architecture 'canon' such as URL templating and OpenSearch. I have seen void:sparqlEndpoint but never found an interoperability-based use for it except in conjunction with some query model (outside "I want a Linked Data star" or experimental "play with SPARQL" motivations). It's not that useful to have such a scalar property in practice AFAICT, but a canonical qualified relation is probably a good thing.
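
One way to picture such a qualified query object is the following minimal sketch, which builds a JSON-LD description as plain Python dictionaries. It is an illustration under assumptions, not part of DCAT: the ex: namespace, the ex:QueryTemplate class and the ex:text and ex:qualifiedQuery properties are hypothetical names invented for this sketch; only dct:conformsTo and dcat:accessService are existing terms. Using the SPARQL 1.1 Query Language URL as the dct:conformsTo target is likewise an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

# Prefixes for the JSON-LD context. The "ex:" namespace is hypothetical.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org/ns#",
}


@dataclass(frozen=True)
class QueryTemplate:
    """A query (or query template) qualified by the model it conforms to."""

    text: str         # the query or template text itself
    conforms_to: str  # IRI identifying the query language / templating spec

    def to_jsonld(self) -> Dict[str, Any]:
        return {
            "@type": "ex:QueryTemplate",  # hypothetical class
            "ex:text": self.text,         # hypothetical property
            "dct:conformsTo": {"@id": self.conforms_to},
        }


def distribution_with_query(dist_iri: str, service_iri: str, query: QueryTemplate) -> Dict[str, Any]:
    """Describe a distribution obtained by posing `query` to a data service."""
    return {
        "@context": CONTEXT,
        "@id": dist_iri,
        "@type": "dcat:Distribution",
        "dcat:accessService": {"@id": service_iri},
        "ex:qualifiedQuery": query.to_jsonld(),  # hypothetical property
    }


sparql = QueryTemplate(
    text="SELECT ?item WHERE { ?item ?p ?o }",
    conforms_to="https://www.w3.org/TR/sparql11-query/",
)
doc = distribution_with_query("http://example.org/dist/1", "http://example.org/service/sparql", sparql)
print(json.dumps(doc, indent=2))
```

The design choice is that a consumer looks at dct:conformsTo to decide how to interpret the query text, so a URL template or an OpenSearch description could reuse the same shape with a different conformsTo value.
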

nfreire commented 4 years ago

@rob-metalinkage - The use case that we have on our hands at Europeana, is about cultural heritage institutions that want to share a subcollection of their catalogue with Europeana. Their whole catalogue is available in a SPARQL endpoint, but only part of it forms the collection intended for Europeana. Cultural heritage institutions participate in several networks based on metadata aggregation, so in most cases several subcollections of the catalogs are shared, according to the context of the network.

In case it helps to understand the use case, below is the solution we adopted after the discussion in issue #1197 :

[Diagram: Dataset SPARQL]

dr-shorthair commented 4 years ago

  1. Could you add a link dcat:accessService from the Distribution to the Service?
  2. The Service could also be classified prov:Agent (software-agent).

rob-metalinkage commented 4 years ago

@nfreire this looks like a good solution - i.e. to use other vocabularies to do the job you need.

For interoperability, you should then declare a profile of DCAT for this solution, so others can follow suit (and, importantly, declare that they intend to be interoperable with your solution).

As it happens, I am writing a DCAT-PROV profile to support another project and was going to bring it to this group for review. It will do half the work for you in a reusable, interoperable way. (Note that such profiles are out of scope for DXWG, but we do have an in-scope vocabulary that allows us to express them.)

@ncar and I are currently establishing a register of profiles under the new W3C approach, and I am very happy to work with you to declare your solution as a reusable sub-profile of DCAT-PROV. (The long-term maintenance of this register will need a community group, and it would be great to have your involvement in that!)

nfreire commented 4 years ago

@dr-shorthair :

> could you add a link dcat:accessService from the Distribution to the Service?

Thanks, I had forgotten to add it to the diagram. It is there now.

> the Service could also be classified prov:Agent (software-agent)

We are using prov:Entity because prov:Agent is not in the range of prov:used. Its range is only prov:Entity, and prov:Agent is not a subclass of prov:Entity.

dr-shorthair commented 4 years ago

prov:Entity and prov:Agent are not disjoint. There is no logical conflict in classifying something as both. Yes, if the predicate is prov:used then the object is a prov:Entity but there is no reason it is not also a prov:Agent.

aisaac commented 4 years ago

@dr-shorthair re. dcat:accessService: we're adding it to the pattern, but the people we work with may not implement it, and we will not require it (i.e., we will only note that it's recommended), because it's only a SHOULD and it's not needed for our requirements. Is that OK?

About extra PROV types: in fact, I think none of that PROV typing is required. It's completely superfluous, as the types can be inferred from the PROV domains and ranges. These domains and ranges don't actually require the types to be present... So I would be in favor of removing everything :-)
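
To make concrete the point that the PROV types can be inferred from domains and ranges, here is a minimal pure-Python sketch of the two RDFS entailment rules involved (rdfs2 for rdfs:domain, rdfs3 for rdfs:range). It hard-codes the PROV-O domain and range of prov:used and prov:wasGeneratedBy and treats prefixed names as plain strings; the ex: resources are hypothetical, and a real application would use an RDF library or reasoner instead.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# rdfs:domain and rdfs:range of two PROV-O properties.
DOMAIN: Dict[str, str] = {"prov:used": "prov:Activity", "prov:wasGeneratedBy": "prov:Entity"}
RANGE: Dict[str, str] = {"prov:used": "prov:Entity", "prov:wasGeneratedBy": "prov:Activity"}


def infer_types(graph: Set[Triple]) -> Set[Triple]:
    """Return rdf:type triples entailed by rules rdfs2 (domain) and rdfs3 (range).

    One pass suffices here: rdf:type has no entry in DOMAIN or RANGE, so
    the inferred triples cannot trigger further inferences.
    """
    inferred: Set[Triple] = set()
    for subject, predicate, obj in graph:
        if predicate in DOMAIN:
            inferred.add((subject, "rdf:type", DOMAIN[predicate]))
        if predicate in RANGE:
            inferred.add((obj, "rdf:type", RANGE[predicate]))
    return inferred - graph


# Hypothetical provenance graph with no explicit PROV typing at all.
graph = {
    ("ex:subcollection", "prov:wasGeneratedBy", "ex:extraction"),
    ("ex:extraction", "prov:used", "ex:sparqlService"),
}
for triple in sorted(infer_types(graph)):
    print(triple)
```

Nothing here prevents also typing ex:sparqlService as prov:Agent, since PROV-O does not declare prov:Entity and prov:Agent disjoint; the inference only adds prov:Entity alongside any explicit type.
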

dr-shorthair commented 4 years ago

Understood and agreed. But I thought it was worth mentioning.

nfreire commented 4 years ago

@rob-metalinkage

> @NCAR and I are currently establishing a register of profiles under the new W3C approach, and I am very happy to work with you to declare your solution as a reusable sub-profile of DCAT-PROV. (The long-term maintenance of this register will need a community group, and it would be great to have your involvement in that!)

Thanks, that sounds interesting. We cannot work on it before the last quarter of 2020, but let's maintain contact until then.

aisaac commented 4 years ago

@dr-shorthair thanks for the understanding. And yes I think it was relevant to mention it!

andrea-perego commented 3 years ago

Going through the thread, the issue seems to have been addressed.

I propose to close it.

andrea-perego commented 3 years ago

No objection raised. Closing this issue.