w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

distribution > dcat:accessURL #1334

Open bertvannuffelen opened 3 years ago

bertvannuffelen commented 3 years ago

The DCAT 2 specification (https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_access_url) states:

"dcat:accessURL matches the property-chain dcat:accessService/dcat:endpointURL. In the RDF representation of DCAT this is axiomatized as an OWL property-chain axiom."
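To make the entailment concrete, here is a minimal Python sketch of what such a property-chain axiom licenses. Whenever a distribution has a dcat:accessService whose dcat:endpointURL is some URL, the distribution also gets that URL as a dcat:accessURL. The sketch assumes that triples are plain (subject, predicate, object) string tuples with CURIE predicates and hypothetical ex: identifiers. It is a single naive forward-chaining pass, not an OWL reasoner.

```python
# Triples as (subject, predicate, object) tuples; predicates are CURIE strings.
Triple = tuple[str, str, str]

ACCESS_SERVICE = "dcat:accessService"
ENDPOINT_URL = "dcat:endpointURL"
ACCESS_URL = "dcat:accessURL"


def apply_access_url_chain(triples: set[Triple]) -> set[Triple]:
    """Return the input triples plus those entailed by the chain
    dcat:accessService / dcat:endpointURL -> dcat:accessURL."""
    # Index each service's endpoint URLs.
    endpoints: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == ENDPOINT_URL:
            endpoints.setdefault(s, set()).add(o)
    # Every subject linked to a service inherits that service's endpoints.
    entailed = {
        (dist, ACCESS_URL, url)
        for dist, p, service in triples
        if p == ACCESS_SERVICE
        for url in endpoints.get(service, ())
    }
    return triples | entailed


if __name__ == "__main__":
    # Hypothetical example data, not taken from the DCAT specification.
    graph = {
        ("ex:dist1", ACCESS_SERVICE, "ex:service1"),
        ("ex:service1", ENDPOINT_URL, "https://api.example.org/"),
    }
    for triple in sorted(apply_access_url_chain(graph) - graph):
        print(triple)
```

One pass is enough here because the chain has length two and the inferred dcat:accessURL triples never feed back into the rule.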

Why is it stated this way, and why is it axiomatized? This creates a very strict one-to-one relationship between data service and distribution, which I do not think is often the case in practice.

In many services, the data service endpoint URL serves many datasets, and thus many distributions. For example, consider https://docs.github.com/en/rest/reference: everything is on the same endpoint (https://api.github.com), but each entity is a different dataset. See https://docs.github.com/en/rest/reference/projects, which offers a collection of projects per organisation as well as the collection of all projects. So for the same endpoint I have many access URLs.

Conversely, starting from the distribution, the accessURL for a distribution of GitHub projects is https://docs.github.com/en/rest/reference/projects, not https://api.github.com/projects.
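The GitHub example can be sketched as a tiny triple graph. The ex: identifiers are made up for illustration, and only the two URLs come from the discussion above. Two distributions share one service whose endpoint is https://api.github.com, and one of them asserts the documentation page as its dcat:accessURL. Applying the chain gives both distributions https://api.github.com as an inferred dcat:accessURL. It also gives the documented distribution a second dcat:accessURL value that was never intended.

```python
from collections import defaultdict

# Hypothetical identifiers; only the URLs come from the issue text.
SERVICE = "ex:githubRestApi"
graph = {
    (SERVICE, "dcat:endpointURL", "https://api.github.com"),
    ("ex:projectsPerOrganisation", "dcat:accessService", SERVICE),
    ("ex:allProjects", "dcat:accessService", SERVICE),
    # Asserted access URL pointing at the documentation page.
    ("ex:allProjects", "dcat:accessURL",
     "https://docs.github.com/en/rest/reference/projects"),
}


def access_urls_after_chain(triples):
    """Map each subject to its dcat:accessURL values: the asserted ones plus
    those entailed by dcat:accessService / dcat:endpointURL."""
    endpoints = defaultdict(set)
    for s, p, o in triples:
        if p == "dcat:endpointURL":
            endpoints[s].add(o)
    result = defaultdict(set)
    for s, p, o in triples:
        if p == "dcat:accessURL":
            result[s].add(o)
        elif p == "dcat:accessService":
            # The chain copies the service endpoint onto the distribution.
            result[s] |= endpoints[o]
    return dict(result)


urls = access_urls_after_chain(graph)
for dist in sorted(urls):
    print(dist, sorted(urls[dist]))
```

Every additional distribution attached to the same service would receive the same inferred value, which is the many-distributions-per-endpoint situation described above.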

The above axiomatisation may hold in some specific contexts, but to me it is more a severe restriction than a gain. This axiomatisation also creates a binding between distribution and data service, which should be clarified at the level of Distribution.

andrea-perego commented 3 years ago

Just for the record, the decision to specify the property chain stems from https://github.com/w3c/dxwg/issues/124

dr-shorthair commented 3 years ago

@bertvannuffelen you may be correct that this property-chain axiom is too restrictive. While it will be true in some cases, it is unlikely to match all deployments in practice. The statement and the axiomatization were partly pedagogical: they were meant to explain the relationships between the various URLs and elements of the DCAT backbone.

However, I wonder if you have noticed that there is also dcat:downloadURL, which might be a better match for the scenarios you are describing? dcat:accessURL is different: it is the URL of an API from which the Distribution may be accessed (possibly along with many others).

bertvannuffelen commented 3 years ago

Before continuing my explanation: for me, Distributions and Data Services are different things. So if you combine them, you are talking about entities that are both Distributions and Data Services.

For many reasons, I would try to keep them separate, making their intersection as small as possible in the usage guidance. I understand that some communities would maximize the intersection, but that is not my perspective.

For our discussion, it should be clear whether the semantic descriptions apply to entities that are both Distributions and Data Services, or to entities that are Distributions but NOT Data Services.


Where a distribution is distinct from any data service, dcat:accessURL is, for me, the generic way of getting access to the actual data. It might be a web page stating that one has to phone a person, who will then send a tape.

dcat:downloadURL is the URL from which one can download the data with a single click in a browser. It is a restriction of dcat:accessURL. So an application should first consider dcat:downloadURL, and only then dcat:accessURL.
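The fallback order described above can be sketched as a small client-side helper. The Distribution record and the resolve_access function are hypothetical names used only for illustration. DCAT itself prescribes no such algorithm, and the sketch simply encodes the preference stated here. It uses dcat:downloadURL when present, and otherwise falls back to the more generic dcat:accessURL, which may be a web page rather than the data itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Distribution:
    """Hypothetical in-memory view of a dcat:Distribution's URL properties."""
    download_urls: list[str] = field(default_factory=list)  # dcat:downloadURL
    access_urls: list[str] = field(default_factory=list)    # dcat:accessURL


def resolve_access(dist: Distribution) -> Optional[tuple[str, str]]:
    """Return (kind, url): a direct download if one exists, otherwise the
    generic access URL, or None if the distribution has neither."""
    if dist.download_urls:
        # One-click download of the data itself.
        return ("download", dist.download_urls[0])
    if dist.access_urls:
        # Generic entry point; it may only describe how to obtain the data.
        return ("access", dist.access_urls[0])
    return None


if __name__ == "__main__":
    tape = Distribution(access_urls=["https://example.org/how-to-order-a-tape"])
    dump = Distribution(
        download_urls=["https://example.org/data.csv"],
        access_urls=["https://example.org/data"],
    )
    print(resolve_access(tape))
    print(resolve_access(dump))
```

Returning the kind alongside the URL lets a caller tell a direct download apart from a page that may need human follow-up, such as the tape example.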

This last property is actually why, for me, a distribution is not a REST API. For a REST API I would have no intuitive semantics to give to dcat:downloadURL. It is neither part of REST terminology nor part of API terminology. When API design deals with downloads, they are implemented in a different way, using different technology and different technical specifications. Very often they are even the responsibility of a different team.

So, personally, I think the axiomatisation creates more confusion and questions than it illustrates semantics.

Let's consider a snapshot that is also exposed through a REST API. I would model the snapshot dump as a distribution (with dcat:accessURL or dcat:downloadURL) and have dcat:accessService point to the REST API. The endpoint of the REST API is then different from the dcat:accessURL; the values are not bound in any way.
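The snapshot scenario might be described as follows, with hypothetical ex: identifiers and example.org URLs. The sketch shows that the dump's dcat:downloadURL and the API's dcat:endpointURL are two independent values reached by different paths. Nothing in this modeling requires them to coincide, whereas the property-chain axiom would additionally make the API endpoint a dcat:accessURL of the dump.

```python
# Hypothetical description of a snapshot dump that is also exposed via a REST API.
graph = {
    ("ex:snapshotDump", "rdf:type", "dcat:Distribution"),
    ("ex:snapshotDump", "dcat:downloadURL", "https://example.org/dumps/snapshot.zip"),
    ("ex:snapshotDump", "dcat:accessService", "ex:restApi"),
    ("ex:restApi", "rdf:type", "dcat:DataService"),
    ("ex:restApi", "dcat:endpointURL", "https://api.example.org/"),
}


def objects(subject: str, predicate: str) -> set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def access_paths(distribution: str) -> dict[str, set[str]]:
    """Collect the distribution's own URLs and the endpoints of its services."""
    services = objects(distribution, "dcat:accessService")
    return {
        "download": objects(distribution, "dcat:downloadURL"),
        "access": objects(distribution, "dcat:accessURL"),
        "service endpoint": {
            url for svc in services for url in objects(svc, "dcat:endpointURL")
        },
    }


paths = access_paths("ex:snapshotDump")
for kind, urls in paths.items():
    print(f"{kind}: {sorted(urls)}")
# The dump and the API are reached through unrelated URLs.
assert paths["download"].isdisjoint(paths["service endpoint"])
```

Keeping the service endpoint as a separate path, rather than folding it into the distribution's own URLs, mirrors the separation between Distribution and Data Service argued for above.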


Where an entity is described as both a distribution and a data service, I do not know what the semantics are. Should an implementation give priority to the value of dcat:endpointURL or to dcat:downloadURL? Should they be the same? If they differ, what is the intention? I cannot explain that, and for that reason I would rather stick to the context above.