w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Definition expressed machine readable #1255

Open bertvannuffelen opened 4 years ago

bertvannuffelen commented 4 years ago

The definition of Dataset

A collection of data, published or curated by a single agent, and available for access or download in one or more representations.

expresses a cardinality constraint between Dataset and Agent. Is that cardinality constraint expressed anywhere in machine-readable form (e.g. as OWL or SHACL)?

Or is it left to the implementers to determine on which property the constraint is imposed?

dr-shorthair commented 4 years ago

We did not use much OWL axiomatization. That was a deliberate choice, since there are many usage patterns.

SHACL was only in its infancy when we started on DCATv2, so it was not one of the planned deliverables. However, for the same reason that we avoided OWL axiomatization, I think we would be reluctant to add much in the way of cardinality constraints.

It is highly likely that specific community implementations would want to tighten this up, but these would be 'community profiles' of DCAT.
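
As an illustration of what such a community profile might check, here is a minimal sketch in plain Python. It is not part of DCAT: the choice of `dcterms:publisher` as the property that carries the "single agent" constraint is an assumption, and `publisher_violations` and the `example.org` data are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
# Assumption: the profile attaches the "single agent" constraint to dcterms:publisher.
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"


def publisher_violations(triples, max_count=1):
    """Return {dataset: sorted publishers} for datasets with more than max_count publishers."""
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == DCAT_DATASET}
    publishers = defaultdict(set)
    for s, p, o in triples:
        if p == DCT_PUBLISHER and s in datasets:
            publishers[s].add(o)
    return {d: sorted(a) for d, a in publishers.items() if len(a) > max_count}


# Hypothetical example data: ds/1 has one publisher, ds/2 has two.
EXAMPLE_TRIPLES = [
    ("http://example.org/ds/1", RDF_TYPE, DCAT_DATASET),
    ("http://example.org/ds/1", DCT_PUBLISHER, "http://example.org/agent/a"),
    ("http://example.org/ds/2", RDF_TYPE, DCAT_DATASET),
    ("http://example.org/ds/2", DCT_PUBLISHER, "http://example.org/agent/a"),
    ("http://example.org/ds/2", DCT_PUBLISHER, "http://example.org/agent/b"),
]

violations = publisher_violations(EXAMPLE_TRIPLES)
```

A profile that tolerates joint publication could call the same function with a larger `max_count`; the base vocabulary itself stays unconstrained.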

On the specific case: if you have an example where multiple agents are responsible for a dataset, then I would suggest encapsulating them in a (virtual) compound agent.
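
One way to picture the compound-agent suggestion is the following sketch. It rests on assumptions the comment does not make: the compound agent is modelled as a `foaf:Group` with `foaf:member` links, the agents are attached through `dcterms:publisher`, and `wrap_publishers` and all `example.org` IRIs are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"
# Assumption: a FOAF group stands in for the "(virtual) compound agent".
FOAF_GROUP = "http://xmlns.com/foaf/0.1/Group"
FOAF_MEMBER = "http://xmlns.com/foaf/0.1/member"


def wrap_publishers(triples, dataset, group_iri):
    """Replace several publishers of one dataset with a single group agent."""
    agents = sorted({o for s, p, o in triples if s == dataset and p == DCT_PUBLISHER})
    if len(agents) < 2:
        return list(triples)  # nothing to encapsulate
    # Drop the individual publisher links, then point the dataset at the group.
    kept = [t for t in triples if not (t[0] == dataset and t[1] == DCT_PUBLISHER)]
    kept.append((dataset, DCT_PUBLISHER, group_iri))
    kept.append((group_iri, RDF_TYPE, FOAF_GROUP))
    kept.extend((group_iri, FOAF_MEMBER, agent) for agent in agents)
    return kept


# Hypothetical example: one dataset published jointly by two agents.
DS = "http://example.org/ds/2"
GROUP = "http://example.org/agent/a-and-b"
JOINT = [
    (DS, DCT_PUBLISHER, "http://example.org/agent/a"),
    (DS, DCT_PUBLISHER, "http://example.org/agent/b"),
]

wrapped = wrap_publishers(JOINT, DS, GROUP)
```

After the rewrite the dataset points at exactly one agent, and the original agents remain reachable as members of that group.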

bertvannuffelen commented 4 years ago

I understand.

Now I am getting philosophical: it is interesting that the human-readable definition makes a very strict statement, yet that statement cannot be translated into machine-readable form without the feeling that it would constrain the usage of the term more than the human-readable definition intends.

It seems that we have run into a knowledge representation challenge here: we agree on the intention but cannot agree on its formal representation. So how can I prove conformance in this case?

As I wrote, this reply is purely for the joy of the discussion. The original question has been answered.

kcoyle commented 4 years ago

The word "generally" inserted in that sentence can resolve the question.

In the Dublin Core community we have learned to prefer that vocabulary definitions, especially of vocabularies that are likely to be re-used in various contexts, adhere to the "minimal semantic commitment" principle, so that application profiles (APs) that use those vocabulary terms can apply their own constraints without creating incompatibilities. If you plan for a vocabulary with few constraints that serves as the base for APs, I think you get the most out of your vocabulary.
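
To make the principle concrete, here is a small sketch in plain Python of one description checked against an unconstrained base vocabulary and two application profiles. Everything in it is hypothetical: the profile names, the `(min, max)` encoding, the `conforms` helper, and the choice of `dcterms:publisher` as the constrained property.

```python
from collections import Counter

DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"

# Hypothetical profiles: property IRI -> (min_count, max_count or None for unbounded).
BASE_VOCABULARY = {}                      # minimal commitment: no cardinality rules
STRICT_PROFILE = {DCT_PUBLISHER: (1, 1)}  # exactly one publisher
LOOSE_PROFILE = {DCT_PUBLISHER: (0, None)}  # any number of publishers


def conforms(description, profile):
    """Check a list of (property, value) pairs for one dataset against a profile."""
    counts = Counter(prop for prop, _ in description)
    for prop, (lo, hi) in profile.items():
        n = counts[prop]
        if n < lo or (hi is not None and n > hi):
            return False
    return True


# Hypothetical dataset description with two publishers.
DATASET = [
    (DCT_PUBLISHER, "http://example.org/agent/a"),
    (DCT_PUBLISHER, "http://example.org/agent/b"),
]

results = {
    "base": conforms(DATASET, BASE_VOCABULARY),
    "strict": conforms(DATASET, STRICT_PROFILE),
    "loose": conforms(DATASET, LOOSE_PROFILE),
}
```

The description is acceptable to the base vocabulary and the loose profile but not to the strict one, and data that satisfies the strict profile still satisfies the other two. That is the sense in which profiles can add constraints without becoming incompatible with the base.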

aidig commented 4 years ago

Great discussion! It would be fantastic if there were greater alignment between the human-readable definitions and the formal machine-readable expressions. It would force knowledge engineers to consider the potential for reuse of a new element and not to add restrictions unless they are deemed necessary for its interpretation - but also the opposite, that is, to make sure that the defining characteristics are indeed present in the definition, so that the number of instances is narrowed down to exactly the set that was intended.

In the case of the DCAT definition, it seems the minimal semantic commitment would more or less be the abstract "a collection of data", possibly adding "curated or published" in the scope of DCAT, or even the definition proposed in this issue. Indeed, inserting a "generally" or a "typically" in the sentence would give users an idea of common use cases and further profiling. However, I would argue strongly against making such notes part of the definition, as they are supplementary and should not be part of the formal machine-readable definition. Rather, such notes should be added as (usage) notes, comments, or examples, keeping the human-readable definitions concise, precise and consistent, and ready for formal machine-readable expression.

A definition is a single phrase that can replace the term wherever used. It does not start with an article (e.g. “a”, “the”) or end with a full-stop. It does not take the form of, or contain, a requirement or recommendation. Additional information can be included in a Note to entry or an Example.

Ref: ISO on definitions: https://www.iso.org/files/live/sites/isoorg/files/archive/pdf/en/how-to-write-standards.pdf