w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Inconsistency between usage note of dcat:themeTaxonomy and range of dcat:theme? #1153

Closed aisaac closed 3 years ago

aisaac commented 4 years ago

The usage note of dcat:themeTaxonomy says "It is recommended that the taxonomy is organized in a skos:ConceptScheme, skos:Collection, owl:Ontology or similar, which allows each member to be denoted by an IRI and published as Linked Data.". But the range of dcat:theme is skos:Concept, so it seems that the expectation for the members of the objects of dcat:themeTaxonomy statements is a bit more precise than what the usage note says.

Future work?

dr-shorthair commented 4 years ago

Concepts are not always organized into ConceptSchemes, Collections or Ontologies. Sometimes they are free-floating (e.g. specified at run time).

nicholascar commented 4 years ago

@aisaac is correct that the range gives an expectation of use, even if in general Semantic Web work you don't have to use skos:Concepts within a skos:ConceptScheme or skos:Collection.

akuckartz commented 4 years ago

What does "It is recommended" mean? Is it normative or only a suggestion?

dr-shorthair commented 4 years ago

'recommended' is still normative language, but it is a lower bar than 'required'.

https://tools.ietf.org/html/rfc2119 https://www.iso.org/sites/directives/current/part2/index.xhtml#_idTextAnchor030

kcoyle commented 4 years ago

The range actually assigns all values of dcat:theme the class of skos:Concept.

3.1 rdfs:range

rdfs:range is an instance of rdf:Property that is used to state that the values of a property are instances of one or more classes.

"... are instances ..." This is regardless of whether the value of dcat:theme has elsewhere been defined in a taxonomy using SKOS.

It would be interesting to understand the functional purpose; perhaps this aids in queries by class rather than by property? Presumably this was discussed long ago, so no one may remember at this point.
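To make the quoted rdfs:range semantics concrete, here is a minimal sketch of the range entailment rule (rdfs3 in RDF Semantics). It assumes triples are plain Python tuples of strings rather than objects from an RDF library, and `ex:dataset1` is a hypothetical dataset used only for illustration.

```python
# Minimal sketch of RDFS range inference (entailment rule rdfs3).
# Triples are plain (subject, predicate, object) tuples of strings;
# no RDF library is assumed.

RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def apply_range_rule(triples):
    """Return the input triples plus every rdf:type triple implied by rdfs:range."""
    # Map each property to the classes declared as its range.
    ranges = {}
    for s, p, o in triples:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)

    inferred = set(triples)
    for s, p, o in triples:
        # Any object of a property with a declared range is typed with that range.
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred


graph = {
    ("dcat:theme", RDFS_RANGE, "skos:Concept"),
    # ex:dataset1 is a hypothetical dataset
    ("ex:dataset1", "dcat:theme", "schema:Painting"),
}

closure = apply_range_rule(graph)
print(("schema:Painting", RDF_TYPE, "skos:Concept") in closure)  # True
```

Nothing in the input declares schema:Painting as a concept; the type appears only because it was used as the object of dcat:theme.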

aisaac commented 4 years ago

Adding to what was said, and clarifying my words...

It is perfectly fine to use a (regular) ontology like Schema.org, or even an (instance-level) database like Wikidata for dcat:themeTaxonomy. And with the current axiomatization, formally it's fine to use the members of these, e.g., an OWL class from the ontology (schema:Painting), or a city from DBpedia (dbpedia:london), as the object of dcat:theme.

Yet there is indeed an expectation that what's used for dcat:theme would qualify as skos:Concept. Some users may thus be reluctant to use schema:Painting or dbpedia:london on that basis.

And even if they do, the formal range kicks in: schema:Painting and dbpedia:london could be formally classified as skos:Concept by a reasoner that applies the semantics of DCAT, if they're used as objects of dcat:theme. This could raise an issue for some systems (especially those that assume, say, the class dbpedia:City to be disjoint with skos:Concept).
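A rough sketch of how such a clash could surface, assuming triples are plain Python tuples and that a consuming system has added its own disjointness axiom. The axiom and `ex:dataset1` are hypothetical; neither DCAT nor SKOS declares dbpedia:City disjoint with skos:Concept.

```python
# Sketch: a resource typed via the range of dcat:theme collides with a
# locally declared disjointness. No RDF library or reasoner is assumed.

RDF_TYPE = "rdf:type"


def types_after_range(triples, prop, range_class):
    """Collect declared types plus the type implied by the range of `prop`."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p == prop:
            # Range semantics: the object becomes an instance of range_class.
            types.setdefault(o, set()).add(range_class)
    return types


def disjointness_clashes(types, disjoint_pairs):
    """Return resources that carry both classes of some disjoint pair."""
    return sorted(
        res
        for res, classes in types.items()
        if any(a in classes and b in classes for a, b in disjoint_pairs)
    )


data = [
    ("dbpedia:london", RDF_TYPE, "dbpedia:City"),
    ("ex:dataset1", "dcat:theme", "dbpedia:london"),  # hypothetical dataset
]
# Hypothetical local axiom of the consuming system.
local_disjointness = [("dbpedia:City", "skos:Concept")]

types = types_after_range(data, "dcat:theme", "skos:Concept")
clashes = disjointness_clashes(types, local_disjointness)
print(clashes)  # ['dbpedia:london']
```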

Let me be clear: I am actually completely fine with that sort of flexible classification of resources as SKOS Concepts. But that sort of thing has not been called 'crossing the streams' by the SKOS group without a reason. Some people (well, especially data modelers) can be quite averse to doing it. So maybe the wording of the notes could be adapted to better reflect the situation.

kcoyle commented 4 years ago

The other possibility is that assigning the range of skos:Concept has no functional necessity and can be left undeclared, which would also allow for literals as themes, true? Because as defined it requires a URI, and some validation schemas would see a literal as an error.

aisaac commented 4 years ago

@kcoyle yes removing the range of dcat:theme (and keeping it merely as an 'expected range' in a usage note) could be an option. I'm keen on keeping dcat:theme used with URIs though. It can be very valuable to force a bit of structure here. And it's good to keep this as a difference with dcat:keyword. Otherwise implementers may wonder which one to use, leading to a diversity of usages that would probably be harmful for interoperability.

dr-shorthair commented 4 years ago

"Because as defined it requires a URI"

That was intentional. dcat:keyword is for literals; dcat:theme is to link to concepts that are more formally defined elsewhere, and in particular are available for re-use. This is unchanged from DCAT 2014.

makxdekkers commented 4 years ago

@aisaac

"@kcoyle yes removing the range of dcat:theme (and keeping it merely as an 'expected range' in a usage note) could be an option."

From my memory of the GLD WG that developed DCAT version 2014, the idea behind dcat:theme was that its super-property dct:subject was too broad because it allowed any rdfs:Resource to be a subject, including people, places and, indeed, paintings. It was felt that it was necessary to restrict the objects to just instances of skos:Concept. If we now want to remove the restriction, and allow dcat:theme to have anything as its object, it becomes identical to dct:subject and then dcat:theme doesn't make much sense as it becomes a duplicate of the same thing.

As the idea was to restrict the range of dcat:theme explicitly to skos:Concept, it also made sense to restrict the range of dcat:themeTaxonomy to skos:ConceptScheme. The relaxation of the range of dcat:themeTaxonomy to rdfs:Resource, in my mind, is not helpful -- although it doesn't hurt either.

dr-shorthair commented 4 years ago

Here is the earlier discussion: #119 #123

makxdekkers commented 4 years ago

"As the idea was to restrict the range of dcat:theme explicitly to skos:Concept, it also made sense to restrict the range of dcat:themeTaxonomy to skos:ConceptScheme. The relaxation of the range of dcat:themeTaxonomy to rdfs:Resource, in my mind, is not helpful -- although it doesn't hurt either."

I note that there is no explicit dependency between dcat:themeTaxonomy and dcat:theme. It is not a requirement that all objects used in dcat:theme must be from the KOS identified by dcat:themeTaxonomy -- for example, a catalog might not even specify dcat:themeTaxonomy -- nor that terms in dcat:themeTaxonomy must be used in dcat:theme for any dataset in the catalogue -- they could also be used in dct:subject for example.

kcoyle commented 4 years ago

Thanks for pointing out the dcat:keyword property to be used with keywords.

"It was felt that it was necessary to restrict the objects to just instances of skos:Concept."

Note, however, that declaring the range skos:Concept does not restrict the objects to topics previously defined as instances of skos:Concept; instead, it infers the class skos:Concept for any URI in the object position. (I have no idea what is inferred if the value is a string.) So if someone uses https://en.wikipedia.org/wiki/Eiffel_Tower as the object of dcat:theme, https://en.wikipedia.org/wiki/Eiffel_Tower is inferred to be a skos:Concept. The only way to restrict objects to skos:Concept is by using a validation schema like SHACL or ShEx. That said, having the desired range be explicit in the property definition as a range is probably a better clue for metadata creators than a note on the property, but be aware that reasoners (if any are used, which often they are not) will read this differently.

This is the main reason that SHACL and ShEx were developed: RDF does no restricting, only inferring, and what metadata creators need most is some control over the content of the data, not an open world of inferences.
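To contrast with inference, here is a minimal sketch of a validation-style check in plain Python. It only mimics the kind of constraint a SHACL or ShEx shape could express (value must be an IRI that is explicitly typed skos:Concept) and is not an implementation of either language. The tuple representation, the quoting convention for literals, and the `ex:` names are assumptions made for this example.

```python
# Sketch: closed-world check of dcat:theme values, as opposed to range inference.
# Triples are plain tuples of strings; no SHACL or ShEx engine is used.

RDF_TYPE = "rdf:type"


def is_literal(term):
    # Convention for this sketch only: literals are written with surrounding double quotes.
    return term.startswith('"') and term.endswith('"')


def validate_themes(triples):
    """Report dcat:theme values that are literals or not explicitly typed skos:Concept."""
    concepts = {s for s, p, o in triples if p == RDF_TYPE and o == "skos:Concept"}
    problems = []
    for s, p, o in triples:
        if p != "dcat:theme":
            continue
        if is_literal(o):
            problems.append((s, o, "value is a literal, expected an IRI"))
        elif o not in concepts:
            problems.append((s, o, "value is not declared as a skos:Concept"))
    return problems


data = [
    ("ex:energy", RDF_TYPE, "skos:Concept"),  # hypothetical concept
    ("ex:dataset1", "dcat:theme", "ex:energy"),
    ("ex:dataset1", "dcat:theme", "https://en.wikipedia.org/wiki/Eiffel_Tower"),
    ("ex:dataset1", "dcat:theme", '"towers"'),
]

problems = validate_themes(data)
for subject, value, reason in problems:
    print(subject, value, reason)
```

Here the Eiffel Tower URL and the string are reported as problems, whereas a reasoner applying the range would classify the URL as a skos:Concept instead of rejecting it.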

riccardoAlbertoni commented 4 years ago

Labelled "future-work" as per the discussion in the 2019-11-05 telecon and the related resolution.