w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Drop the range of dcat:keyword #1585

Open kvistgaard opened 11 months ago

kvistgaard commented 11 months ago

Since the range of dcat:keyword is rdfs:Literal, application profile designers turn to alternatives such as dcterms:subject, which reduces interoperability with catalogues that use dcat:keyword.

A common SHACL shape in the EU is:

:Dataset-subject
  a sh:PropertyShape ;
  sh:path dcterms:subject ;
  sh:description "The value of this property is a keyword or tag describing the Data asset. It only allows values from the EuroVoc vocabulary http://eurovoc.europa.eu/ "@en ;
  sh:name "subject"@en ;
  sh:node [
      a sh:NodeShape ;
      sh:property [
          sh:path skos:inScheme ;
          sh:hasValue <http://eurovoc.europa.eu/100141> ;
        ] ;
    ] ;
  sh:nodeKind sh:IRI .

It would be nicer to use the dedicated dcat:keyword.

jakubklimek commented 11 months ago

Are you suggesting a mix of literals and resources as values of dcat:keyword, like this?

<dataset> dcat:keyword "Keyword literal"@en , <http://eurovoc.europa.eu/100141> .

If so, I do not think this will improve interoperability.

  1. Every implementation would have to change to expect both literals and resources, and for resources the names (labels) would have to be looked up somewhere else.
  2. For your use case, there is dcat:theme, which can be used with controlled vocabularies. The difference from dcat:keyword is exactly that - keywords for free text (no controlled vocabularies) and themes for controlled vocabularies.

I think the current state is fine and we should not change that.
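
As a rough sketch of point 2, the shape from the opening comment could target dcat:theme instead of dcterms:subject. The shape name ex:Dataset-theme and the ex: namespace are assumptions made for illustration; the EuroVoc scheme IRI is the one used in the original shape.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/shapes#> .   # hypothetical namespace

# Hypothetical property shape: same constraint as the dcterms:subject
# shape, but applied to dcat:theme, whose range is skos:Concept.
ex:Dataset-theme
  a sh:PropertyShape ;
  sh:path dcat:theme ;
  sh:name "theme"@en ;
  sh:nodeKind sh:IRI ;                 # values must be IRIs, not literals
  sh:node [
      a sh:NodeShape ;
      sh:property [
          sh:path skos:inScheme ;      # each value must belong to EuroVoc
          sh:hasValue <http://eurovoc.europa.eu/100141> ;
        ] ;
    ] .
```

This keeps dcat:keyword free for plain-text tags while still restricting the controlled values to a single scheme.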

kvistgaard commented 11 months ago

No, I only suggest dropping the range (in fact I would suggest dropping almost all ranges and leaving that to application profiles). For dcat:theme, there is a dedicated NAL, http://publications.europa.eu/resource/authority/data-theme, and usually one value. For keywords, there are always multiple values from EuroVoc, and that is what I apply and keep suggesting.

jakubklimek commented 11 months ago

Well, dropping the range effectively means supporting the case above, which in my opinion lowers interoperability. For dcat:theme, the NAL is dedicated in DCAT-AP, not in DCAT. And, there are ongoing discussions about profiling dcat:theme in DCAT-AP: https://github.com/SEMICeu/DCAT-AP/issues/316 https://github.com/SEMICeu/DCAT-AP/issues/314

dr-shorthair commented 11 months ago

The distinction between

  1. dcat:keyword - range rdfs:Literal (datatype property)
  2. dcat:theme - range skos:Concept (object property)

has been in place since DCAT v1. If you need the value to be a term from a controlled vocabulary, denoted by a URI, use dcat:theme. If you want a text term, use dcat:keyword.
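
A minimal sketch of that distinction in Turtle, assuming a made-up dataset and concept in a hypothetical ex: namespace (none of these IRIs come from a real catalogue):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:dataset-1 a dcat:Dataset ;
  # Free-text terms: plain literals, no controlled vocabulary.
  dcat:keyword "air quality"@en , "emissions"@en ;
  # Controlled term: a skos:Concept identified by an IRI.
  dcat:theme ex:concept-environment .

# Hypothetical concept; the label lives on the concept, not on the dataset.
ex:concept-environment a skos:Concept ;
  skos:prefLabel "Environment"@en ;
  skos:inScheme ex:themes .
```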

Bad habits developed in projects can't be fixed by modifying DCAT for everyone.

kvistgaard commented 11 months ago

@dr-shorthair I'm aware that the distinction dates from v1. The intention of raising this issue was to improve DCAT, not to make it suitable for a particular case. And speaking of bad habits, over-axiomatizing ontologies is definitely a bad habit in RDFS and OWL modelling in general, not one reserved for DCAT. But there is hope. A handy recent example is the range of dcterms:type, which was dropped after being in place for much longer than that of dcat:keyword. So, if anything, I might be raising this issue too early, not too late.

bertvannuffelen commented 9 months ago

I support the reaction from @jakubklimek. In this case the usage situation is clear and clean, and not restrictive.

In short:

In the last case, dcat:theme is a special subproperty: namely the theme with which the Dataset is associated in the Catalogue. In this special case there is hopefully no discussion about whether the value could be a Literal. Note also that for one profile, the theme of another profile can be considered just another categorisation.

So instead of calling this a bad practice: in this case the range Literal versus Concept corresponds to a business need. The two properties nicely address two distinct levels of harmonisation in associating terms with datasets to make them easier to find in a catalogue, by free-text search or by faceted browsing.

By mixing, as illustrated by Jakub, DCAT would state that implementations must accept and be able to process both at the same time. That will create more implementation friction than gain. Lifting the distinction between data property and object property must be done with care. In this case it will not create added value, only more confusion.

Maybe what you stumble over is that the subproperty of dct:subject is not named 'keyword' when, in an implementation, you use it just as a keyword; that is a different discussion.