w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Use of dct:type with both Class and Concept #314

Closed lvdbrink closed 3 years ago

lvdbrink commented 5 years ago

In DCAT revised edition, classifying-dataset-types, dct:type is used with a range of skos:Concept, while the range of dct:type is formally rdfs:Class as defined in Dublin Core.

In the examples, both rdfs:Classes and skos:Concepts are used as objects. While this may not be wrong per se, a consequence is that OWL-Full comes into play: every instance of a skos:Concept becomes an rdfs:Class as well.

I'm not sure if this is intended by the DCAT editors?

I see two solutions:

Thanks to @architolk for pointing this out.

rob-metalinkage commented 5 years ago

Really interested in the take of others here - there are a couple of ways of looking at this - but I have never been able to find a cogent argument why we shouldn't assume that an rdfs:Class is actually a type of skos:Concept - classes are nothing more than concepts that define sets of instances.

It seems to be explicitly supported in OWL 2 as "punning" [https://www.w3.org/2007/OWL/wiki/Punning]

It seems a perfectly natural use case to me to model skos:Concepts as types in systems, then generate RDFS and OWL class models only if and when we need to model additional behaviours.

skos:Concept is relevant for "soft-typing" and rdfs:Class for hard-typing - and the equivalence is actually a useful thing.

Is OWL-Full really a problem? I think not, for several reasons:

1. I don't see evidence that OWL-DL (or any other flavour) inferencing is happening at run-time across distributed systems - all "semantic web" implementations I have seen cache any models they want to use.
2. There is currently no way of telling a client, nor a specification, constraining referenced models to be any particular profile of OWL - so no assumptions can be made anyway.
3. With negotiation-by-profile the same concept can be served with a graph that conforms to SKOS, RDFS, OWL-DL, OWL-Full, SHACL and any other specific profile needed by a client.

IMHO there is a need to provide a specific example of the problem, why it's a problem, and how to handle the use cases of soft-typing.

My feeling here, although I can't prove it, is that negotiation by profile and OWL 2 punning are two sides of the same coin - implementation and theory - and essentially we can get out of the bind via the URI dereferencing architecture: OWL-DL reasoners can ask for the metaclass representation they want.

Default behaviour for OWL - i.e. if a client asks for OWL, perhaps an OWL-DL representation SHOULD be returned. I don't know if the profile guidance or content negotiation scope will allow us to go into this platform-specific detail - or where and who in W3C cares about the general architecture of distributed OWL models?

dr-shorthair commented 5 years ago

'Hard' typing just means using rdf:type for classification. 'Soft' typing means using anything else (e.g. dct:type).

The range of rdf:type is rdfs:Class and standard RDFS entailments mean that an individual is also a member of the super-classes of the asserted classifier.

As @lvdbrink points out, the range of dct:type is also rdfs:Class, but no other entailments follow.

The use of either rdf:type or dct:type entails that the value is an rdfs:Class regardless of whether it was originally declared as such - so if it was defined as a skos:Concept it also becomes an rdfs:Class.
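The entailment described here is just the standard RDFS range rule (rdfs3). A minimal sketch in plain Python over a toy triple set may help; the names ex:Genre and ex:Dataset1 are invented for illustration:

```python
# Toy illustration of the RDFS range rule (rdfs3):
# if (p rdfs:range C) and (s p o) hold, then (o rdf:type C) is entailed.

triples = {
    ("dct:type", "rdfs:range", "rdfs:Class"),   # axiom from Dublin Core
    ("ex:Genre", "rdf:type", "skos:Concept"),   # publisher's concept
    ("ex:Dataset1", "dct:type", "ex:Genre"),    # classify a dataset
}

def rdfs_range_entailment(graph):
    """Apply rdfs3 once: every object of a property with a declared
    rdfs:range becomes an instance of that range class."""
    inferred = set(graph)
    ranges = {(p, c) for p, q, c in graph if q == "rdfs:range"}
    for s, p, o in graph:
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, "rdf:type", cls))
    return inferred

# ex:Genre is now an rdfs:Class even though it was only asserted to be
# a skos:Concept -- exactly the punning under discussion.
print(("ex:Genre", "rdf:type", "rdfs:Class") in rdfs_range_entailment(triples))  # True
```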

The use of any predicate other than rdf:type for classification has no RDFS significance, but nevertheless might be given significance in a particular application.

This doesn't say anything y'all don't know already, but maybe puts it in perspective.

dr-shorthair commented 5 years ago

In an email that was not reflected into this issue, @kcoyle pointed out that the SKOS editors explicitly declined to introduce a constraint that a skos:Concept may not also be a Class:

"3.5.1. SKOS Concepts, OWL Classes and OWL Properties

Other than the assertion that skos:Concept is an instance of owl:Class, this specification does not make any additional statement about the formal relationship between the class of SKOS concepts and the class of OWL classes. The decision not to make any such statement has been made to allow applications the freedom to explore different design patterns for working with SKOS in combination with OWL."

https://www.w3.org/TR/2009/REC-skos-reference-20090818/#concepts

So I'm inclined to accept their (the SKOS editors) invitation and go with the flow - i.e. no change required to DCAT or Dublin Core because of dct:type entailments.

rob-metalinkage commented 5 years ago

+1

Is it a DCAT profile guidance issue, however, to note that use of skos:Concept is fine for dct:type, but that by doing so you are explicitly accepting OWL punning, and if you need to keep class and instance models separate you probably need content negotiation by profile?

jakubklimek commented 5 years ago

@lvdbrink I would advise against considering usage of dc:type instead of dct:type, as the whole DC Elements namespace was deprecated in favour of DC Terms for use in Linked Data.

IMHO there is a need to provide a specific example of the problem and why its a problem, and how to handle the use cases of soft-typing.

@rob-metalinkage A specific example of a problem would be an inference-enabled Linked Data visualizer (or repository such as RDF4J). A typical discovery query asks for all rdfs:Class and owl:Class instances in a SPARQL endpoint to see what data is there. With inferencing enabled, all instances of skos:Concept used in a DCAT-rev data catalog to categorize datasets using dct:type could be unintentionally returned as instances of rdfs:Class with no actual instances (those would use rdf:type), causing all kinds of confusion.
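The discovery-query scenario can be sketched in plain Python as a stand-in for the SPARQL pattern; ex:Budget2020 and ex:StatisticalData are invented names:

```python
# Sketch of the "discovery query" problem: list everything typed
# rdfs:Class, with and without the rdfs:range inference applied.
data = {
    ("dct:type", "rdfs:range", "rdfs:Class"),
    ("ex:StatisticalData", "rdf:type", "skos:Concept"),  # a dataset category
    ("ex:Budget2020", "rdf:type", "dcat:Dataset"),
    ("ex:Budget2020", "dct:type", "ex:StatisticalData"),
    ("dcat:Dataset", "rdf:type", "rdfs:Class"),
}

def classes(graph, inference=False):
    """Return all nodes explicitly typed rdfs:Class; optionally apply
    the rdfs:range rule once first."""
    g = set(graph)
    if inference:
        ranges = {(p, c) for p, q, c in graph if q == "rdfs:range"}
        for s, p, o in graph:
            g |= {(o, "rdf:type", c) for pr, c in ranges if p == pr}
    return sorted(s for s, p, o in g if p == "rdf:type" and o == "rdfs:Class")

print(classes(data))                  # ['dcat:Dataset']
print(classes(data, inference=True))  # ['dcat:Dataset', 'ex:StatisticalData']
```

With inference on, the category concept shows up as a "class" with no instances, which is the confusion described above.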

I would say that in this case, using dct:type is not worth it just for the sake of reusing an existing property, due to these unintentional side effects. I would suggest either

  1. Explicitly say that the dataset categories are rdfs:Classes, and they should be used as rdf:types
  2. Say that the dataset categories are not implicitly rdfs:Classes, and use another property for their attachment, with no "concealed" side effects such as OWL punning

The fact that, as of now, the group does not see an immediate problem does not mean that this will be OK in the near future, where, e.g., automated inferencing could become more widespread. IMHO it is better to be safe, and unless the inference is intentional (which it is not in this case), I would steer clear of it.

kcoyle commented 5 years ago

I don't know if this helps or hurts in this particular situation, but note that the DCMI community is on the verge of revising DC Terms to move from the standard RDF "range" definitions to a schema.org-like use of "expected values". This means that the stated "range" would no longer be suitable for inferencing (IMO) but instead serves the role of conformance. I believe the upshot of this is that all properties with "expected values" would be annotation properties.

This has not yet been entirely agreed, but is currently under discussion.

rob-metalinkage commented 5 years ago

@jakubklimek - I think you have summed the underlying issue up with "A typical discovery query is asking for all rdfs:Class and owl:Class instances in a SPARQL endpoint to see what data is there" - we have seen such patterns across a range of platforms, where generally there needs to be an implicit contract that the client has some sense of the scope and size of a dataset before issuing a query.

I would suggest:

1. you could make a tighter query if you only wanted classes with instances - losing expressivity without a strong driver and a documented "contract" between producer and consumer is a poor trade-off
2. you should document what's behind an endpoint - "discovery queries" are unsafe (I'm used to spatial and observation data, where unbounded discovery queries could return petabytes of data just as easily)

(I go into a bit of detail here, because this is generally relevant to the drivers for DCAT - the ability to describe what's in a dataset and accessible via its endpoint. My opinion is that if we wanted to consider such an architectural constraint we would need a formal documented use case that we can discuss and accept as within scope. I'd be fascinated to see a compelling use case for client discovery and access of content starting with, as per your example, an explicit contract that a class model is available at all. If we can see a workable approach it would probably inform us as to the bare minimum MUST-have metadata that a dataset description needs to support such an architecture.)

architolk commented 5 years ago

Apart from inference considerations, the immediate issue we ran into (and actually the way we found out that something fishy was going on) was the way that TopBraid Composer uses the rdfs:range assertions to determine the list of available values while editing a new resource of a particular (rdf:type) class.

For example, let's say that we would like to create a new asset (e.g. of rdf:type adms:Asset), and that we would like to add a triple with this new asset as subject and predicate dct:type. TopBraid will then give us a list of possible values for the object, containing resources of (only) type rdfs:Class. There is no option to select a resource of type skos:Concept, which is what we would like to do in this case (according to the ADMS profile).

The only option we have, is to manually change the dct ontology, removing the rdfs:range assertion, or changing it accordingly (and of course we could make our skos:Concepts of type rdfs:Class, but that is not something we want to do - we would like our concept schemes separate from our ontologies).

Although using something like SHACL is more appropriate for deriving user interfaces from RDF models IMHO, the current state of affairs is that editors (like TopBraid) will look for rdfs:domain and rdfs:range assertions, as we found out the hard way.

Marco

pmaria commented 5 years ago

As @architolk states, in our (Dutch government related) case, we apply a modeling approach in which we maintain a clear distinction between instances of skos:Concept and instances of rdfs:Class. For us a skos:Concept represents a unit of thought, and a rdfs:Class a set of particular things which may or may not contain manifestations of a unit of thought. We do this for a variety of reasons, one of which is to keep the door open for OWL-DL reasoning, should we, or our consumers, wish to apply it.

IMHO a fundamentally important standard/recommendation like DCAT shouldn't "force" a more complicated reasoning pattern onto its users. At least, not without a very good reason.

The question for use cases is understandable, however it's quite hard to imagine upfront what data consumers will want to do with the data. We really do not know. That's why we strive for an open modeling approach that shuts as little doors as possible.

Therefore, I strongly agree with the suggestions that @jakubklimek makes above. And I have a strong preference for his second suggestion.

dr-shorthair commented 5 years ago

There is clearly a tension here between

SKOS has hedged its bets until now. DC started soft, then veered into stronger semantics with DC-TERMS, but @kcoyle is now reporting a plan to revert to a softer position aligned with schema.org.

Since it used both SKOS and DC-TERMS, I had assumed that DCAT fell into the soft semantics camp. An implication of that is that anyone who wants stronger reasoning must be selective about what is loaded, and maybe also cull the graph of things like dct:type rdfs:range rdfs:Class . prior to inferencing.

@jakubklimek, @architolk and @pmaria appear to now be advocating that we take DCAT into a stricter direction. Given its heritage and installed base this could be a significant change so we need to be clear about potential side-effects and be careful about these.

(FWIW @architolk I don't think we can be strongly influenced by the behaviour of a single IDE like TopBraid. I'm a TopBraid user myself, but am aware that it makes a bunch of assumptions not all of which are useful and which are not necessarily aligned with the broad community understandings.)

rob-metalinkage commented 5 years ago

There is nothing that forces anyone to resolve a dct:type object reference and do any inferencing over it.

If you choose to load both the dcterms RDFS model and resolve the dct:type reference and find the RDFS model for the referred object, then you would naturally have to accept the intention of the data publisher (not the DCAT specification) that any skos:Concepts are indeed "units of thought" that represent (and entail) rdfs:Class.

Distributed reasoning means there must be a sophisticated contract that actually makes URI references to objects, intrinsically as instances of things, link to a class model.

TopBraid, for example, has a bunch of built-in assumptions that some graphs are loaded - a mixture of explicit OWL imports (perhaps also TBC-controlled imports smuggled into comments in TTL files?) and reflection based on "magic" patterns in file names in projects, whose Eclipse-UI-controlled open-or-closed state determines if they are loaded. It's kind of horrible, but seems a fair reflection of the reality that the application context is responsible for determining the graph to reason over, and any entailment regimes.

So, I don't see a reason why the project discussed cannot make up its mind that all references must have an rdfs:Class axiomatisation, and that resources resolved from the URIs they use are constrained to be OWL-DL. To flip the argument on its head, it seems unwise to force such assumptions on everyone. DCAT users should be free to use whatever reasoning and entailments they choose.

That said, examples that implicitly rely on (perfectly legal) OWL punning interpretations should carry an explanation that they assume an environment where punning between skos:Concept and rdfs:Class is allowed, and that DCAT itself does not dictate this either way.

architolk commented 5 years ago

I totally agree with the statement "DCAT users should be free to use whatever reasoning and entailments they choose". I do have some concerns with regard to using terms from other vocabularies and the question of committing to the same ontological commitment as the original vocabulary (e.g.: committing to the assertions that are part of the vocabulary definition, like the rdfs:range assertion). It seems to me that you should, or else use something more appropriate - or create your own terms.

It's very interesting that the Dublin Core community is leaning to a more soft approach. This would indeed resolve the issue, and make the use of dct:type less problematic.

As a DCAT user ourselves, two solutions seem reasonable:

With respect to the use in Topbraid, I agree that this is not a very strong argument, it's simply one of the examples users might (and in our case - will) run into.

Marco


jakubklimek commented 5 years ago

@dr-shorthair Yes, I am always for a stricter version. It is actually the use of dct:type with skos:Concepts which has the (completely unintentional) side effects.

@kcoyle Thanks for the heads up. Nevertheless, since the loosening of dcterms rdfs:ranges is still in the discussion phase, it can go either way. I personally would be against any loosening, which only allows more mess to be created and makes any reasonable application on top of such data complicated (too many options).

Regarding the TopBraid behaviour, this only emphasizes what can happen when these side effects are ignored. My exploration query use case is another one.

I strongly believe that inference is not something we can ignore because "it can be turned off if it causes problems". It needs to be taken into account as a natural consequence of using RDF and RDFS. It is quite simple really. We need to start from a use case. The use case is that we want to classify datasets with skos:Concepts. That is fine, but the property dcterms:type is not a good fit for this, because it has a range of rdfs:Class, which would make all used concepts classes, which is something unintended. Therefore, we need another property without such side effects.

@rob-metalinkage I think that placing exploratory queries on unknown endpoints is perfectly legal, and it is in fact the only way of determining what is stored inside - leveraging vocabulary reuse and inferencing (more on that topic in our Survey of Tools for Linked Data Consumption, btw). I admit I am a bit lost in your extensive argumentation above, but the situation often is that you have a URL of some foreign SPARQL endpoint and you want to see (automatically) at least something about what is inside - the contract therefore is "here is a SPARQL endpoint, do your best with SPARQL".

makxdekkers commented 5 years ago

@jakubklimek Is there a use case that says we want to classify datasets with skos:Concept? Such a use case is not in https://w3c.github.io/dxwg/ucr/.
In the current draft of the new specification at https://w3c.github.io/dxwg/dcat/#Property:resource_type, there is no mention of skos:Concept as range of dct:type. It states correctly that the range of dct:type is rdfs:Class. So far, so good. Maybe the problem is that there is mention of MARC intellectual resource types, which are indeed explicitly defined as skos:Concept. Could the solution be to remove that example? Of the other four examples, the terms of the DCMI Type vocabulary are defined as rdfs:Class. I don't know whether the other three (ISO19115, DataCite, re3data) have been published as RDF at all. The links point to text or XML enumerations.

jakubklimek commented 5 years ago

@makxdekkers I was referring to the initial issue by @lvdbrink; I did not investigate further. If there is no such requirement, maybe it indeed can be resolved by removing the MARC examples, but then another issue arises, and that is how to classify datasets using MARC intellectual resource types - and we are back at specifying a new property for that, unless we say it is out of scope.

kcoyle commented 5 years ago

@jakubklimek We have heard the argument that you make about loosening, but the fact is that the inclusion of precise ranges does not constrain the use of the terms. Our preference is that constraint take place in application profiles rather than in the definition of the terms, since usage patterns show that there are often different range needs. But I must say that both arguments make sense, and some years in the future, after linked data has matured, we will find out which one we should have chosen.

kcoyle commented 5 years ago

@jakubklimek @makxdekkers I see no reason why the MARC types cannot be dct:type(s). skos:Concept is an instance of owl:Class, so there would be no conflict. No?

jakubklimek commented 5 years ago

skos:Concept is an instance of owl:Class, so there would be no conflict.

@kcoyle actually that is exactly the issue. skos:Concept is an instance of owl:Class - it is a class of all concepts. Then, individual concepts are instances, not subclasses of skos:Concept. Specifically, they themselves are not owl:Classes, unless specified explicitly somewhere else, which is by design.

However, using the individual concepts with dcterms:type entails the concept used is an rdfs:Class - it can have instances. And that is the issue - mere usage of a concept with DCAT this way would unintentionally define that it can have instances, which might not be desirable, as described by at least two use cases here.
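The instance-of versus subclass-of distinction being drawn here can be sketched in plain Python; ex:Novel is an invented concept name:

```python
# Instance-of vs subclass-of: being an instance of skos:Concept
# (itself an owl:Class) does not make the concept a class.
graph = {
    ("skos:Concept", "rdf:type", "owl:Class"),  # SKOS itself asserts this
    ("ex:Novel", "rdf:type", "skos:Concept"),   # an individual concept
}

def is_instance_of(g, node, cls):
    return (node, "rdf:type", cls) in g

def is_class(g, node):
    # a node counts as a class here only if explicitly typed as one
    return (is_instance_of(g, node, "rdfs:Class")
            or is_instance_of(g, node, "owl:Class"))

assert is_instance_of(graph, "ex:Novel", "skos:Concept")
assert not is_class(graph, "ex:Novel")  # an instance of a class is
                                        # not itself a class

# Using ex:Novel as the object of dct:type would entail the punning triple:
graph.add(("ex:Novel", "rdf:type", "rdfs:Class"))
assert is_class(graph, "ex:Novel")      # now it can have instances
```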

dr-shorthair commented 5 years ago

@kcoyle indeed, I already included MARC in the examples - see https://w3c.github.io/dxwg/dcat/#classifying-dataset-types

rob-metalinkage commented 5 years ago

The issue is that it is legal to use a skos:Concept where the range is rdfs:Class, and this is then an explicit case of OWL punning. (I believe this is "intended".)

Whether there is another unstated contract that OWL-Full semantics may not be used as intended is a separate matter - i.e. is there a Use Case from which we may derive a requirement that OWL-DL semantics MUST be supported by use of DCAT?

I'm not stating that this is unreasonable, just that we don't have evidence to force us to make such a constraint at this stage.

I also wonder whether this is a case where the best approach could be an explicit OWL-DL profile of DCAT, where OWL-DL reasoning can be assumed?

I also think validation or identification of the OWL profile is probably a necessary infrastructure demand if we want to enforce anything - it's not, IMHO, reasonable to make all stakeholders experts in these matters.

dr-shorthair commented 5 years ago

the best approach could be an explicit OWL-DL profile of DCAT, where OWL-DL reasoning can be assumed

Indeed.

jakubklimek commented 5 years ago

The issue is that it is legal to use a skos:Concept where the range is rdfs:Class, and this is then an explicit case of OWL punning. (I believe this is "intended".)

Sure, it is legal to use anything with any RDFS range definition. The consequence is simply that the "anything" becomes an instance of the RDFS range. And that is what I am talking about here: using an instance of skos:Concept as the value of a property whose rdfs:range is rdfs:Class makes that instance an rdfs:Class. That, in my opinion, is unintended. People will simply want to describe datasets with metadata. And many people will not realize this (as they are not experts).

Whether or not one discovers such effect, i.e. performs inference, is another matter, but the problem is already in there regardless of that. So, my argument is not to cause such side effects just for the sake of reusing dct:type. If we are going to write examples where skos:Concepts are to be used, in a sense other than with dcat:theme, we should have another property for that. Then you do not have to assume anything more.

makxdekkers commented 5 years ago

@jakubklimek I am just wondering why you think it is 'unintended' that an instance of skos:Concept used to classify datasets would also be an instance of rdfs:Class. It's not a case of us saying that all instances of any skos:Concept are also instances of rdfs:Class, just the ones that are being used with dct:type. Could it be solved with a warning at https://w3c.github.io/dxwg/dcat/#Property:resource_type? As far as I am aware, I don't think there was an explicit intention at DCMI to exclude the use of skos:Concept as object for dct:type. We could ask the people over there, e.g. @tombaker?

jakubklimek commented 5 years ago

@makxdekkers

I am just wondering why you think it is 'unintended' that an instance of skos:Concept used to classify datasets would also be an instance of rdfs:Class

OK, let me try to explain it another way. Let's say I have my own skos:ConceptScheme for classifying datasets. There are skos:Concepts for, e.g. genres. According to SKOS, it is intentionally undefined whether those concepts are also rdfs:Classes or not, i.e. it is up to the publisher. So, since it is my skos:ConceptScheme, I decide I do not want them to be rdfs:Classes.

Now, I want to use those concepts to classify my datasets using DCAT. I find dct:type, since that is the property DCAT users will expect for this according to its description in DCAT, and I want them to be able to understand my data. But I still have no intention of my concepts becoming rdfs:Classes. However, by using dct:type to link to them, I effectively made them rdfs:Classes thanks to dct:type's rdfs:range definition.

My question here is: "Why do my concepts have to become rdfs:Classes just because I want to use them with DCAT recommended property for genre classification?" Classifying dataset with a genre should keep the genre intact, and not imply something about it that was not there before I used it.

The bottom line here I think is, if I wanted the genre concepts to be used as classes, I would have made them classes myself, explicitly, and then probably used them with rdf:type, not dct:type.

Could it be solved with a warning

Well, a warning is the least I would expect there. But the question is: why should there be something that needs a usage warning? Do we need the property for classification to be dct:type so badly?

I don't think there was an explicit intention at DCMI to exclude the use of skos:Concept as object for dct:type

And I do not say that it is/should be excluded to use skos:Concepts as objects of dct:type. I say that doing so unintentionally entails information about the used concepts that might not have been there before. If the concepts were also classes before usage, everything is fine. But if they were not classes before usage, they become classes just because they were used with DCAT and dct:type. That, I think, will not be the intention of users of DCAT, and often enough they will not be able to foresee the effects of this, nor should they.

kcoyle commented 5 years ago

I've finally gotten my head around this (sorry it took so long). The problem as I see it is not in dct nor in DCAT, but in the fact that some communities are still using controlled term lists rather than classes to "classify" types. These are generally carry-overs from older metadata practices that had no concept of "class". A loosening of dct:type could make a choice like this more "valid":

    dct:type http://purl.org/dc/dcmitype/Text ;
    dct:type http://id.loc.gov/vocabulary/marcgt/man ;
    dct:type http://registry.it.csiro.au/def/datacite/resourceType/Text ;
    dct:type http://registry.it.csiro.au/def/re3data/contentType/doc ;

but I don't think that is the main issue here. The question that I see, instead, is whether there is a negative to be found in declaring something like http://id.loc.gov/vocabulary/marcgt/man to be a class when it is used in that way. To me, it serves the same conceptual role as a class and re-casting it as a class is taking it in the direction in which it should go when used in RDF.

I do not think it would be better to have two properties -

And I don't see a way to have a single property that has a range of both classes and instances unless defined as an annotation property, which has no advantages whatsoever, AFAIK.

My other comment is that although term lists like those at id.loc.gov already exist, unless one intends to use a significant number of the terms it may be best to define one's own list of classes, with some links to related classes or terms from other environments. I don't know if there is an analysis of the types that are likely to be useful to DCAT, but my gut feeling is that few of the id.loc.gov content types[1], carrier types[2] or media types[3] will be appropriate, so adding these to the DCAT mix may cause more problems than it solves. These lists and others like them should be replaced by RDF-appropriate classes as their communities move to the use of RDF (although I would not place bets on that happening in my lifetime).

My vote would be: don't use lists from outdated metadata practices; do the right thing and create RDF-appropriate classes.

[1] http://id.loc.gov/vocabulary/contentTypes.html [2] http://id.loc.gov/vocabulary/carriers.html [3] http://id.loc.gov/vocabulary/mediaTypes.html

dr-shorthair commented 5 years ago

@kcoyle wrote -

the fact that some communities are still using controlled term lists rather than classes ... ... don't use lists from outdated metadata practices; ...

Karen - not sure which world you are living in, but it sounds like a very enlightened and privileged one ;-) Back in the one where I spend my days it is hard enough getting term-vocabularies published on the web at all, so SKOS is often all we can hope for. In fact, you can get a long way with SKOS++ and there is significant innovation in this space - look at QUDT for example where all the key classes are sub-classes of skos:Concept, so individual classifiers are all individual skos:Concepts. And look at the NERC Vocabulary Service which has about 40,000 skos:Concepts and is used widely in the earth and environmental sciences. Now I agree that not all of these would be used as high-level classifiers in the dct:type slot, but some of them would.

Overall, if we disallow the use of SKOS for classification vocabularies I believe we consign DCAT to oblivion.

I also think it is a big mistake to propose minting our own sets of term-lists where respectable authorities have already published lists with a URI per term. The fact that they are sometimes not described according to perfect DL-conformant OWL is much less important than the fact that an important authority (like LoC) is providing an important service to the linked data community by carrying over their legacy of analysis onto a more modern platform. And we don't take on an additional maintenance burden.

Don't let "perfect" be the enemy of "good-enough" - actually more like "really quite good given the level of organizational engagement and community acceptance".

makxdekkers commented 5 years ago

I agree with @dr-shorthair that DCAT will lose relevance if we are too strict. On the other hand, some of the machinery that we're using does care about strict rules; for example, using SHACL you can only validate that the object of a particular statement is an instance of a certain, explicitly defined class: the skos:Concept http://id.loc.gov/vocabulary/marcgt/man fails the test for rdfs:Class. A human observer may not object to it, but SHACL definitely does. I've seen a work-around in SHACL that just tests whether there is a URI, and does not look further into it. So, you could stick any URI into the statement and SHACL would not be able to catch it. In a way, using rangeIncludes instead of rdfs:range makes the problem go away, but it would make the validation of objects with SHACL less clear (maybe impossible?).
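The SHACL behaviour described here can be mimicked with a toy check in plain Python (a stand-in for an sh:class rdfs:Class constraint on dct:type; ex:Dataset1 and ex:marcgt-man are invented stand-ins):

```python
def validate_type_object(graph, focus):
    """Toy analogue of a SHACL sh:class rdfs:Class constraint: every
    value of dct:type on the focus node must be explicitly typed
    rdfs:Class in the data graph."""
    values = [o for s, p, o in graph if s == focus and p == "dct:type"]
    return all((v, "rdf:type", "rdfs:Class") in graph for v in values)

g = {
    ("ex:Dataset1", "dct:type", "ex:marcgt-man"),
    ("ex:marcgt-man", "rdf:type", "skos:Concept"),  # not declared a class
}
print(validate_type_object(g, "ex:Dataset1"))  # False: fails the shape
```

Only explicitly adding the punning triple (ex:marcgt-man a rdfs:Class) would make the check pass, which is the tension between soft data and strict validation.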

kcoyle commented 5 years ago

@makxdekkers Since "rangeIncludes" is not an RDF standard concept, it wouldn't be treated as a rdfs:range by SHACL. rangeIncludes could be defined in a validation document to be whatever you want it to be. It would be just another locally defined property, which is what it is in schema.org.

But in any case, to use a less strict definition it seems that DCAT should define its own property because dct:type is already defined with a specific range. If you want to include as values skos:concepts, URIs for classes, and perhaps also literals, you'll need a property with no rdfs:range, AFAIK.

makxdekkers commented 5 years ago

@kcoyle We could create a new property dcat:datasettype or something similar, based on the arguments in this discussion. However, there is already existing practice: the EU DCAT-AP specifies the use of dct:type with skos:Concept and the EU GeoDCAT-AP uses dct:type with objects from http://inspire.ec.europa.eu/codelist/ResourceType and http://inspire.ec.europa.eu/codelist/SpatialDataServiceType, both of which are defined as skos:ConceptScheme. If this is wrong practice, those profiles will have to be revised. The question is how easy it will be to convince people that it is necessary -- given that it took us in this group two weeks to get our heads around the issue.

kcoyle commented 5 years ago

@makxdekkers sigh This is a perfect example of why minimum semantics on property definitions is better. It also further convinces me that application profiles are where ALL constraints should be defined. I would prefer that APs use AP-specific constraint terms and not RDF domains and ranges because an AP is defining constraints not axioms for inferencing. This is what schema.org does - it uses very little from RDF, and defines its own terms for literals (schema:Text), URLs (schema:URL) and integers (schema:Integer), as well as for domains and ranges.

Meanwhile, a lot of the use of DC terms does not adhere to the domains and ranges by which they are defined. The world may be ending, but not for that reason. ;-)

jakubklimek commented 5 years ago

If this is wrong practice, those profiles will have to be revised.

@makxdekkers I think those profiles will have to be revised after DCAT revision anyway.

@kcoyle Isn't the main goal of DCAT to increase interoperability? If there is a set of different APs, each with a different set of restrictions and specifics, those will not be interoperable. What is the point then?

I actually view what schema.org does as a bit evil. It creates a mess in the data, making its processing so hard that only people with a lot of resources are able to process it. I think that having a semantically clean model is a must, and publishers have to be properly motivated to publish their data right. This means there have to be applications able to (easily) process the data (think e.g. simple catalog software). This in my opinion will not be possible if we allow things like properties without ranges, or properties allowing both resources and literals as values.

Anyway, we digressed a bit here. The original issue was with the dct:type property. This property is already established. If it was used in a wrong way in DCAT2014, and this was then "imported" to DCAT-AP, and from there to GeoDCAT-AP, I think now is the time to correct that by introducing a new property for this, with a clearly established range (skos:Concept) -- and I really do not think this is "too strict".

Btw. errors like this in DCAT2014 were already fixed in the revision (e.g. the literal used as a media type in DCAT2014 examples https://github.com/w3c/dxwg/issues/170, which some implementations started using) so this would not be the first case.

rob-metalinkage commented 5 years ago

It sounds like the nice solution would be if dct:type were relaxed to range rdfs:Resource.

However - it is legal to have a skos:Concept as the target of dct:type - it just needs to be recognised that the intent is thus to treat these targets as rdfs:Classes too.

If the target is declared to be both a skos:Concept and a rdfs:Class already - then having two different predicates adds a complication - do you need to fill in both?

If we want to enforce an OWL-DL compatible profile of DCAT, whereby referenced resources are also resolved to return OWL-DL compatible resources, how do we specify this, enforce it and validate it? This is why I think there needs to be an explicit Use Case for OWL-DL semantics to give us requirements -- because the current examples are not "used in a wrong way" -- they are just used in an OWL-Full way, and nothing at this stage says this is actually wrong AFAICT.
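For reference, the "punned" reading of such a classifier looks like this in Turtle (the classifier IRI and label are invented for illustration):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/voc/> .

# The same IRI is declared as both a concept and a class,
# so it satisfies either reading of the range of dct:type.
# Under OWL 2 punning the two declarations are treated as
# distinct views of one name; under RDFS/OWL-Full they merge.
ex:SensorDataset
    a skos:Concept , rdfs:Class ;
    skos:prefLabel "Sensor dataset"@en .
```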

So, if we are to support an OWL-DL compatibility constraint, we need to establish the exact requirements and explore available solutions. A new predicate is a specific solution to a requirement we don't formally recognise (at this stage) IMHO.

If we have such a requirement, we then need to make a decision if this is a matter for DCAT core or for a profile of DCAT.

Perhaps writing an OWL-DL profile of DCAT that enforces such constraints would be a good exercise anyway -- then we can consider how much could be migrated to DCAT core, but right now we are guessing a little how it would be testable in practice, which is a requirement for W3C recommendations.

makxdekkers commented 5 years ago

@jakubklimek It is true that existing profiles may have to be revised as a result of the revision of DCAT, but I was hoping to keep that to the absolutely necessary minimum. We always run the risk that implementers do not feel it is useful to convert data they already have. I agree that if they've done it 'wrong', the new DCAT should not endorse the practice -- e.g in this case of dct:type -- but there is no guarantee that people will have the resources to make the change.

makxdekkers commented 5 years ago

@jakubklimek You wrote "If there is a set of different APs, each with a different set of restrictions and specifics, those will not be interoperable. What is the point then?". People will always implement a standard in their own way, maybe because of their (mis)understanding or maybe because their situation is a little different. Documenting their assumptions and decisions in a profile makes it easier for people with a different set of assumptions and decisions to understand how to interoperate with them, maybe using mapping or conversion tools. There should be a sweet spot: keeping the standard flexible where possible and strict where necessary.

jakubklimek commented 5 years ago

@makxdekkers You are right in both points.

People will always implement a standard in their own way, maybe because of their (mis)understanding or maybe because their situation is a little different.

I like your optimism. In my experience, people usually implement standards only to the minimum degree where no important users/decision makers complain. And decision makers usually complain only when there is a web page with something about their data written in red (i.e. validation errors), making them look bad. So for me, it is important to be able to validate as much as possible automatically, because I know that what cannot be validated automatically, will not get fixed/published right.

Otherwise, I understand the motivation to find the sweet spot and to use application profiles.

tombaker commented 5 years ago

I'm coming into this discussion from the side, so please forgive me if I repeat points already made.

As @kcoyle may have mentioned, the DCMI Usage Board is actively reviewing DCMI Metadata Terms in the context of preparing ISO 15836 Part 2. We are in the process of loosening the semantics of object properties by replacing rdfs:range with a yet-to-be-coined DCMI version of the Schema.org property rangeIncludes. We still need to finalize details, but the Usage Board is in principle solidly behind this move. In so doing, we will revise the DCMI Namespace Policy to explicitly allow loosening of semantics. For the case discussed in this thread, the loosening would mean that the object of a statement using dct:type, such as a SKOS concept, would no longer be inferred to be (also) an rdfs:Class.

DCMI went perhaps a step too far when it originally assigned a formal range of rdfs:Class because rdf:type already had a range of rdfs:Class, and because it seems in retrospect unreasonable to expect that values used with dct:type must be inferrable to be RDF classes or explicitly declared in RDF to be classes. One finds, in the wild, SKOS concepts used as types (@makxdekkers points to an example), which to me seems quite reasonable.

As I think several people have noted in this thread, it is always possible to define stronger restrictions in an application profile, or to enforce a particular interpretation for a property with a validation scheme.

-- Tom Baker tom@tombaker.org

kcoyle commented 5 years ago

@jakubklimek The DCMI goal is to be able to use application profiles directly for validation - thus validation works on the metadata creator's set of rules, and presumably both the creator and the user are using the same rules. It seems to be that this would improve interoperability because the metadata definitions would be explicit and testable.

starred75 commented 5 years ago

@tombaker brings good news for the future of dcterms.

I think at the basis there is a huge problem with the collapsing of the local names of two very relevant properties. Oh, yes, they are different URIs but I would not use any locally named "type" property when there is such a cumbersome guy called "rdf:type" around. Certainly none is to be blamed: DC was born after RDF (but RDF's specs are younger than DC's), surely when they got married both of them already had their own "type" properties.

Turning to the logical aspects, I'm with @jakubklimek and @lvdbrink. Yes, there's punning, skos:Concepts can be rdfs:Class or owl:Class (it's written in the SKOS specs as well), etc. Yet having a property which is meant to point to rdfs:Class and then pointing it at generic resources -- probably owl:Individuals if we were in OWL, and I read in some comment that not all of the suggested examples are skos:Concepts, yet I don't think we are expecting classes -- is not going to help keep things clean. It is one thing to say, for the purpose of one ontology somehow merged/imported etc. with a thesaurus, that a certain skos:Concept is also a class; it is another to deliberately use a property in the (IMHO) wrong way. Tom is right, probably DCMI went too far at that time.

Additionally, if we look just from a terminological point of view, is it really a type? From the suggested target datasets and their data, I see many things that could be topics or genres; maybe "type" is not what I would have in mind (yet others could be).

I'd be in favor of using another property or, simply, dc:type (not dct). @makxdekkers made a point about existing profiles using dct:type, but at least it was not in the specs of the original DCAT, and thus the AP maintainers could in theory continue to use their property and their semantics (possibly redundantly with the new one here if adopting the new DCAT in old data) or, if updating the AP specs to the new DCAT, change the property.

P.S. I disagree on local standards and APs with embedded semantics. This may and will always happen, but at least in principle they should try to be understandable as widely as the World Wide Web. APs are useful to put together different ontologies, discard some unused properties and tell users which ones to use, concretely suggesting potential target datasets for some properties (as in this case), etc.; they should not redefine semantics. I prefer a world in which, if I pull from anywhere a triple such as :Mario foaf:knows :Luigi, I'm totally sure that :Luigi is a foaf:Person and not the pet rabbit of :Mario. In my experience, all soft spots in available standards, all those "do it as you wish", create more issues than solutions. But better I stop this "P.S.", I'm on the boundary of being OT ;-)

dr-shorthair commented 4 years ago

I take most solace from @tombaker comment that

DCMI went perhaps a step too far when it originally assigned a formal range of rdfs:Class [for dct:type] because rdf:type already had a range of rdfs:Class

I had always seen dct:type as a way of adding additional classifiers that were not rdfs:Class, since there was already a home for those (i.e. rdf:type!). dct:type should not duplicate the functionality of rdf:type. It appears this has been fixed in the most recent version of DC -- https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/type/ has no range restriction any more.

So I think we can close this issue as it is no longer valid.

dr-shorthair commented 4 years ago

However, I just realised that in a strict DCAT context, the rdf:type would always be dcat:Dataset or dcat:DataService or another sub-class of dcat:Resource, being the description of the dataset (etc) in the catalog. OTOH the dct:type is used to classify the dataset itself (etc), i.e. not the context resource, but the thing that the context resource describes.
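A minimal sketch of that distinction (the dataset IRI is made up; the codelist value is assumed to follow the INSPIRE codelist URI pattern cited earlier in this thread):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .

ex:my-dataset
    # rdf:type: what the catalogued resource is in DCAT terms
    a dcat:Dataset ;
    # dct:type: a classifier for the thing the record describes
    dct:type <http://inspire.ec.europa.eu/codelist/ResourceType/series> .
```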

andrea-perego commented 3 years ago

There has not been further discussion on this issue.

I propose to close it.

aisaac commented 3 years ago

+1 for closing. Especially I would add that the starting point of the issue doesn't apply anymore. As hinted in earlier discussions, now DCMI doesn't mandate a formal range of rdfs:Class for dcterms:type: https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/type

andrea-perego commented 3 years ago

Thanks, @aisaac . Closing.