w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Profiles may be written in or may link to a document or schema in a validation language (ShEx, SHACL, XMLschema). [ID41] (5.41) #279

Open nicholascar opened 6 years ago

nicholascar commented 6 years ago

Entered from Google Doc

agreiner commented 6 years ago

What does it mean to "link to" something in this context? In general, the web allows one to link anything on the web to anything else on the web.

kcoyle commented 6 years ago

I think the idea was that the validation rules, in a validation language like shacl or shex or xsd, could be a separate file, but that the profile would provide a URL or (if in RDF) a property with the identifier of the validation file as its value. If the profile is itself expressed as some form of code (k/v pairs, xml, rdf, etc.) then my preference would be that there be an explicit metadata element for the validation document, not just something like "dct:related". You'd want to be able to know precisely if the profile has an actionable validation file somewhere.
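A minimal Turtle sketch of the difference described above, assuming the profile is itself described in RDF. `ex:validationSchema` is a hypothetical property invented only for illustration; it is not part of DCAT or any published vocabulary. `dct:relation` is the closest real Dublin Core term to the generic link mentioned above. All `example.org` URIs are placeholders.

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/ns#> .   # hypothetical namespace

# Generic link: a client cannot tell that the target is a validation file.
ex:myProfile dct:relation <http://example.org/profiles/my-profile-shapes.ttl> .

# Explicit link: a dedicated (hypothetical) property states what the target is for.
ex:myProfile ex:validationSchema <http://example.org/profiles/my-profile-shapes.ttl> .
```

With the second form, a client only needs to look for one known property to learn whether an actionable validation file exists.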

Where I see a hitch is that there are validation descriptions, like XSD, SHACL, and ShEx, which are files that define validation but need software to run them, and then there are actual validation programs that someone may provide. In my community there are a lot of home-grown validation programs for metadata. I don't know if people are sharing those, but they are of a different nature from the validation files that are input to a program like Schematron or TopBraid.

rob-metalinkage commented 6 years ago

Yep. That is not a hitch, however, in profileDesc, which models these similarly to Distributions for Datasets without constraining the nature of the distribution; hence we can have custom programs, web pages, instructions, guidance, etc. The link is via another object, and so can be qualified with any metadata we need about form, role, audience, etc. I suspect we have work to do to optimise the model (standardise this metadata), but the basic separation and mechanism is there.

We don't have an alternative (pre-existing) canonical option for a qualified link, or specialised links for all possible types of implementation resources for profile definition or validation.
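A sketch of the "link via another object" pattern. It borrows terms from the PROF vocabulary that later grew out of profileDesc (the thread refers to its second public working draft further down), so the names may differ from the profileDesc draft discussed here. All `ex:` and `example.org` URIs are hypothetical.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# The profile does not point at the file directly ...
ex:myProfile a prof:Profile ;
    prof:hasResource ex:myProfileShapes .

# ... but at a descriptor object that can carry qualifying metadata.
ex:myProfileShapes a prof:ResourceDescriptor ;
    prof:hasRole role:validation ;                                          # what it is for
    dct:format <https://www.iana.org/assignments/media-types/text/turtle> ; # its syntax
    dct:conformsTo <https://www.w3.org/TR/shacl/> ;                         # its language
    prof:hasArtifact <http://example.org/profiles/my-profile-shapes.ttl> .  # the file itself
```

Because the qualifiers sit on the descriptor rather than on the link, a custom program or a web page could be described the same way, just with a different role and format.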

makxdekkers commented 6 years ago

I do not understand why we can't just use the existing DCAT model of Dataset (for the profile) and Distribution (for the machine-readable file that expresses the profile). Also, in the case of a Distribution of a Dataset, you need software to do something with a file: all we do is provide the format or media type, and let the client figure out what to do with it. What's the difference with the handling of a SHACL file?
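A minimal sketch of the Dataset/Distribution approach suggested above, assuming the profile's constraints are published as a single Turtle file. The `ex:` names and `example.org` URLs are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# The profile treated as a dcat:Dataset ...
ex:myProfile a dcat:Dataset ;
    dct:title "My application profile"@en ;
    dcat:distribution ex:myProfileShapesFile .

# ... and its machine-readable file as one of its Distributions.
ex:myProfileShapesFile a dcat:Distribution ;
    dcat:downloadURL <http://example.org/profiles/my-profile-shapes.ttl> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/turtle> .
```

Note that the only description of the file here is its media type, which says "Turtle" but not what the Turtle is for.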

kcoyle commented 6 years ago

Makx, I'm not sure that you can tell a SHACL document is a SHACL document from its media type; it would have a media type such as text/turtle or application/rdf+xml. (This is from memory of discussions a few years ago, but that is what I recall.) Does that matter? It seems that you would have to read the distribution to determine what type of distribution it is, and whether it can be used to validate the data. My impression is that there was a desire to favor validation as a function, making it explicit.
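To illustrate the point above: a SHACL shapes file is an ordinary Turtle document, so it is served as `text/turtle` like any other RDF data. Only its content shows that it is a set of validation rules. The shape below is a hypothetical example requiring every `dcat:Dataset` to have a title.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# Nothing in the media type says "SHACL"; the sh: terms do.
ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:title ;
        sh:minCount 1
    ] .
```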

makxdekkers commented 6 years ago

@kcoyle Thanks, I do now see that there is a difference between the media type and the 'meaning' of the file. This is actually why in ADMS there are two properties, one for the media type of the file and one for the 'representation technique' https://www.w3.org/TR/vocab-adms/#adms-representationtechnique. Maybe such an approach could be adopted for profiles too. In a way, ADMS might be a pretty good starting point for profile descriptions. After all, it was specifically designed to describe these kinds of things and it's based on DCAT. However, I do not think that the fact you need software to process the file is something we need to cover in DCAT.
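A sketch of the ADMS two-property approach applied to a profile's SHACL file: `dcat:mediaType` records the syntax of the file, and `adms:representationTechnique` records what kind of artefact it is. ADMS expects the technique to come from a controlled vocabulary of `skos:Concept`s; `ex:SHACL` below is a hypothetical stand-in for such a concept, not an entry from any published list.

```turtle
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

ex:myProfileShapesFile a dcat:Distribution ;
    dcat:downloadURL <http://example.org/profiles/my-profile-shapes.ttl> ;
    # Syntax of the bytes: Turtle.
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/turtle> ;
    # 'Meaning' of the file: SHACL shapes (hypothetical concept).
    adms:representationTechnique ex:SHACL .

ex:SHACL a skos:Concept ;
    skos:prefLabel "SHACL"@en .
```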

kcoyle commented 6 years ago

Thanks, @makxdekkers for the reminder about ADMS. If I recall correctly, Andrea was suggesting ADMS for profile description some months ago. I'll try to find a place to add it into that discussion. That would fit with treating profiles as datasets, I believe.

dr-shorthair commented 6 years ago

ADMS for profile description

An adms:Asset is a special kind (i.e. sub-class) of dcat:Dataset, which primarily has some additional properties reflecting the fact that an Asset is better managed than a generic Dataset. It also adds some relationships to other Assets. The latter are certainly related to the requirements around dependencies that are typical in descriptions of Profiles. But overall adms:Asset is a more general case, so would need further specialization to support description of Profiles.

Profile is logically a sub-class of dct:Standard (see https://w3c.github.io/dxwg/profiledesc/) and potentially also of adms:Asset (which would make it also a sub-class of dcat:Dataset). But whether relying on deeper subsumption, instead of defining a Profile class with fewer dependencies, makes our life easier (i.e. is easier to explain to implementers) is questionable IMHO.
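The two options above can be written as class axioms. This is a sketch only: `ex:ProfileA` and `ex:ProfileB` are hypothetical class names used to contrast the alternatives, and the `adms:Asset` axiom restates the relationship described in the previous comment.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# As described above: an ADMS Asset is a kind of DCAT Dataset.
adms:Asset rdfs:subClassOf dcat:Dataset .

# Option A: a Profile class with few dependencies.
ex:ProfileA rdfs:subClassOf dct:Standard .

# Option B: deeper subsumption; every profile is also an Asset, hence also a Dataset.
ex:ProfileB rdfs:subClassOf dct:Standard , adms:Asset .
```

Option B brings in all ADMS and DCAT Dataset semantics transitively, which is the dependency chain the following comment warns about.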

dr-shorthair commented 6 years ago

(I also recall @makxdekkers and @philarcher warning us against making strong dependencies on ADMS - we should certainly learn from it, but maybe not build it into a potentially brittle dependency chain? - also see #111 )

nicholascar commented 6 years ago

@dr-shorthair why "...hunch is that Profile is logically a sub-class of dct:Standard" when it's explicitly stated as such in https://w3c.github.io/dxwg/profiledesc/ and, I think, this definition has been viewed by a few members of the group and not found to be incorrect?

dr-shorthair commented 6 years ago

Apologies - the hunch was more about the second part of the sentence, but the hedge was superfluous given the rest of the sentence. I've edited the comment above to match.

dr-shorthair commented 6 years ago

@rob-metalinkage responded by email (but it didn't make it through to here) -

There are two reasons we don't just reuse dcat:Dataset:

  1. We need to model more aspects of profiles including their relationships to each other and different possible roles of resources.

  2. We decided profile descriptions are a broader concern than cataloguing them.

Thus the alignment between Profiles and dcat:Resource is what we are concerned about here, and perhaps more explicitly whether subclasses of Resource are disjoint. I'm agnostic whether in the alignment Profile is a subclass of Resource or Dataset... but for backwards compatibility could we add a note and example stating that a Profile MAY be considered as a dataset of constraint objects?
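A sketch of what such a backwards-compatibility example might look like: one resource typed as both a profile and a `dcat:Dataset`, so DCAT-only clients can still treat it as a dataset whose distributions carry the constraints. It uses the later PROF class name `prof:Profile`; the `ex:` names are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# The same resource is a Profile and, for DCAT clients, a Dataset.
ex:myProfile a prof:Profile , dcat:Dataset ;
    dcat:distribution ex:myProfileShapesFile .

ex:myProfileShapesFile a dcat:Distribution ;
    dcat:downloadURL <http://example.org/profiles/my-profile-shapes.ttl> .
```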

dr-shorthair commented 6 years ago

whether subclasses of Resource are disjoint

I believe that OWL semantics would say that sub-classes are not disjoint unless it is specifically axiomatized. Here's the kind of thing I have in mind, though:

[Diagram: dcat:Resource sub-classes]

ext:Sample and ext:Software are examples of other things that we might want to catalog.

Depending on how one feels about 'hijacking' you may be more or less unhappy about the sub-class relationship between dct:Standard and dcat:Resource. Personally I don't see a problem - if you want to catalog a standard then it is probably both ...
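A textual approximation of the sub-class structure described above, as a sketch. `ext:` is a hypothetical extension namespace standing in for the `ext:Sample` and `ext:Software` classes mentioned. The disjointness axiom is left commented out to show what would be needed if the sub-classes were meant to be disjoint: without it, OWL allows one resource to be, say, both a `dcat:Dataset` and a `dct:Standard`.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ext:  <http://example.org/ext#> .   # hypothetical extension namespace

dcat:Dataset  rdfs:subClassOf dcat:Resource .
ext:Sample    rdfs:subClassOf dcat:Resource .
ext:Software  rdfs:subClassOf dcat:Resource .
dct:Standard  rdfs:subClassOf dcat:Resource .   # the 'hijacking' axiom

# Only an explicit axiom like this would make the sub-classes disjoint:
# [] a owl:AllDisjointClasses ;
#    owl:members ( dcat:Dataset ext:Sample ext:Software dct:Standard ) .
```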

kcoyle commented 6 years ago

First, thanks for the diagrams! I always find pictures helpful.

I don't think you need the sub-classing of dct:Standard to dcat:Resource. The profile can be a sub-class of both. So the arrow from dct:Standard could instead be moved to Profile, and it then is a subclass of both dct:Standard and dcat:Resource, and no statements need to be made to subclass dct:Standard to any DCAT classes. This makes it, semantically, a "standard resource", which I think is a great way to define it.
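The alternative proposed above, as a one-axiom sketch. `ex:Profile` is a hypothetical class name; the point is that only the Profile class gains two parents, and no axiom is added to `dct:Standard` itself.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# A "standard resource": every profile is both a Standard and a DCAT Resource.
ex:Profile rdfs:subClassOf dct:Standard , dcat:Resource .
```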

dr-shorthair commented 6 years ago

OK - fair enough. A dct:Standard is not necessarily a dcat:Resource - at least it is not until it has been added to a dcat:Catalog somewhere ☺ Of course the same applies to a profdesc:Profile. So strictly I think we should have something like this

[Diagram: Cataloguing standards UML]

This separates the DCAT concerns (which are about cataloguing) and the Profile concerns (which are about standards).

Of course it is not necessary to axiomatize the dcat:Catalogued* classes directly, just give the catalogued resources both types e.g.

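@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix my:   <http://example.org/my#> .   # placeholder namespace

my:StandardABC
    rdf:type dcat:Resource ;
    rdf:type dct:Standard .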
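For contrast, a sketch of what axiomatizing such a class directly might look like. `ex:CataloguedStandard` is a hypothetical name chosen for illustration (the diagram's own class names are not reproduced here). It is defined as exactly the intersection of the two types.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# A catalogued standard is anything that is both a Standard and a DCAT Resource.
ex:CataloguedStandard a owl:Class ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf ( dct:Standard dcat:Resource )
    ] .
```

With this definition, an OWL reasoner would classify a resource given both types (as with my:StandardABC) as an ex:CataloguedStandard without that being asserted, which is why simply giving both types is usually enough.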

Thinking about how this all relates to ADMS, you might go to the "Cataloguing assets" UML diagram. Not sure how much of this we would want to follow through with.

nicholascar commented 6 years ago

Back to Makx's question "I do not understand why we can't just use the existing DCAT model of Dataset (for the profile) and Distribution":

I've just set up a profileDesc description of a dummy DC Application Profile for testing: CSIRO ePublish Dublin Core Application Profile.

I've modelled the thing overall as a Dataset (indicated not by any RDF properties but just by the use of a URI with /dataset/ in it) and it's not great fun. I have the various profileDesc Implementation Resource Descriptors (the Guidance and FullConstraint objects in RDF, PDF, etc.) serving useful functions (allowing for multiple constraint representations and descriptive docs about the profile), but I don't see how any of this is easily mappable to DCAT-like things such as Dataset & Distribution in any useful way.
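A sketch of a profile carrying several implementation resources, one for guidance and one for constraints. It uses the later PROF terms and assumes that profileDesc's Guidance and FullConstraint objects correspond roughly to PROF's `role:guidance` and `role:constraints`; all URIs are hypothetical and are not those of the CSIRO test profile.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

ex:dummyDcap a prof:Profile ;
    dct:title "Dummy Dublin Core Application Profile"@en ;
    prof:hasResource ex:dcapGuidance , ex:dcapConstraints .

# Human-readable guidance, published as a PDF.
ex:dcapGuidance a prof:ResourceDescriptor ;
    prof:hasRole role:guidance ;
    dct:format <https://www.iana.org/assignments/media-types/application/pdf> ;
    prof:hasArtifact <http://example.org/dcap/guidance.pdf> .

# The full set of constraints, in RDF.
ex:dcapConstraints a prof:ResourceDescriptor ;
    prof:hasRole role:constraints ;
    dct:format <https://www.iana.org/assignments/media-types/text/turtle> ;
    prof:hasArtifact <http://example.org/dcap/constraints.ttl> .
```

Each descriptor states what the resource is for (its role) as well as its format, which is the information a plain Distribution's media type does not carry.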

I can see how there may be upper, abstract mappings possible but so what? Do we really need profiling artifacts to be slaved to even abstract versions of DCAT? Sure, one can abstract right up to owl:Thing and find mappings but, again, so what?

Can we perhaps concentrate on representing existing practice of profiling, with a nod to future practice, as profiling, not cataloguing, before we really pound the profiling/cataloguing crosswalks further? Else we might be hampering profile representation due to DCAT's embedded ways of operating.

nicholascar commented 6 years ago

Also see my answer https://github.com/w3c/dxwg/issues/317#issuecomment-418015589 for things that ADMS can't do for profiling.

nicholascar commented 5 years ago

This requirement is clearly met by the 2PWD of PROF

aisaac commented 5 years ago

@nicholascar do you remember why you've removed the profile-guidance label from this issue? I think it may be worth keeping the attachment...

And for the record I think it would be good to point to the places of 2PWD that meet the requirement.

nicholascar commented 5 years ago

@aisaac I don't remember why Guidance was removed, but I think this issue should really be for guidance, not PROF: PROF allows Constraint Language Resources and has multiple examples of them in 2PWD, e.g. Example 3. Switching labels.
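A rough sketch of the kind of usage the referenced example covers: a narrower profile reusing a constraint-language resource (here SHACL) described by its base profile. It assumes, as we understand the PROF drafts, that `prof:isInheritedFrom` links a profile to a resource descriptor of its base specification; the linked 2PWD example should be checked for the exact modelling. All `ex:` names are hypothetical.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://www.w3.org/ns/dx/prof/role/> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

ex:baseProfile a prof:Profile ;
    prof:hasResource ex:baseShapes .

# A constraint-language resource: SHACL shapes used for validation.
ex:baseShapes a prof:ResourceDescriptor ;
    prof:hasRole role:validation ;
    dct:conformsTo <https://www.w3.org/TR/shacl/> ;
    prof:hasArtifact <http://example.org/base/shapes.ttl> .

# A narrower profile that reuses the base profile's shapes.
ex:narrowProfile a prof:Profile ;
    prof:isProfileOf ex:baseProfile ;
    prof:isInheritedFrom ex:baseShapes .
```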

aisaac commented 5 years ago

@nicholascar thanks. I think the reference you give should be enough, but for the sake of precision, if you mention 2PWD you can use the real link, which in this case should be (I believe) https://www.w3.org/TR/2019/WD-dx-prof-20190402/#example-3-property-isinheritedfrom-in-use . Who knows, later on maybe the general URI in dx-prof will contain a document with different HTML anchors!

And ok if you don't remember why you've removed the Guidance label and agree with keeping it, we're good on that front! I am however just removing the 'due-for-closing' label as I'm not certain it should have this status wrt Guidance.

nicholascar commented 5 years ago

OK, thanks @aisaac for the reminder to use the more persistent link!

Closing after listing in plenary 2019-09-03 + 3-day wait period.

kcoyle commented 5 years ago

@aisaac are you ok with closing this? You removed the label earlier. Also, there is quite a bit of discussion above - has it been captured in the document? @nicholascar remember to give the resolution of the issue, not just that it was listed in the plenary. That isn't the key bit of info; we need to know what happened with the ideas that came up here.

aisaac commented 5 years ago

@kcoyle indeed I'm not ok closing this, well spotted. As said earlier I don't know if it has been handled for the Profile Guidance document, therefore we need to keep it open from the perspective of this deliverable. @nicholascar it should have disappeared from your list of issues to close, as it's not labeled for PROF anymore!