w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

FPWD comment: use dcat:Distribution for Resource #529

Closed smrgeoinfo closed 5 years ago

smrgeoinfo commented 6 years ago

In the Figure 1 OWL [owl2-overview] diagram of this ontology, the ResourceDescriptor class is basically an association class that associates a resource with the Profile in some role with some purpose. There might be multiple representations of this resource available, and these will have different linking information (URL, MIME type, etc.). The dcat:Distribution class is defined to model exactly this; shouldn't the target of the 'artifact' predicate/role be dcat:Distribution? The model would look something like this: [image: diagram of the proposed model]
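Since the diagram itself is not reproduced here, the following Turtle is only a rough sketch of the kind of model the comment describes, not the commenter's actual figure. The `ex:` namespace, the role IRI `ex:validation`, and the use of `prof:hasArtifact` as the name of the 'artifact' predicate are assumptions made for illustration.

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .   # placeholder namespace

ex:myProfile a prof:Profile ;
    prof:hasResource ex:validationResource .

# The ResourceDescriptor acts as the association class: it carries the role.
ex:validationResource a prof:ResourceDescriptor ;
    prof:hasRole ex:validation ;   # hypothetical role IRI
    prof:hasArtifact ex:shapesTurtle , ex:shapesRdfXml .   # assumed predicate name

# Each representation is a dcat:Distribution with its own linking information.
ex:shapesTurtle a dcat:Distribution ;
    dcat:downloadURL <http://example.org/shapes.ttl> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/turtle> .

ex:shapesRdfXml a dcat:Distribution ;
    dcat:downloadURL <http://example.org/shapes.rdf> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/application/rdf+xml> .
```

In this reading the role is stated once on the descriptor, while URL and media type vary per distribution.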

smrgeoinfo commented 5 years ago

See related discussion #638. From a general point of view, a profile is just another kind of related resource, and profile could be modeled as a role for the related resource.

nicholascar commented 5 years ago

I disagree that parts of a Profile, currently modelled as artifacts and associated with a Profile via a Resource Descriptor, can be modelled as dcat:Distributions of the Profile, for the following reasons:

  1. Such modelling would imply that the Profile is a dcat:Dataset, but a Dataset and a Standard (dct:Standard) are not the same thing.
  2. From DCAT (rev)'s definitions: Distributions are "A specific representation of a dataset", so this is "representation of", not "representation of part of". With Resource Descriptor, we are interested in "representation of part of", for things such as validation constraints, documentation or a specification document, which only together, with all other Profile parts, make up the whole profile.

For these reasons, I recommend not using dcat:Distribution for prof:ResourceDescriptor.

andrea-perego commented 5 years ago

@nicholascar , I'm not questioning here having prof:ResourceDescriptor, but I disagree about your point (1).

According to the (quite broad) DCAT definition of "dataset", a standard/specification can be considered as a dataset (although it is also other things). This position was made explicit when ADMS was contributed to GLD (the WG in charge of DCAT), who made adms:Asset a subclass of dcat:Dataset (which was not the case in the original ADMS version).

I also don't see the difference between "representation of" and "representation of part of" when talking about distributions, as there is no requirement preventing dcat:Distributions from representing just parts of a dataset.

Validation constraints, etc., rather than being "parts" of the representation of a profile, concern the "role" of a resource descriptor, and the same resource descriptor can play more than one role.

kcoyle commented 5 years ago

I'd rather think of the files/resources of a prof:Profile as being "members" (as in members of a set) rather than "parts". As was said in an earlier issue, with profiles as defined in the ontology there is no "whole" - nothing indicates that a profile is defined as having specific parts, nor is there any measure of completeness. But mostly there is no whole to be a part of, and there is nothing defining relationships between the members; some of them may be entirely stand-alone and unrelated to other members.

To my thinking the primary way in which profiles differ from DCAT datasets is that there is no descriptive metadata for the profile: no title, no creator, no date, no topic, no standards used, etc, nor for the resource. Thus a profile is not a dataset in the DCAT sense of that term (or in any other sense that I know), only a class (IRI) with a minimum human-semantic component and very little in terms of RDF semantics (domains, ranges, sub-classes). I would also say that the prof:Profile is not analogous to a dcat:Catalog ("A curated collection of metadata about datasets and data services") - although it is possibly a "curated collection" it is NOT a collection of metadata, but at best a collection of files or resources. Thus it is a set of resources that are brought together by someone as belonging to a particular profile. It's all pretty vague, whereas DCAT catalogs and datasets are much more specific.

andrea-perego commented 5 years ago

@kcoyle said:

To my thinking the primary way in which profiles differ from DCAT datasets is that there is no descriptive metadata for the profile: no title, no creator, no date, no topic, no standards used, etc, nor for the resource.

It is true that this information is missing in PROF but, to my understanding, the reason is that this is a programmatic decision related to the scope of PROF, which focusses on describing the profile context, also in view of profile-based conneg.

But I think the issue here relates also to vocabularies which are overlapping with and complementary to PROF. However we define a profile in PROF, in ADMS a profile will be an adms:Asset and a resource descriptor will be an adms:AssetDistribution (which are subclasses of dcat:Dataset and dcat:Distribution, respectively). And a profile which is an RDF vocabulary or an OWL ontology will be a voaf:Vocabulary in VOAF.
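To make the overlap concrete, here is a small Turtle sketch of how one and the same profile could be typed in PROF, ADMS and VOAF at once. The subclass axioms are the ones stated in this comment; the `ex:` resources are hypothetical.

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix voaf: <http://purl.org/vocommons/voaf#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # placeholder namespace

# Subclass axioms as described in the comment above.
adms:Asset             rdfs:subClassOf dcat:Dataset .
adms:AssetDistribution rdfs:subClassOf dcat:Distribution .

# A hypothetical profile that happens to be an OWL ontology.
ex:myProfile a prof:Profile , adms:Asset , voaf:Vocabulary .

# A hypothetical resource descriptor of that profile.
ex:myProfileResource a prof:ResourceDescriptor , adms:AssetDistribution .
```

Under RDFS entailment, `ex:myProfile` would then also be a dcat:Dataset and `ex:myProfileResource` a dcat:Distribution, which is the overlap the comment asks the WG to clarify.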

People have been using both VOAF and ADMS for describing profiles for some time now (for discovery and statistical analysis), so I think it is important we clarify inside the WG what the real difference is (if any) between what we mean to model in PROF with respect to profiles, and DCAT / ADMS / VOAF.

I am personally still a bit confused, and I still find it hard to see how a profile cannot also be a dcat:Dataset and a resource descriptor a dcat:Distribution. They are of course also other things, and PROF is meant to describe exactly these aspects, in my understanding.

nicholascar commented 5 years ago

@andrea-perego ok, reading your comment and backpedalling on my comment above as a result, I think @smrgeoinfo's original suggestion (image at top) can fold ResourceDescriptor into dcat:Distribution if prof:hasRole is catered for, perhaps leading to this:

[image: alignment-adms]

Q @andrea-perego: what do you think of the rdfs:subClassOf between PROF & ADMS? Have I got the subclassing directions the right way around or do you think PROF is more general than ADMS, and perhaps even DCAT?

andrea-perego commented 5 years ago

Thanks, @nicholascar . The diagram is basically matching the ADMS/PROF alignment in https://github.com/w3c/dxwg/wiki/PROF-Alignments-and-crosswalks

There are a couple of things to be fixed, I think.

  1. A typo: adms:AssetDistribution is said to be a subclass of dcat:Dataset, whereas it is dcat:Distribution

  2. Not sure about the subclass relationship with what in the diagram is called "VOAF Class". The only VOAF class which could be related to the notion of profile is voaf:Vocabulary, but I'm not sure subclassing can model this relationship, as not all voaf:Vocabularys are prof:Profiles

I have also been working some more on the comparison between PROF / ADMS / VOAF, and the current results are here (view only link):

https://docs.google.com/spreadsheets/d/11DQK4wBEis1Ev2rX_U0viQylMgrVPHszVxhvb2bf3s0/edit?usp=sharing

I have not imported it yet in the wiki, as VOAF has some properties that look like they are overlapping with those defined in PROF - they were discussed some time ago in https://github.com/w3c/dxwg/issues/216

Also in this case, I think it is important we clarify the difference between those properties in PROF and VOAF which are semantically similar.

nicholascar commented 5 years ago

Hi @andrea-perego, thanks for the error checking and confirmations. I've updated the diagram to remove the two lazy errors I made that you pointed out - the prof:ResourceDescriptor rdfs:subClassOf dcat:Distribution and the 'VOAF Class' indeed being a voaf:Vocabulary. I've put in a '?' for the prof:Profile/voaf:Vocabulary relationship for now.

andrea-perego commented 5 years ago

Thanks, @nicholascar .

I saw that you changed the title of the wiki page with the alignments: please note that this has broken all the links to it (I updated the one in my previous comment).

andrea-perego commented 5 years ago

I have also integrated the alignments with VOAF into the wiki page:

https://github.com/w3c/dxwg/wiki/PROF-Alignments-and-crosswalks

Please note that there are a number of VOAF properties for which I am unsure how they relate to PROF (I put a question mark there).

Please review.

nicholascar commented 5 years ago

Have reviewed and made some minor changes. prof:hasToken I was keen to see be replaced with adms:identifier but Rob disagrees in order to prevent a whole owl:imports of ADMS but is happy for prof:hasToken rdfs:subPropertyOf adms:identifier.

Perhaps we should think a bit more carefully about terming PROF an extension to ADMS with prof:isProfileOf and prof:hasRole as the two standout properties.

Also, we should recommend that if someone wants general properties like titles, descriptions, Agent relations, one should just use ADMS or DCAT properties with the alignments here in mind. Prevents us having to describe dct:title etc. use.

kcoyle commented 5 years ago

"Also, we should recommend that if someone wants general properties like titles, descriptions, Agent relations, one should just use ADMS or DCAT properties with the alignments here in mind. Prevents us having to describe dct:title etc. use."

Anyone can say anything about anything so I don't think it is necessary to say that people can use whatever vocabularies they want to further describe the classes and properties in PROF - it's an RDF vocabulary and therefore by definition they can. I think the better question to address is why PROF is what it is, and to clearly define PROF as providing a kind of macro context over profile resources. Then people can decide if that's a need that they have.

makxdekkers commented 5 years ago

@kcoyle It is true that "anyone can say anything about anything" but it could still be useful to point people to good practice. And DCAT is mentioned as best practice in https://www.w3.org/TR/dwbp/.

kcoyle commented 5 years ago

@makxdekkers I agree that people should be pointed to best practices, but what we are talking about here is the profiles ontology, which is what it is. If the creators of that ontology feel that a "best practice" is to include additional properties then they need to include those either 1) in the ontology itself (just as DCAT does with dct etc.) or 2) they need some additional documentation that states what they would consider to be a "best practice" for users of the ontology. At the moment all that we have is the ontology described in the profiles working draft, so really that's all we can assume as the sum total of the recommendation. Do we consider it complete "as is"?

andrea-perego commented 5 years ago

@nicholascar wrote:

Have reviewed and made some minor changes. prof:hasToken I was keen to see be replaced with adms:identifier but Rob disagrees in order to prevent a whole owl:imports of ADMS but is happy for prof:hasToken rdfs:subPropertyOf adms:identifier.

Not sure why it would be necessary to import ADMS. Note, however, that the range of adms:identifier is not a literal, as it is for prof:hasToken.

I would also like to add that there's a discussion under way in https://github.com/w3c/dxwg/issues/453 about prof:token / prof:hasToken, questioning whether or not it should be included in PROF.

kcoyle commented 5 years ago

"Not sure why it would be necessary to import ADMS." I had the same thought, and silently wondered if this isn't a tool issue (I have a great deal of frustration with the state of tools for ontology creation). In any case, we shouldn't make decisions based on the current state of tools but on our requirements.

nicholascar commented 5 years ago

@andrea-perego regarding adms:identifier's range value: yes, I'm aware of that but I think that's appropriate. I was thinking we could use an adms:Identifier with the identifier scheme set to something indicating its use, so the scheme would be, as we have a driving requirement for a token, an API.
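As a rough illustration of the two options being weighed, the Turtle below contrasts a literal-valued prof:hasToken with a structured adms:Identifier. It is a sketch under assumptions: the `ex:` names and the `ex:apiTokenScheme` datatype are hypothetical, and expressing the scheme as the datatype of skos:notation follows my reading of the ADMS identifier pattern rather than anything decided in this thread.

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .   # placeholder namespace

# (a) Token as a plain literal, as with prof:hasToken.
ex:myProfile prof:hasToken "myprofile" .

# (b) Token as a structured identifier whose scheme indicates its use in an API.
ex:myProfile adms:identifier [
    a adms:Identifier ;
    skos:notation "myprofile"^^ex:apiTokenScheme ;   # hypothetical scheme datatype
    adms:schemeAgency "Example API"                  # hypothetical agency name
] .
```

If prof:hasToken were declared an rdfs:subPropertyOf adms:identifier, RDFS entailment would make the literal in (a) a value of adms:identifier as well, which is the range mismatch raised earlier in the thread.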

smrgeoinfo commented 5 years ago

re. https://github.com/w3c/dxwg/issues/529#issuecomment-457991188 if a Profile 'hasResource' 'ResourceDescriptor', I would expect 'ResourceDescriptor' to be a resource, in which case wouldn't it be a kind of Asset that has an assigned role relative to a Profile? Assets have distributions, thus a ResourceDescriptor has Distributions. That's the logic I was proposing.

andrea-perego commented 5 years ago

@smrgeoinfo , I see your point, but in ADMS prof:ResourceDescriptor wouldn't be an adms:Asset.

If, as you explained, prof:ResourceDescriptor is meant to be an association class, it would be something not included in ADMS. I think this is also the approach implemented by @dr-shorthair following up from the discussion in https://github.com/w3c/dxwg/issues/638

What I'm unsure of is whether this additional layer of complexity is actually needed, instead of defining roles as (binary) relationships (as I commented in https://github.com/w3c/dxwg/issues/638#issuecomment-453664641).

smrgeoinfo commented 5 years ago

I would agree with @aisaac's response to the #638 (comment).

aisaac commented 5 years ago

@andrea-perego @nicholascar I second @smrgeoinfo's reminder: in his proposal for aligning to DCAT (or at least for having an analogy with it), it is not the prof:ResourceDescriptor level in the current PROF that corresponds to dcat:Distribution, but rather the level of "artifacts". So if being an adms:Asset implies being a dcat:Distribution, then as Andrea says prof:ResourceDescriptor wouldn't be an adms:Asset. Which is not a big worry for me...

Maybe one good litmus test is the question: where does dct:format fit in the landscape? Currently PROF has it on ResourceDescriptor but I think it's not appropriate. Many expressions of profiles (such as OWL or SHACL) can exist in different syntaxes (such as RDF/XML or Turtle), thus in different media types, and thus in different dct:formats. To me therefore dct:format fits better at the level of PROF's artifacts. If (as I believe) dct:format is rather a dcat:Distribution property in DCAT, then it means that the analogy between PROF's artifacts and dcat:Distribution is stronger than the one between prof:ResourceDescriptor and dcat:Distribution.

dr-shorthair commented 5 years ago

Maybe one good litmus test is the question: where does dct:format fit in the landscape?

Complemented by dct:conformsTo which allows a link to the schematic form, independent of the serialization.

aisaac commented 5 years ago

@dr-shorthair maybe I need to rephrase: where does dct:format fit in the landscape? Should it be used on the PROF ResourceDescriptor or PROF's artifact?

andrea-perego commented 5 years ago

@aisaac , I share your interpretation of @smrgeoinfo 's proposal.

Making a comparison with ADMS/DCAT, prof:ResourceDescriptor is an intermediate entity (not present in ADMS and DCAT) linking a prof:Profile (which would be an adms:Asset in ADMS) to one or more artifacts (which would be adms:AssetDistributions in ADMS). Consequently, everything about format, data schema, etc. should be an attribute of the "artifact" (as distribution), not of the prof:ResourceDescriptor.

My concern is that I'm not yet seeing a strong case for why we need an intermediate, abstract entity, instead of making prof:ResourceDescriptor a distribution, and the property linking to the artifact something similar to dcat:accessURL.
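The alternative raised in the last paragraph, where the resource descriptor is itself treated as a distribution, might look roughly like the Turtle below. This is an illustrative sketch only: the `ex:` names, the role IRI, and the use of `prof:hasArtifact` as the accessURL-like property are assumptions.

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # placeholder namespace

ex:myProfile a prof:Profile ;
    prof:hasResource ex:shapesTurtle , ex:shapesRdfXml .

# No intermediate entity: each descriptor is a distribution and points
# straight at a file, much as dcat:accessURL would.
ex:shapesTurtle a prof:ResourceDescriptor , dcat:Distribution ;
    prof:hasRole ex:validation ;                       # hypothetical role IRI
    dct:conformsTo <https://www.w3.org/TR/shacl/> ;
    dct:format <https://www.iana.org/assignments/media-types/text/turtle> ;
    prof:hasArtifact <http://example.org/shapes.ttl> . # assumed predicate name

ex:shapesRdfXml a prof:ResourceDescriptor , dcat:Distribution ;
    prof:hasRole ex:validation ;
    dct:conformsTo <https://www.w3.org/TR/shacl/> ;
    dct:format <https://www.iana.org/assignments/media-types/application/rdf+xml> ;
    prof:hasArtifact <http://example.org/shapes.rdf> .
```

In this reading dct:format sits naturally on the descriptor, but each serialization needs its own descriptor, so the role and dct:conformsTo are repeated per serialization.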

smrgeoinfo commented 5 years ago

As I understand, the logic for prof:ResourceDescriptor is to implement a qualified association, so there is a role attribute on the link to the representations (prof:artefact, dcat:Distribution)

andrea-perego commented 5 years ago

@smrgeoinfo said:

As I understand, the logic for prof:ResourceDescriptor is to implement a qualified association, so there is a role attribute on the link to the representations (prof:artefact, dcat:Distribution)

Agreed. But if this is the only requirement, cannot this be done by specifying the role as a binary relationship instead?

smrgeoinfo commented 5 years ago

@andrea-perego are you suggesting a pattern like Specification --> roleRelation --> Resource --> representation --> Distribution, where the roleRelation would be 'hard-typed' i.e. named and defined in the spec? Is there an existing property (besides dct:relation) for linking dcat:Resource to dcat:Distribution?

andrea-perego commented 5 years ago

No - sorry for not having been clear.

The alternative option I see is to consider prof:ResourceDescriptor a "distribution" and not an association class - namely:

Profile - role relation -> Resource Descriptor - has artifact -> URI ref

smrgeoinfo commented 5 years ago

Thanks for the clarification. So the role relations might be subtyped from 'hasResource'. I see that could simplify things, but would (I think...) lose the ability to express multiple roles for a single resource descriptor, and to have different formats of the same ResourceDescriptor (they'd all be different ResourceDescriptor instances).
Personally, I prefer the option of making the role a property of a relationship class, allowing a prof-defined role vocabulary for interoperability, and a separate optional property for additional role specifiers from other vocabularies for greater semantic precision, and the relation target a resource with 1..* Distributions. There are various implementations that will work, all with pros and cons, and selecting one is almost a matter of taste; I'm not wedded strongly to either solution.

rob-metalinkage commented 5 years ago

This is actually being discussed here: https://github.com/w3c/dxwg/issues/747

A dcat:Distribution is not an artefact - it's a metadata class describing an artefact, and performs the exact same role - it's a qualified association between the thing and the artefact. It is "aligned" with dcat:Distribution in a separate file - so the debate (IMHO) should be whether the profiles ontology should be normatively dependent on DCAT. It is constrained by design to be consistent with the DCAT Dataset/Distribution pattern however, as the assumption is people will catalog profiles and artefacts...

kcoyle commented 5 years ago

@rob-metalinkage This is the first that I have seen that PROF is "constrained by design" to be normatively dependent on DCAT. That was not included either as a requirement or in the documentation. I think this needs to be a question before it is an answer. If PROF is specific to DCAT then it should be part of the DCAT "family" and should not be presented as being independent (which so far it has been). That would probably greatly change much of the discussion.

andrea-perego commented 5 years ago

@smrgeoinfo said:

Thanks for the clarification. So the role relations might be subtyped from 'hasResource'. I see that could simplify things, but would (I think...) lose the ability to express multiple roles for a single resource descriptor, and to have different formats of the same ResourceDescriptor (they'd all be different ResourceDescriptor instances).

Actually, I don't think that in the scenario I was describing role relationships should be subproperties of prof:hasResource, as they express a different type of relationship. So, the idea is that the profile would be linked to a resource descriptor with prof:hasResource plus one or more role relationships. E.g.:

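@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://example.org/role/> .   # placeholder namespace
@prefix a:    <http://example.org/a/> .      # placeholder namespace

a:Profile a prof:Profile ;
  prof:hasResource a:ResourceDescriptor .

a:ResourceDescriptor a prof:ResourceDescriptor ;
  role:x a:Profile ;
  role:y a:Profile ;
  # ... further role relations
.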

or (in case the domain of role relationships should instead be prof:Profile)

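@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix role: <http://example.org/role/> .   # placeholder namespace
@prefix a:    <http://example.org/a/> .      # placeholder namespace

a:Profile a prof:Profile ;
  prof:hasResource a:ResourceDescriptor ;
  role:x a:ResourceDescriptor ;
  role:y a:ResourceDescriptor ;
  # ... further role relations
.

a:ResourceDescriptor a prof:ResourceDescriptor ;
.

kcoyle commented 5 years ago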

I haven't yet seen a "live" PROF example of an artifact that plays more than one role. (I think this will be fairly common.) The artifact does not change when playing multiple roles. If the ResourceDescriptor describes the artifact - by analogy to a library catalog record describing a book - then you want one RD and multiple roles. This makes me reluctant to have the role in the RD graph because it in essence changes the nature of description. I think you want a "pure" RD that only describes the artifact and that can be linked to one or more profiles with roles. Exactly what code you need for that I will leave to others.

For a real life example, the BIBFRA.ME profiles all use the same guidance document(s), and the guidance documents themselves are used to support a fairly large number of profiles. So we should assume a many-to-many situation, and our descriptions of resources should be function-neutral because we can't determine how someone will choose to use them in the future.
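One possible way to write down the many-to-many, function-neutral pattern described above is sketched below in Turtle. It is not a proposal from the thread: `ex:usesResource`, `ex:ResourceUse`, `ex:resource` and `ex:inRole` are hypothetical terms invented for the sketch, as are the role IRIs, the file URL, and the use of `prof:hasArtifact`.

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # placeholder namespace

# A "pure" descriptor: it only describes the artifact and says nothing about roles.
ex:guidanceDoc a prof:ResourceDescriptor ;
    dct:format <https://www.iana.org/assignments/media-types/application/pdf> ;
    prof:hasArtifact <http://example.org/guidance.pdf> .   # assumed predicate name

# Each profile states, on its own side, which role(s) the shared resource plays.
ex:profileA a prof:Profile ;
    ex:usesResource [
        a ex:ResourceUse ;              # hypothetical qualified-link class
        ex:resource ex:guidanceDoc ;
        ex:inRole ex:guidance
    ] .

ex:profileB a prof:Profile ;
    ex:usesResource [
        a ex:ResourceUse ;
        ex:resource ex:guidanceDoc ;
        ex:inRole ex:guidance , ex:specification   # more than one role for the same artifact
    ] .
```

Because the roles live on the link rather than in the descriptor's own graph, the same description can be reused by further profiles without being changed.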

nicholascar commented 5 years ago

Closing as the primary issue here is dealt with as part of the DCAT alignment in the document.

If other issues raised here still need addressing, separate issues should be raised.

aisaac commented 5 years ago

@nicholascar before closing we need to record what the decision was wrt the original matter: in the end is prof:Resource aligned with dcat:Distribution?

nicholascar commented 5 years ago

prof:ResourceDescriptor is conceptually aligned with dcat:Distribution, as per the PROF/DCAT Alignment in the ED.

Closing after listing in plenary 2019-09-03 + 3-day wait period.

kcoyle commented 5 years ago

Again, issues raised by @smrgeoinfo @andrea-perego @makxdekkers @aisaac need to be resolved, and at least those folks need to be pinged.

aisaac commented 5 years ago

I was essentially waiting for a record of any decision, I think @nicholascar fulfills that.

An additional note: as of today, I find myself unable to weigh all possible consequences of the decision, so I may come with some remarks later. But considering the age of the issue, and the fact that questioning the decision made here would require a look at a PROF landscape that has changed a bit, I can argue with confidence that these remarks should be raised as a new issue, and that it is really preferable to close the issue.

kcoyle commented 5 years ago

@aisaac Thanks. Do you have an idea of what would be the new issue (since this one has taken a rather crooked path)?

There is an open issue about a "formalism" between DCAT and PROF (#808), but I think this is more fundamental. We've had lengthy discussions about this, including #769, which questions the overall design, of which the "DCAT Distribution" shape would be one possible one. There is also #638.

From what I can tell, we don't have a clear path to agreement here. We should probably return to this only after reviewing the current draft because it seems to me that a first assessment should be whether this is still an issue, or if the current draft resolves it. We should also consider whether we want to address this at all, or if this could be a "decided not to address" issue (assuming commenters here are in agreement with that).

aisaac commented 5 years ago

@kcoyle yes we should probably return to this only after reviewing the current draft. My comment was confusing: I said "I may come with some remarks". The "may" applies to everything that follows, too: i.e. I could see an issue, but I could also well not see an issue! (in fact I think I'm alright with the current mapping to dcat:Distribution, it's just that I've not re-examined all the context).

nicholascar commented 5 years ago

Closing after WG notification period of due-for-closing and proponent's comment above that any additional remarks can be made on top of the 3PWD, which will necessitate a new Issue.

aisaac commented 5 years ago

For the record the remarks I was envisioning have now been made in #573. Discussion continues there, I guess.