w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Qualified forms [RQF] #79

Closed jpullmann closed 5 years ago

jpullmann commented 6 years ago

Qualified forms [RQF]

Define qualified forms to specify additional attributes of appropriate binary relations (e.g. temporal context).

This requirement is still under review


Related use cases: Guidance on the use of qualified forms [ID19]

riccardoAlbertoni commented 6 years ago

I suggest untagging "quality": this is not a quality-specific requirement; it affects the general way in which we want to model things in DCAT. In some of the related use cases, DQV is mentioned as a source of possible modelling examples, but the requirement itself is not related to quality.

makxdekkers commented 6 years ago

I do not understand the sentence "Define qualified forms to specify additional attributes of appropriate binary relations (e.g. temporal context)". What is a 'qualified form'? Why 'binary relations'? As for the example of temporal context, this can be expressed using dct:temporal (already in DCAT v1).
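For comparison, a minimal sketch of what dct:temporal states. It attaches a period to the dataset itself (its temporal coverage), not to a relationship between two resources. The ex: namespace and the dates are made up for illustration, and dcat:startDate / dcat:endDate are an assumption here: they come from later DCAT drafts, not DCAT v1.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# The period is a property of the dataset: the time span the data covers.
ex:output a dcat:Dataset ;
  dct:source ex:input ;
  dct:temporal [
    a dct:PeriodOfTime ;
    dcat:startDate "2017-01-01"^^xsd:date ;   # illustrative values
    dcat:endDate   "2017-12-31"^^xsd:date
  ] .

# Nothing above says when, or by which process, ex:output was derived from ex:input.
ex:input a dcat:Dataset .
```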

andrea-perego commented 6 years ago

@makxdekkers, indeed the wording is not intuitive (my fault).

"Qualified form" is a term borrowed from PROV-O (Section 3.3). Basically, it is a way to specify relationships involving more than two entities (so, in RDF terms, it is a kind of reification).

I include below an example from the revised version of the relevant UC (available from https://andrea-perego.github.io/dxwg/ucr/#ID19):

In most cases, the relationships between datasets and related resources (e.g., author, publisher, contact point, publications / documentation, input data, model(s) / software used to create the dataset) can be specified with simple, binary properties available from widely used vocabularies, such as [DCTerms] and [VOCAB-DCAT].

As an example, dcterms:source can be used to specify a relationship between a dataset (output:Dataset), and the dataset it was derived from (input:Dataset):

output:Dataset a dcat:Dataset ;
  dcterms:source input:Dataset .

input:Dataset a dcat:Dataset .

However, there may be a need to provide additional information concerning, e.g., the temporal context of a relationship, which requires a more sophisticated representation, similar to the "qualified" forms used in [PROV-O]. For instance, the previous example may be further detailed by saying that the output dataset is an anonymized version of the input dataset, and that the anonymization process started at time t and ended at time t′. By using [PROV-O], this information can be expressed as follows:

output:Dataset a dcat:Dataset ;
  prov:qualifiedDerivation [
    a prov:Derivation ;
    prov:entity input:Dataset ;
    prov:hadActivity :data_anonymization
  ] .

input:Dataset a dcat:Dataset .

# The process of anonymizing the data (load the data, process it, and generate the anonymized version)

:data_anonymization
  a prov:Activity ;
# When the process started  
  prov:startedAtTime  "2018-01-23T01:52:02Z"^^xsd:dateTime;
# When the process ended  
  prov:endedAtTime "2018-01-23T02:00:02Z"^^xsd:dateTime .

nicholascar commented 6 years ago

I would like to extend the remit of this requirement so that it reads:

"Define qualified forms to specify additional attributes of appropriate binary relations (e.g. temporal context) and to allow for the handling of relationship extensions in code lists."

This is because one of the main reasons for using qualified forms (as in PROV or ISO 19115) is not to supply "additional attributes" but to let a relationship carry a role, or other qualifying term, taken from an expandable code list whose elements are not defined within the model.

Code Lists example
An example in pseudo code. Rather than defining Dataset -> Agent relations like this:

Dataset_X publisher Agent_Y
Dataset_X owner Agent_Y
Dataset_X <some_named_relation> Agent_Y

Instead, ISO19115 and PROV and others do this:

Dataset_X related_to Agent_Y (role: publisher)
Dataset_X related_to Agent_Y (role: owner)
Dataset_X related_to Agent_Y (role: <some_named_relation>)

with a codelist table:

publisher
owner
...
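One way such a code list could be published as data is as a SKOS concept scheme. This is only a sketch: the ex: namespace, the scheme name and the third role are hypothetical, and nothing here is taken from ISO 19115 or PROV themselves.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/roles/> .   # hypothetical namespace

ex:AgentRoles a skos:ConceptScheme ;
  skos:prefLabel "Agent roles"@en .

ex:publisher a skos:Concept ;
  skos:inScheme ex:AgentRoles ;
  skos:prefLabel "publisher"@en .

ex:owner a skos:Concept ;
  skos:inScheme ex:AgentRoles ;
  skos:prefLabel "owner"@en .

# A new role is added as data, without changing the model.
ex:custodian a skos:Concept ;
  skos:inScheme ex:AgentRoles ;
  skos:prefLabel "custodian"@en .
```

The point of the design is that the model only needs one generic relation plus a role slot; the list of roles can grow independently.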

In PROV, a qualified Dataset/Agent role is given by a qualifiedAttribution construction e.g.:

:Dataset_X 
   a prov:Entity;
   prov:qualifiedAttribution [
      a prov:Attribution;
      prov:agent :Agent_Y;
      ex:hadRole :publisher;
   ] .

Proposal
DCAT2018 could keep the current DCAT2014 direct properties (e.g. dct:publisher) but map them to a property chain axiom over the qualified form in the example above, and then allow external code lists (vocabularies) of roles to define more than the standard dct role properties.
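A sketch of the kind of axiom meant, using OWL 2's owl:propertyChainAxiom. It only covers the generic case (qualified attribution implies direct attribution); it does not look at the role, so by itself it would not yield a role-specific property such as dct:publisher. Whether that further step can be expressed safely in OWL is left open here.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# If X prov:qualifiedAttribution A and A prov:agent Y,
# then a reasoner may infer X prov:wasAttributedTo Y.
prov:wasAttributedTo owl:propertyChainAxiom ( prov:qualifiedAttribution prov:agent ) .
```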

rob-metalinkage commented 6 years ago

Do we have OWL experts here who can confirm or deny we can have an axiomatic mapping that allows these alternative forms to be (safely) entailed from each other using OWL, and what flavour (profile :-) ) of OWL?

andrea-perego commented 6 years ago

@rob-metalinkage , probably it would be preferable to get in touch with the PROV mailing list and ask support, if we are going this way.

@nicholascar, just a couple of comments about your example of agent roles. We had to deal with that in GeoDCAT-AP (as DCAT and Dublin Core did not support all the ISO 19115 responsible party roles), and we did end up using PROV. You can find out how we did it in UC13.

To validate this solution, we got in touch with the PROV mailing list. I think the discussion and the given explanations (especially about prov:hadRole and why it cannot be used for this purpose) could be relevant here. The email thread starts at:

https://lists.w3.org/Archives/Public/public-prov-comments/2015Apr/0001.html

andrea-perego commented 6 years ago

@dr-shorthair , thanks for preparing the page at https://github.com/w3c/dxwg/wiki/Qualified-relations

I would like to suggest / contribute some revisions, and I wonder how we should do that - as we cannot add comments to the wiki.

Meanwhile, I summarise my comments here:

  1. In the first section, in the list of vocabularies supporting qualified forms, I would suggest adding also DQV, DUV and OA
  2. In the section about "Agent roles" I would add DataCite as another example of a vocabulary defining agent roles
  3. In the same section, I would revise the use of prov:hadRole, as it seems that it cannot be used for that purpose (see https://github.com/w3c/dxwg/issues/79#issuecomment-392211132 above)
  4. In section "Related datasets", I would add dct:source to the list of DCTerms properties
  5. Finally, I think we can re-use some of the examples in https://github.com/w3c/dxwg/wiki/Provenance-patterns

dr-shorthair commented 6 years ago

Wiki pages are shared resources, and should not be bogged down with too much process - don't comment, change it. I suggest that you make friendly amendments directly - there is a history so they can be rolled back if necessary :-)

  1. Can you do that?
  2. Done
  3. Ah - so that explains why the example https://www.w3.org/TR/prov-o/#qualifiedAttribution has ex:hadRole instead of prov:hadRole. I thought it was a typo. And prov:type proposed in the correspondence does not exist. But I am also unconvinced about the arguments - clearly a property to attach a role to an attribution relationship is required. Do you understand why they baulked?
  4. Done
  5. Yes - can you weave them in?

nicholascar commented 6 years ago

I understand the PROV authors' reasoning for not wanting hadRole across the board, but I can't agree with their logic for not allowing it for Entity/Agent relations. For Delegation, as per the PROV mailing list reasoning @andrea-perego quotes, there is potential confusion about which Agent may have played a role, so this does, I think, reasonably derail hadRole for across-the-board use. But consider this illegal-in-PROV equivalent of dct:creator:

:Dataset_X a dcat:Dataset, prov:Entity ;
  prov:qualifiedAttribution [
    a prov:Attribution ;
    prov:agent :Agent_Y ;
    prov:hadRole ex:creator ;
  ] .

There seems to be no issue with the sense of this, according to PROV, but only with potential confusion among Agents that could be playing the role, as per the mailing list Delegation example. But there is no confusion here: how else can the hadRole in this example be interpreted other than as a prov:Role that :Agent_Y is undertaking? No other Agents are involved.

As long as the Entity is not able to undertake a role, and it can't according to PROV, then we are as safe in using hadRole here as the canonical examples about Association in the PROV-O documentation.

I vote we ask PROV to extend prov:hadRole to at least Attribution, not just Association. We should not ask them to extend it to Delegation. If PROV doesn't allow this, we should then implement a dcat:hadRole that is designed specifically for Entity/Agent relations and deliberately plugs the PROV-O hole, and we should not use something like dcat:hadEntityAgentRole or anything else. People will see this new dcat:hadRole in use and understand the direct correlation to prov:hadRole.
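A sketch of that fallback, assuming a dcat:hadRole property is defined as proposed. It is the same shape as the example above with only the role property swapped; the ex: namespace and ex:creator are hypothetical, with ex:creator standing in for an entry from a role code list.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:Dataset_X a dcat:Dataset, prov:Entity ;
  prov:qualifiedAttribution [
    a prov:Attribution ;
    prov:agent ex:Agent_Y ;
    dcat:hadRole ex:creator   # proposed property, used instead of prov:hadRole
  ] .
```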

I also suggest using hadRole over hasRole because, as per PROV, if a qualifiedAttribution is made and hadRole used, that Agent would still have that role unless something invalidates it (like the Entity (Dataset) being destroyed, the Agent ceasing to be, or some sort of change event (Activity) declared to invalidate/end it), i.e. we know the role was held and assume it still is until told otherwise. This also retains closer PROV alignment than hasRole, which is a new thing.

nicholascar commented 6 years ago

Challenging PROV-DC

I would like to challenge an assumption in PROV-DC that I think is unhelpful in mapping PROV to Dublin Core Terms.

In examples like the dct:publisher mapping, which is used as an illustrative example in Figure 1, the assumption about the prov:Entity that is the domain of the dct:publisher property is: "The activity [prov:Publish] must have taken as input the document [prov:Entity] to be published...".

No! The prov:Entity of interest is, in fact, the output of the activity (Publish). We don't care what was used behind the scenes ("taken as input"): we only care, and know about, the thing in the catalogue described by DCAT: the output of any prov:Publish activity.

This greatly simplifies modelling. Instead of modelling :Dataset_X dct:publisher :Agent_Y ; in PROV-O like this, as per the PROV-DC mapping for dct:publisher:

:Dataset_X a prov:Entity ;
  prov:wasAttributedTo :Agent_Y .

_:usedEntity a prov:Entity ;
  prov:specializationOf :Dataset_X .

_:activity a prov:Activity, prov:Publish ;
  prov:used _:usedEntity ;
  prov:wasAssociatedWith :Agent_Y ;
  prov:qualifiedAssociation [
    a prov:Association ;
    prov:agent :Agent_Y ;
    prov:hadRole [ a prov:Publisher ]
  ] .

_:resultingEntity a prov:Entity ;
  prov:specializationOf :Entity_X ;
  prov:wasDerivedFrom _:usedEntity ;
  prov:wasGeneratedBy _:activity ;
  prov:wasAttributedTo :Agent_Y .

we model it like this:

:Dataset_X a prov:Entity ;
  prov:wasAttributedTo :Agent_Y ;
  prov:wasGeneratedBy [
    a prov:Activity, prov:Publish ;
    prov:wasAssociatedWith :Agent_Y ;
    prov:qualifiedAssociation [
      a prov:Association ;
      prov:agent :Agent_Y ;
      prov:hadRole [ a prov:Publisher ]
    ]
  ] .

We could add in a blank node for an unknown prov:Entity (_:usedEntity here), like this:

:Dataset_X a prov:Entity ;
  prov:wasAttributedTo :Agent_Y ;
  prov:wasGeneratedBy [
    a prov:Activity, prov:Publish ;
    prov:wasAssociatedWith :Agent_Y ;
    prov:qualifiedAssociation [
      a prov:Association ;
      prov:agent :Agent_Y ;
      prov:hadRole [ a prov:Publisher ]
    ] ;
    prov:used _:usedEntity
  ] .

but why? We don't know anything about the pre-published Entity so where's the value?

So in this way we have our prov:hadRole for a Publisher role without all the fuss of two extra unknown Entities (used and generated).

Issued
Also, with this in mind we can then re-model dct:issued like this:

:Dataset_X dct:issued :DateTime_Z .

:Dataset_X prov:generatedAtTime :DateTime_Z .

This holds iff the generating prov:Activity is, in fact, a specialised prov:Publish as per PROV-DC, with dct:issued rdfs:subPropertyOf prov:generatedAtTime already given in PROV-DC. To separate this interpretation of generation from normal prov:generatedAtTime use, which would typically go with something like dct:created (not recommended in DCAT because we don't necessarily know or care about creation, only issuance), we use this qualified form:

:Dataset_X dct:issued :DateTime_Z .

:Dataset_X a prov:Entity ;
  prov:qualifiedGeneration [
    a prov:Generation ;
    prov:atTime :DateTime_Z ;
    prov:activity [ a prov:Publish ]
  ] .

dr-shorthair commented 5 years ago

I've done some consolidation of the resources for qualified relations. See Wiki page and some RDF resources in a new branch which is linked from the Wiki page.

In particular, there is a small extension to DCAT for qualified relations here: https://github.com/w3c/dxwg/blob/dcat-issue79-simon/dcat/rdf/dcat-qualrel.ttl which builds on the DCAT-PROV alignment nearby https://github.com/w3c/dxwg/blob/dcat-issue79-simon/dcat/rdf/dcat-prov.ttl

It's all documented in the Wiki page.

dr-shorthair commented 5 years ago

I've written up the qualified relation pattern for attribution - see #611

dr-shorthair commented 5 years ago

Now also for general resource relations - see #651

dr-shorthair commented 5 years ago

The ED now has two consecutive chapters addressing this issue:

These use the common qualified form pattern to provide extensibility in the nature of the relationship.

Is this enough to resolve this issue? Or is there a need for some text drawing attention to the common pattern (which also appears in PROV, SSN and many other places)?
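To illustrate the common pattern, here is a sketch that puts the two cases side by side. It assumes the draft terms dcat:qualifiedRelation, dcat:Relationship and dcat:hadRole; the ex: namespace, the resources and the role names are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:Dataset_X a dcat:Dataset ;
  # Resource-to-agent: an intermediate node carries the agent and the role.
  prov:qualifiedAttribution [
    a prov:Attribution ;
    prov:agent ex:Agent_Y ;
    dcat:hadRole ex:custodian
  ] ;
  # Resource-to-resource: same shape, with the related resource in place of the agent.
  dcat:qualifiedRelation [
    a dcat:Relationship ;
    dct:relation ex:Dataset_W ;
    dcat:hadRole ex:anonymizedVersionOf
  ] .
```

In both cases the nature of the relationship is a value (the role), so it can be extended without adding new properties.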

dr-shorthair commented 5 years ago

I just spotted another gotcha.

In the definition of dcat:hadRole which defines the nature of a relationship of a dcat:Resource to either another Resource or to an Agent, we currently have rdfs:range prov:Role. However, the scope of prov:Role is limited to relationships to Activities, so there is an inconsistency ... I guess we need another DCAT class in parallel with the new property. Annoying complexity.
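A sketch of what that parallel class could look like. The class name dcat:Role and its subclassing of skos:Concept are assumptions made for illustration, not something already agreed in this thread.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical class standing in for prov:Role, without PROV's tie to Activities.
dcat:Role a owl:Class ;
  rdfs:subClassOf skos:Concept ;
  rdfs:label "Role"@en .

dcat:hadRole a owl:ObjectProperty ;
  rdfs:range dcat:Role .   # instead of rdfs:range prov:Role
```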

dr-shorthair commented 5 years ago

Does this work?

Qualified forms

dr-shorthair commented 5 years ago

I added a wrapper around the two qualified-forms sections. See https://rawgit.com/w3c/dxwg/dcat-issue79-simon/dcat/index.html#qualified-forms

riccardoAlbertoni commented 5 years ago

@dr-shorthair, please consider these minor comments. In the section "Relationships between datasets and agents" it is written:

"A general method for assigning an agent to a resource with a specified relationship"

I guess you mean "A general method for assigning an agent to a resource with a specified role". I think "relationship" is a little confusing.

In this example the roles are taken from [ISO-19115-1] which are available as linked data in a discrete list.

Could we use "code list" instead of "discrete list"?

agbeltran commented 5 years ago

Addressed in https://github.com/w3c/dxwg/pull/685