w3c / data-shapes

RDF Data Shapes WG repo

Closed hierarchic shapes #129

Closed wouterbeek closed 3 years ago

wouterbeek commented 3 years ago

It is unclear to me whether/how closed hierarchic shapes are supposed to work.

In the following example I specify that instances of <Top> must have a <p>. I may add additional restrictions on <Bottom> and then I close the shape for <Bottom> (a little bit similar to how I can close the OOP hierarchy in C++ with final).

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sh: <http://www.w3.org/ns/shacl#>

<TopShape>
  sh:property
    [ sh:minCount 1;
      sh:path <p> ];
  sh:targetClass <Top>.

<BottomShape>
  sh:closed true;
  sh:ignoredProperties ( rdf:type );
  sh:targetClass <Bottom>.

Contrary to what I expected, the validation library that I use emits a violation for the following dataset:

<Bottom> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <Top>.
<bottom> a <Bottom>; <p> "".

It claims that <p> is not part of the closed shape for <Bottom> :-(

Looking this up in the SHACL specification document, I am not sure what the intended behavior should be.

I very much hope that the library is wrong and that sh:closed true does not exclude property shapes specified on parent node shapes.

If sh:closed true is intended to exclude properties specified in parent node shapes, then maintaining closed hierarchies in SHACL becomes quite inconvenient: every property shape specified for a parent node shape must be added to the sh:ignoredProperties list of every child node shape. (In the above example, <p> must be added to the sh:ignoredProperties of <BottomShape>.)
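
For illustration, the workaround mentioned in the parenthesis could look like the sketch below. It restates <BottomShape> from the example above with <p> added to the ignored properties; nothing else is changed or assumed.

```turtle
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sh: <http://www.w3.org/ns/shacl#>

# <BottomShape> declares no property shapes of its own, so a closed shape
# would reject <p> unless it is listed explicitly.
<BottomShape>
  sh:closed true;
  sh:ignoredProperties ( rdf:type <p> );
  sh:targetClass <Bottom>.
```

With this list in place, <p> is no longer reported on instances of <Bottom>, but the list has to be kept in sync by hand with every property shape declared on <TopShape>.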

HolgerKnublauch commented 3 years ago

sh:closed from the spec is very simple and limited, and does not look at any form of inherited/hierarchical property shapes.

We have implemented a richer variation of this at http://datashapes.org/constraints.html#ClosedByTypesConstraintComponent for the case where classes are also shapes. Similar custom constraint components could be implemented to walk up sh:node hierarchies etc.

FWIW your example rdfs:subClassOf seems incomplete, as these are only sh:NodeShapes but not also instances of rdfs:Class.
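
As a rough sketch of the variation linked above, the original example might be restated with classes that are also shapes. This assumes the dash:closedByTypes property and the dash: namespace as described in the linked DASH documentation; it is not part of the SHACL recommendation and only works in engines that support DASH.

```turtle
prefix dash: <http://datashapes.org/dash#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sh: <http://www.w3.org/ns/shacl#>

# Each class is also a node shape, so it targets its own instances implicitly.
<Top>
  a rdfs:Class, sh:NodeShape;
  sh:property
    [ sh:minCount 1;
      sh:path <p> ].

# Assumption: dash:closedByTypes permits the properties declared on the shapes
# of the focus node's types and their superclasses, so <p> is allowed here.
<Bottom>
  a rdfs:Class, sh:NodeShape;
  rdfs:subClassOf <Top>;
  dash:closedByTypes true.
```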

wouterbeek commented 3 years ago

@HolgerKnublauch Thank you for your reply! Based on your reply I am now afraid that I fundamentally misunderstand how SHACL hierarchies work... I was under the impression that they could be used for property shape reuse, without the added requirement that node shapes become identical to classes. The separation between node shapes and classes seems crucial to me when external vocabularies are used.

What is missing for me is an example of how SHACL hierarchies are intended to be used in practice. To illustrate my current level of understanding, I was thinking that the following was the intended use:

def:Building rdfs:subClassOf shape:Feature.
def:Road rdfs:subClassOf shape:Feature.

shape:Feature
  sh:property shape:geometry; # reuse
  sh:targetClass geo:Feature.

shape:Building
  sh:property shape:address;
  sh:targetClass vocab:Building.

shape:Road
  sh:property shape:surfaceType;
  sh:targetClass vocab:Road.

^ Buildings and roads are both features. For features we can reuse the external geo:Feature class (from GeoSPARQL; no need to roll our own). Since buildings and roads both have geometries, we can specify this at the feature level. Since other properties are specific to buildings and roads, these can be specified at lower levels.

The above example is a big deal, since duplicating M properties for N subclasses means M×(N−1) duplications. It is not uncommon for M and N to both be in the 10-20 range, resulting in hundreds of extra triples that have to be written down and maintained.

If I understand you correctly, then in order for the SHACL hierarchy to work I must equate node shapes to classes? For shape:Building == vocab:Building that might be doable, but for shape:Feature == geo:Feature I am essentially 'editing' somebody else's definition. I liked the possibility to separate classes from node shapes, since this allows me to specify the way in which my data model reuses external vocabularies.

HolgerKnublauch commented 3 years ago

If you prefer to not use shapes that are also classes, you can link them into a super-shape hierarchy using sh:node. For example, shape:Building sh:node shape:Feature is similar to vocab:Building rdfs:subClassOf geo:Feature. sh:node basically means that all constraints of the shape that is the value of sh:node also apply to the target nodes of the subject of the sh:node triple.
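
A minimal sketch of such a super-shape link, reusing the names from the example above. The shape: and vocab: namespace IRIs are placeholders invented for this sketch, and shape:geometry and shape:address are assumed to be property shapes defined elsewhere.

```turtle
prefix sh: <http://www.w3.org/ns/shacl#>
prefix shape: <https://example.org/shape/>   # hypothetical namespace
prefix vocab: <https://example.org/vocab/>   # hypothetical namespace

# The super-shape needs no target of its own when it is only reached via sh:node.
shape:Feature
  a sh:NodeShape;
  sh:property shape:geometry.

# Every focus node of shape:Building must also conform to shape:Feature.
shape:Building
  a sh:NodeShape;
  sh:node shape:Feature;
  sh:property shape:address;
  sh:targetClass vocab:Building.
```

Note that sh:closed on shape:Building would still only take the property shapes declared directly on shape:Building into account, not those reached through sh:node.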

On your last paragraph: no, classes do not need to equate to shapes at all, and sh:targetClass will equally walk up or down the class hierarchy. So the example that you give above looks OK to me. shape:Feature will apply to all instances of geo:Feature, and shape:Building will only apply to those instances that are vocab:Building. If Building subClassOf Feature then all instances of Building will also count as Features.

Make sure though that the rdfs:subClassOf triples that drive the targetClass mechanism can be found in the data graph, not (only) in the shapes graph.
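
For example, a data graph along the following lines would let sh:targetClass geo:Feature reach buildings and roads. This is only a sketch: the vocab: namespace IRI and the instance IRI are made up for illustration.

```turtle
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix vocab: <https://example.org/vocab/>   # hypothetical namespace

# Class hierarchy, stated in the data graph so that sh:targetClass can follow it.
vocab:Building rdfs:subClassOf geo:Feature.
vocab:Road rdfs:subClassOf geo:Feature.

# A hypothetical instance: typed only as vocab:Building, but targeted by
# shape:Feature as well because of the rdfs:subClassOf triple above.
<https://example.org/id/building/1> a vocab:Building.
```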

Is this clearer now?

wouterbeek commented 3 years ago

Thanks! Things are quite a bit clearer now :-)

I was expecting sh:closed to already behave like http://datashapes.org/constraints.html#ClosedByTypesConstraintComponent, but I now understand that this is not the case. IMO sh:closed is not so useful; ClosedByTypesConstraintComponent seems like a better default to express that leaf nodes are closed.