w3c / data-shapes

RDF Data Shapes WG repo

union of targets should be DISTINCT #143

Closed VladimirAlexiev closed 4 months ago

VladimirAlexiev commented 2 years ago

(Thread: https://lists.w3.org/Archives/Public/public-shacl/2022Jan/)

https://www.w3.org/TR/shacl/#targets says: "union of terms produced by the individual targets that are declared by the shape".

Say I have a shape with the following targeting:

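```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix : <http://example.org/> .  # placeholder namespace; :FooShape is also illustrative
:FooShape a sh:NodeShape ;
    sh:targetClass :Foo ;
    sh:targetSubjectsOf :bar, :baz ;
    sh:targetObjectsOf :blor .
```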

Say a node matches all of these target declarations: will it be selected for validation once, rather than 4 times?

I.e., is the "union of terms" supposed to be DISTINCT? (A union in mathematics is distinct, but SPARQL's UNION is not.)
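A minimal Python sketch of the question, assuming a made-up in-memory triple list (the nodes `:n1`, `:n2`, `:x`, `:z` and the helper functions are hypothetical, not any SHACL engine's API). Each target declaration of the example shape is evaluated to a set of nodes, and the shape's focus nodes are then collected either as a set union or as a plain concatenation:

```python
# Hypothetical data graph as (subject, predicate, object) triples;
# predicate/class names follow the example shape, the nodes are made up.
RDF_TYPE = "rdf:type"
triples = [
    (":n1", RDF_TYPE, ":Foo"),
    (":n1", ":bar", ":x"),
    (":n1", ":baz", ":x"),
    (":z", ":blor", ":n1"),
    (":n2", RDF_TYPE, ":Foo"),
]


def target_class(cls):
    # Simplification: real sh:targetClass also follows rdfs:subClassOf;
    # that is ignored here.
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def target_subjects_of(pred):
    return {s for s, p, _ in triples if p == pred}


def target_objects_of(pred):
    return {o for _, p, o in triples if p == pred}


# One node set per target declaration of the example shape.
declarations = [
    target_class(":Foo"),
    target_subjects_of(":bar"),
    target_subjects_of(":baz"),
    target_objects_of(":blor"),
]

# Set semantics: the union of sets is a set, so :n1 is a focus node once.
focus_set = set().union(*declarations)

# Bag semantics: concatenating keeps one copy of :n1 per matching declaration.
focus_bag = [node for decl in declarations for node in decl]

print(sorted(focus_set))       # [':n1', ':n2']
print(focus_bag.count(":n1"))  # 4
```

With set semantics `:n1` is validated once; with bag semantics it would be validated four times, once per declaration that matches it.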

Holger Knublauch> The TQ API uses a Set, which means each target node is validated only once, even if it is matched by multiple targets of the same shape. I believe this follows the intention of the spec. Does any implementer here disagree?

Vladimir: Agreed. Still, the spec should mention DISTINCT. I'll post this here as a "SHACL Erratum", as per https://github.com/w3c/data-shapes/issues/103


Ashley Sommer> PySHACL does the same. The final collection of targets is a Set object, which deduplicates any identical nodes that are added.


Irene Polikoff> To me, this sounds more like an implementation question, rather than a standards question.

Vladimir: The number of validation results will differ: unless targets are distinct, there will be duplicate results. Even if one stored the validation results in a repository, they would not be deduplicated, because results are unlikely to have deterministic URLs (as opposed to blank nodes or UUID URNs).
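A minimal Python sketch of that argument, under assumed names (`ValidationResult`, `validate`, the `:label` path and the fields used by `content_key` are all hypothetical, not taken from any SHACL engine). When each result carries a freshly generated identifier, four results for the same violation are four distinct terms, so storing them in a set keeps all of them; only a key built from the result's content collapses them:

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ValidationResult:
    # Hypothetical subset of SHACL result properties.
    focus_node: str
    source_shape: str
    result_path: str
    constraint_component: str
    # Fresh identifier per result, standing in for a blank node or UUID URN.
    result_id: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")

    def content_key(self):
        # Everything except the generated identifier.
        return (self.focus_node, self.source_shape,
                self.result_path, self.constraint_component)


def validate(focus_nodes):
    # Hypothetical validator: one violation per focus node it is handed.
    return [ValidationResult(n, ":FooShape", ":label",
                             "sh:MinCountConstraintComponent")
            for n in focus_nodes]


# :n1 reached through four target declarations, without deduplication.
results = validate([":n1", ":n1", ":n1", ":n1"])

stored = set(results)                            # merge by full identity
by_content = {r.content_key() for r in results}  # merge by content only

print(len(results), len(stored), len(by_content))  # 4 4 1
```

Deduplicating the focus nodes before validation sidesteps the need to pick such a content key after the fact.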

The performance impact is a linear slowdown: each node is validated once per target that matches it. If that shape causes many other shapes to be invoked, the extra cost can be very significant.
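A back-of-the-envelope model of that slowdown, assuming every focus node costs the same amount $c$ to validate against the shape (including any shapes it invokes): let $D(s)$ be the target declarations of shape $s$, $T(d)$ the set of nodes produced by declaration $d$, and $k(v)$ the number of declarations that produce node $v$. Set semantics validates each node of the union once; bag semantics validates it once per matching declaration:

$$
W_{\text{set}} = c\,\Bigl|\bigcup_{d \in D(s)} T(d)\Bigr|,
\qquad
W_{\text{bag}} = c \sum_{d \in D(s)} \lvert T(d) \rvert
              = c \sum_{v \in \bigcup_{d} T(d)} k(v)
              \le \lvert D(s) \rvert \, W_{\text{set}}.
$$

For a node matched by all four declarations in the example shape, that is $4c$ instead of $c$, and $c$ itself grows with the number of shapes the shape invokes.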

HolgerKnublauch commented 2 years ago

The spec clearly states that we are talking about sets of terms, and sets are mathematical constructs where the union is another set. "The target of a target declaration is the set of RDF terms".

There would be no harm in making this clearer by adding a word, but I am not sure why people bring SPARQL's UNION keyword into this. SPARQL is about bindings, not sets of terms.

afs commented 2 years ago

I agree it is clear at the moment. A union of sets is a set, hence the terms are unique.

Each of the target definitions says "set" as well, except sh:targetNode, which is a singleton.

afs commented 2 years ago

SPARQL evaluation works with multisets: a set plus the cardinality of each element. The "union" of multisets sums the cardinalities of the elements.
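Python's `collections.Counter` behaves like the multisets described here, so it makes a convenient sketch of the difference (the node names are made up). `+` sums the count of each element, matching the multiset union; note that `|` between two `Counter`s takes the maximum count instead, so the set comparison below converts to plain `set`s first:

```python
from collections import Counter

# Focus nodes produced by two target declarations (hypothetical values).
by_class = Counter([":n1", ":n2"])
by_subjects_of = Counter([":n1"])

# Multiset union in the SPARQL sense: cardinalities are summed.
bag_union = by_class + by_subjects_of
print(bag_union[":n1"])   # 2

# Set union, as in SHACL's definition of targets: each term appears once.
set_union = set(by_class) | set(by_subjects_of)
print(sorted(set_union))  # [':n1', ':n2']
```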

VladimirAlexiev commented 2 years ago

@hmottestad> At the moment, the rdf4j ShaclSail splits shapes by both target and constraint. This means that we will produce two validation results if a node matches two target declarations. The validation runs in parallel, so performance may not be adversely affected; it could also be that the simplification of only having to use one target declaration makes things faster than having to consider multiple target declarations at once.

He then opened https://github.com/eclipse/rdf4j/issues/3584.

HolgerKnublauch commented 4 months ago

In preparation for a potential future SHACL WG I would like to close GitHub issues that were mainly just questions. Please reopen if you disagree.