Open VladimirAlexiev opened 2 years ago
@HolgerKnublauch
Yes, the interpretation should be sameTerm, and the prose could sometimes be clearer on that. Blame the editor on the latter. Unfortunately there is no SHACL 1.1 WG to fix formal definitions that could lead to further controversies. Meanwhile, I guess if implementors are not sure what to do, they look at other implementations.
Ok, I can understand that.
Technically, one reason for doing sameTerm semantics was performance.
Agreed.
I think that a lot of repositories (eg GDB) have additional "literal indexes" to handle comparisons like = and < quickly, but still, the standard spo (in particular o) index is faster.
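The lookup-cost difference mentioned here can be sketched roughly as follows. This is only an illustration in plain Python, not how any particular triplestore works: the names stored_objects, o_index, has_same_term and has_equal_value are hypothetical, literals are modelled as (lexical form, datatype) pairs, and Python's float parser stands in for xsd:float parsing.

```python
# Hypothetical object terms stored for one property, as (lexical form, datatype) pairs.
stored_objects = [
    ("0", "xsd:float"),
    ("0.0", "xsd:float"),
    ("1.5", "xsd:float"),
]

# Stand-in for the o part of an spo index: exact terms, hashed.
o_index = set(stored_objects)

def has_same_term(term):
    """sameTerm match: a single hash probe on the exact term."""
    return term in o_index

def has_equal_value(term):
    """Value match without a literal index: parse and compare every candidate."""
    lexical, datatype = term
    target = float(lexical)
    return any(dt == datatype and float(lex) == target for lex, dt in stored_objects)
```

With this data, has_same_term(("0E0", "xsd:float")) finds nothing because no stored term has that exact lexical form, while has_equal_value(("0E0", "xsd:float")) matches after scanning and parsing the stored literals.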
One way to evolve your use cases could be to introduce an optional boolean flag such as sh:matchEquality true which could be a second argument to sh:equals, sh:hasValue and sh:in. Another way would be (as you say) to introduce completely new constraint components.
Ok, but this flag should be in the enclosing PropertyShape? Like this?
# Power plant shape
sh:not [sh:property [sh:path tr:installedCapacity; sh:hasValue "0"^^xsd:float; sh:matchEquality true]]
# Outage shape
sh:property [sh:path (tr:energyResource tr:installedCapacity); sh:equals tr:installedCapacity; sh:matchEquality true]
@HolgerKnublauch, this sounds good, is it feasible to standardize dash:matchEquality?
@afs
Another option ("as well as", not "instead of") is to describe a validation mode. It would also cover the case where the data were to be canonicalized as some triplestores already do.
I'm less keen on this since it seems likely both modes could be needed for the same set of shapes.
both modes could be needed for the same set of shapes.
But adding a triple flag does not work. In RDF, subgraphs can stand alone, so the shape without the triple must also work. If adding or deleting the triple changes the meaning of another triple (the actual constraint), that is a feature lost.
So the options are separate properties, or canonicalizing the data (which may mean only logically canonicalizing the data).
@afs I don't understand this argument. There are already some constraint components that take multiple arguments, e.g. sh:closed and sh:qualifiedValueShape.
With canonical data, some choices have already been made when the data graph triples were added, so I guess the SHACL engine would simply need to ask the graph whether it has canonicalized the values or not and then probably also canonicalize the values mentioned in the shapes graph. For such data graphs, I guess it doesn't even have the option to compare using sameTerm.
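As a rough illustration of what canonicalizing both the data graph and the shapes graph could mean, here is a minimal Python sketch. The function canonicalize and the normal forms it produces are assumptions made for this example: it uses Python's float parser (double precision, and more lenient than XSD) and repr() as a stand-in for a canonical lexical form, so it is not the XSD canonical mapping and not the behaviour of any specific store.

```python
# The four lexical forms that xsd:boolean allows, mapped to one form per value.
BOOLEAN_FORMS = {"true": "true", "1": "true", "false": "false", "0": "false"}

def canonicalize(lexical, datatype):
    """Map a literal to one normal form per value (hypothetical scheme)."""
    if datatype == "xsd:boolean":
        # Unknown lexical forms are left untouched rather than guessed at.
        return (BOOLEAN_FORMS.get(lexical, lexical), datatype)
    if datatype in ("xsd:float", "xsd:double"):
        try:
            # repr() of the parsed number stands in for a canonical lexical form.
            # Note: -0.0 and 0.0 keep distinct forms in this sketch.
            return (repr(float(lexical)), datatype)
        except ValueError:
            return (lexical, datatype)
    return (lexical, datatype)

# Canonicalize the data value and the value mentioned in the shape the same way.
data_value = canonicalize("0.00", "xsd:float")
shape_value = canonicalize("0", "xsd:float")
terms_match = data_value == shape_value  # plain term comparison after canonicalization
```

Once both sides go through the same mapping, a plain term comparison gives the same answer as a value comparison for the datatypes covered, which is the effect described above.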
In this case, the new property says "ignore how the other property is defined, do value matching". That is different to sh:pattern/sh:flags, which is a description of the regex, because the definition of sh:pattern mentions sh:flags.
it has canonicalized the values
Or the data is presented to the SHACL engine as canonicalized. It can remain in term form. Yes, compare using sameTerm is not available.
While RDF is term-centric, it is going to be a bit convoluted to get right.
(Originally posted as https://lists.w3.org/Archives/Public/public-shacl/2022May/0000.html; edited below).
Several constraint components compare values using identity (sameTerm) rather than equality (=). Which is ok for many cases (URLs, strings), but not all (numbers, booleans); dates are also peculiar.
For numeric and boolean literals, identity uses the lexical space, whereas equality uses the value space. The following are true for equality but not for identity:
Which means that it's unnecessarily hard or even impossible to express rules like this in standard SHACL (examples from https://transparency.ontotext.com/spec/#validation-rules):
vatInVies should not be false (boolean). This can be expressed using the two lexical representations of "false".
installedCapacity should not be zero (float). There are infinitely many lexical representations of zero (even crazy ones like "0.0000000000001"^^xsd:float)
installedCapacity of an outage record should be equal to installedCapacity in the master power plant record (float). One cannot express this in standard SHACL
Looking at some constraint components in the spec:
sh:minCount, maxCount also implicitly use identity to count values
I propose these clarifications:
Furthermore, I propose to add new constraint components that use equality rather than identity. I cannot come up with better names, so suggestions are welcome. I guess it's impossible to add them to sh: so maybe add them to dash: ?
dash:equalEQ
dash:hasValueEQ
dash:inEQ
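To make the intended difference concrete, here is one possible reading of an equality-based variant of sh:equals, sketched in plain Python. Everything here is an assumption for illustration: equals_same_term, equals_by_value and value_of are hypothetical helpers, literals are (lexical form, datatype) pairs assumed to be well-formed, and Python's double-precision float stands in for xsd:float.

```python
def value_of(literal):
    """Map a literal to a comparable value, tagged by kind so that
    booleans and numbers are never confused with each other."""
    lexical, datatype = literal
    if datatype == "xsd:boolean":
        return ("boolean", lexical in ("true", "1"))
    if datatype == "xsd:float":
        return ("number", float(lexical))
    # Other terms fall back to identity on the term itself.
    return ("term", literal)

def equals_same_term(values_a, values_b):
    """Identity-based comparison: both value sets contain exactly the same terms."""
    return set(values_a) == set(values_b)

def equals_by_value(values_a, values_b):
    """Equality-based comparison: both value sets contain the same values."""
    return {value_of(v) for v in values_a} == {value_of(v) for v in values_b}

# Hypothetical installedCapacity values of an outage record and its power plant.
outage_capacity = [("100.0", "xsd:float")]
plant_capacity = [("1.0E2", "xsd:float")]
```

With these values, equals_same_term(outage_capacity, plant_capacity) is False because the lexical forms differ, while equals_by_value(outage_capacity, plant_capacity) is True because both denote the same number.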