Open tpluscode opened 2 years ago
Thinking out loud, I expect it would be hard for SHACL 1.1 to break SHACL 1.0 semantics. So for this to work without breaking the existing logic I guess SHACL 1.1 could introduce a second, optional flag for sh:DatatypeConstraintComponent to also accept, e.g. xsd:int when xsd:integer was requested.
So maybe sh:datatype xsd:integer ; sh:datatypeNormalization true ;
Furthermore, triple stores could indicate whether they can round-trip datatypes correctly and if not the SHACL engine can set the above flag true as the default.
I don't necessarily agree that it's breaking 1.0 semantics for you cannot break what's already broken.
I think what stores do make sense from the perspective of XSD and robustness principle. All types derived from xsd:integer
are essentially that plus range constraints. That is pretty much XSD's definition. Being "liberal in what we accept" means a validator should treat "10"^^xsd:integer
as a valid xsd:byte
Thus, I conclude that the current behaviour of SHACL is flawed and deserves as minor version fix.
Furthermore, triple stores could indicate whether they can round-trip datatypes correctly and if not the SHACL engine can set the above flag true as the default.
I don't think this is practical. A web app which dereferenced the resource and shape has no idea whatsoever about backing triplestores
Useful table.
A second flag to modify the constraint is better IMO. sh:datatypeByValue
("normalization" is usually more like a change to canonical form "0001" and "1" but same datatype).
As this does work today in many places, if not all, and some systems may currently wish to validate the exact datatype used (xsd:nonNegativeInteger
is meaningful as it isn't due to the arbitrary 32 bit/64 bit issues), I don't think it can be classified as "broken".
The majority user feedback on TDB1 was to keep the datatype. Normalizing the lexicial part was just accepted.
I think I will stand by "broken" since the base semantics of datatypes is defined by XSD spec. The stores will turn derived literal with XSD types to xsd:integer
because that is what they are (the literals, not store). Of course, not everyone necessarily uses triple stores at all but it will be a major use case. And TDB is not the only implementation which does that.
Like I said, if I create a graph and it does not validate after round tripping from my database then this is a bug, not a feature. But I hear you, if the concern is breaking users depending on the current spec, then I'm probably proposing a change for SHACL 2.0
:)
Ah, and you make a good point about comparing the lexical form too. We could consider more holistic improvements, for example also to sh:hasValue
. Consider https://s.zazuko.com/2boDM, where sh:hasValue 1
fails to validate "0001"^^xsd:integer
@tpluscode well, if the round-tripping does not work then it's a problem that the graph stores need to solve, not languages like SHACL downstream. I guess other systems like OWL (with its xsd:nonNegativeIntegers everywhere) would have similar issues. Fix it at the lowest possible level.
I like the suggested sh:datatypeByValue property.
[@tpluscode] given a property shape with
sh:datatype xsd:int
, it will fail the check against the a value""10"^^xsd:integer
even though the typexsd:int
is subtype ofxsd:integer
I'm surprised no-one else flagged this, but I think your example is backwards.
That is, even were subtype inferences active, it would be correct for ""10"^^xsd:integer
to fail against sh:datatype xsd:int
, because while all xsd:int
are xsd:integer
, some xsd:integer
are not xsd:int
, as that is the definition of a subtype.
On the other hand, subtype inferencing might enable ""10"^^xsd:int
to succeed against sh:datatype xsd:integer
.
@TallTed I see your point but it can also be seen as a use case for validation by value.
F&O does not treat derived types of integers specially and any operation returns xsd:integers, "0"^^xsd:int+"10"^^xsd:int
is "10"^^xsd:integer
.
What
With SHACL 1.0 the
sh:datatype
constraint mandates exact match. That is, given a property shape withsh:datatype xsd:int
, it will fail the check against the a value""10"^^xsd:integer
even though the typexsd:int
is subtype ofxsd:integer
As a reminder, here's the XSD types hierarchy from https://www.w3.org/TR/xmlschema-2/
Why
This is problematic because it turns out that multiple triplestore implementations will proactively "normalise" integer types on insert. As above, inserting
"10"^^xsd:int
will in fact store"10"^^xsd:integer
in the database. I tried Fuseki (TDB1), Stardog and Allegrograph. It seems that all integer and derived datatypes will treated this way as long as their lexical form matches the given datatype."-10"^^xsd:negativeInteger
becomes"-10"^^xsd:integer
etc.This makes it impossible to roundtrip values created by a shape as shown below because when the resource comes back from the store, the type of
schema:age
will no longer match that of the property shapeProposal
I propose to prescribe to shacl processors that they should also "normalise" the property shapes but deconstructing xsd integer datatypes to
xsd:integer
and appropriatemin/max
as defined by XML Schema (unless explicitly provided by the shape itself)sh:datatype
sh:minInclusive
sh:maxInclusive
xsd:long
-9223372036854775808
9223372036854775807
xsd:int
-2147483648
2147483647
xsd:short
-32768
32767
xsd:byte
-128
127
xsd:nonPositiveInteger
0
xsd:negativeInteger
-1
xsd:nonNegativeInteger
0
xsd:positiveInteger
1
xsd:unsignedLong
0
18446744073709551615
xsd:unsignedInt
0
4294967295
xsd:unsignedShort
0
65535
xsd:unsignedInt
0
255