Open labra opened 7 years ago
Hi, When checking that a property value is of a specific data type (xsd:dateTime for example) I would expect the following results:
Prefix (for both data and schema):
@prefix : <http://example.org/>
@prefix sh: <http://www.w3.org/ns/shacl#>
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>
The schema:
:S a sh:NodeShape;
sh:targetNode :aNode ;
sh:property [
sh:path :date ;
sh:datatype xsd:dateTime;
sh:minCount 1
] .
This data is not valid:
:aNode :date "not_a_date"^^xsd:dateTime .
This data is not valid (but the current implementation return true as validation result):
:aNode :date "not_a_date" .
This data is valid:
:aNode :date "2017-05-15T16:27:01-00:00" .
This data is valid:
:aNode :date "2017-05-15T16:27:01-00:00"^^xsd:dateTime .
If I'm right the datatypeChecker which is based on es.weso.rdf.jena.wellTypedDatatype is used. I believe the issue is in the following line from es.weso.rdf.jena.wellTypedDatatype :
val jenaLiteral = emptyModel.createTypedLiteral(l.getLexicalForm,l.dataType.str)
with the way the data type is obtained: l.dataType.str
I tried to create the datatype using org.apache.jena.datatypes.TypeMapper.getInstance().getSafeTypeByName(uri:String) and all the expectations above are met:
def wellTypedDatatype(node: RDFNode, expectedDatatype: IRI): Either[String, RDFNode] = {
node match {
case l: es.weso.rdf.nodes.Literal => {
Try{
val expectedRDFDatatype = TypeMapper.getInstance().getSafeTypeByName(expectedDatatype.str)
val jenaLiteral = emptyModel.createTypedLiteral(l.getLexicalForm,expectedRDFDatatype)
val value = jenaLiteral.getValue() // if it is ill-typed it raises an exception
(jenaLiteral.getDatatypeURI)
} match {
case Success(iri) =>
if (iri == expectedDatatype.str) Right(node)
else Left(s"Datatype obtained $iri != $expectedDatatype")
case Failure(e) => Left(e.getMessage())
}
}
case _ => Left(s"Node $node is not a typed literal")
}
}
Thanks for the comment.
It is strange because I was trying to replicate your examples and I think it was currently working as it should.
I added 2 test cases to check your same examples here
And they pass ok with the current implementation.
Anyway, I am not sure if your definition is better than the existing one in some cases. Do you know if it is?
If you could add a test-case that fails with the current definition and passes with yours, you could do a pull request that I would be happy to accept.
Another thing is that this issue is probably solved and I should close it. I will leave it open until I can check all corner cases (maybe your definition solve some).
I made a mistake above. Indeed the following data is invalid (when xsd:dateTime is checked) and the current implementation got it right:
:aNode :date "not_a_date" .
I was referring to the following case for which I expect the xsd:dateTime type check to pass:
:aNode :date "2017-05-15T16:27:01-00:00" .
I think the lexical form of a data type should be considered valid when checked with the same data type. With that, API using json-ld for example can accept
{
"startedAtTime": "2017-05-15T16:27:01-00:00"
}
Instead of having to always required:
{
"startedAtTime": {
"@type": "xsd:dateTime",
"@value": "2017-05-15T16:27:01-00:00"
}
}
and without having to expand. Checkout the tests
The problem is that I think the current implementation is following the SHACL spec, which says:
A literal matches a datatype if the literal's datatype has the same IRI and, for the datatypes supported by SPARQL 1.1, is not an ill-typed literal.
In the case of a plain string literal like "2017-05-15T16:27:01-00:00"
or "Hello"
the datatype is assumed to be xsd:string
so it is different to xsd:dateTime
and it fails as expected.
Although I understand it may be useful to automatically cast when possible, I prefer the implementation to follow the specification at this moment.
I also checked other SHACL implementations and they do the same.
Take a look at how does Jena check when an RDF datatype is ill-formed, i.e., if we have:
"john"^^xsd:date
the system should return an error.Here is some documentation on how Jena tackles it.