weso / shaclex

SHACL/ShEx implementation
http://weso.github.io/shaclex
MIT License
77 stars 17 forks source link

Check ill-typed literals for sh:datatype #36

Open labra opened 7 years ago

labra commented 7 years ago

Take a look at how does Jena check when an RDF datatype is ill-formed, i.e., if we have: "john"^^xsd:date the system should return an error.

Here is some documentation on how Jena tackles it.

MFSY commented 7 years ago

Hi, When checking that a property value is of a specific data type (xsd:dateTime for example) I would expect the following results:

Prefix (for both data and schema):

       @prefix : <http://example.org/>
       @prefix sh: <http://www.w3.org/ns/shacl#>
       @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   @prefix xsd: <http://www.w3.org/2001/XMLSchema#>

The schema:

                 :S a sh:NodeShape;
                 sh:targetNode :aNode ;
         sh:property [ 
                             sh:path :date ;
                             sh:datatype xsd:dateTime;
                             sh:minCount 1
                  ] .

This data is not valid:

   :aNode :date "not_a_date"^^xsd:dateTime .

This data is not valid (but the current implementation return true as validation result):

   :aNode :date "not_a_date" .

This data is valid:

   :aNode :date "2017-05-15T16:27:01-00:00" .

This data is valid:

   :aNode :date "2017-05-15T16:27:01-00:00"^^xsd:dateTime .

If I'm right the datatypeChecker which is based on es.weso.rdf.jena.wellTypedDatatype is used. I believe the issue is in the following line from es.weso.rdf.jena.wellTypedDatatype :

val jenaLiteral = emptyModel.createTypedLiteral(l.getLexicalForm,l.dataType.str)

 with the way the data type is obtained: l.dataType.str

I tried to create the datatype using org.apache.jena.datatypes.TypeMapper.getInstance().getSafeTypeByName(uri:String) and all the expectations above are met:

def wellTypedDatatype(node: RDFNode, expectedDatatype: IRI): Either[String, RDFNode] = {
    node match {
      case l: es.weso.rdf.nodes.Literal => {
        Try{
          val expectedRDFDatatype = TypeMapper.getInstance().getSafeTypeByName(expectedDatatype.str)
          val jenaLiteral = emptyModel.createTypedLiteral(l.getLexicalForm,expectedRDFDatatype)
          val value = jenaLiteral.getValue() // if it is ill-typed it raises an exception
          (jenaLiteral.getDatatypeURI)
        } match {
          case Success(iri) =>
            if (iri == expectedDatatype.str) Right(node)
            else Left(s"Datatype obtained $iri != $expectedDatatype")
          case Failure(e) => Left(e.getMessage())
        }
      }
      case _ => Left(s"Node $node is not a typed literal")
    }
  }
labra commented 7 years ago

Thanks for the comment.

It is strange because I was trying to replicate your examples and I think it was currently working as it should.

I added 2 test cases to check your same examples here

And they pass ok with the current implementation.

Anyway, I am not sure if your definition is better than the existing one in some cases. Do you know if it is?

If you could add a test-case that fails with the current definition and passes with yours, you could do a pull request that I would be happy to accept.

Another thing is that this issue is probably solved and I should close it. I will leave it open until I can check all corner cases (maybe your definition solve some).

MFSY commented 7 years ago

I made a mistake above. Indeed the following data is invalid (when xsd:dateTime is checked) and the current implementation got it right:

:aNode :date "not_a_date" .

I was referring to the following case for which I expect the xsd:dateTime type check to pass:

   :aNode :date "2017-05-15T16:27:01-00:00" .

I think the lexical form of a data type should be considered valid when checked with the same data type. With that, API using json-ld for example can accept


{
"startedAtTime": "2017-05-15T16:27:01-00:00"
}

Instead of having to always required:


{
"startedAtTime": {
    "@type": "xsd:dateTime",
    "@value": "2017-05-15T16:27:01-00:00"
  }
}

and without having to expand. Checkout the tests

labra commented 7 years ago

The problem is that I think the current implementation is following the SHACL spec, which says:

A literal matches a datatype if the literal's datatype has the same IRI and, for the datatypes supported by SPARQL 1.1, is not an ill-typed literal.

In the case of a plain string literal like "2017-05-15T16:27:01-00:00" or "Hello" the datatype is assumed to be xsd:string so it is different to xsd:dateTime and it fails as expected.

Although I understand it may be useful to automatically cast when possible, I prefer the implementation to follow the specification at this moment.

I also checked other SHACL implementations and they do the same.