w3c / EasierRDF

Making RDF easy enough for most developers

SHACL validators should intuitively process OWL and RDFS #103

Closed KonradHoeffner closed 2 years ago

KonradHoeffner commented 2 years ago

SHACL is a great step forward for data validation, but having to spell out everything again when you already have an ontology is very cumbersome and error-prone. I know there are theoretical reasons why this is necessary, but in the spirit of the EasierRDF initiative there should be a mode that just tells my SHACL validator that I want a common-sense interpretation of RDFS and OWL in the context of closed-world data validation.

Example test.ttl

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix : <http://example.org/>.

:Human a owl:Class.    

:knows rdfs:domain :Human;    
       rdfs:range :Human.

:Bob a :Human;    
     :knows :Herbert.    

:Herbert a :Horse.

$ pyshacl test.ttl -i both
Validation Report
Conforms: True

A reasoner would probably infer that :Herbert is a human-horse hybrid (centaur), while a closed-world validation should return an error. pySHACL, however, reports no violations because there is no shape. Instead, one has to duplicate all the domain and range statements in SHACL:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix : <http://example.org/>.

:knowsRangeShape a sh:NodeShape;
    sh:targetObjectsOf :knows;
    sh:class :Human.

:knowsDomainShape a sh:NodeShape;     
    sh:targetSubjectsOf :knows;
    sh:class :Human.

$ pyshacl test.ttl -s shacl.ttl
Validation Report
Conforms: False
Results (2):
Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):
    Severity: sh:Violation
    Source Shape: :knowsRangeShape
    Focus Node: :Herbert
    Value Node: :Herbert
    Message: Value does not have class :Human
Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):
    Severity: sh:Violation
    Source Shape: :knowsDomainShape
    Focus Node: :Bob
    Value Node: :Bob
    Message: Value does not have class :Human
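
For comparison, the closed-world reading requested above can be sketched without any RDF library. This is only an illustration under stated assumptions: triples are plain Python tuples with prefixed names as strings, and `check_domain_range` is a hypothetical helper, not part of pySHACL or any other tool.

```python
# Minimal sketch: treat rdfs:domain / rdfs:range as closed-world constraints.
# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def check_domain_range(triples):
    """Return (node, property, expected_class) for each violated domain or range."""
    # Collect the explicitly asserted types of every node.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    domains = [(s, o) for s, p, o in triples if p == RDFS_DOMAIN]
    ranges = [(s, o) for s, p, o in triples if p == RDFS_RANGE]

    violations = []
    for s, p, o in triples:
        for prop, cls in domains:
            # The subject of the property must be typed with the domain class.
            if p == prop and cls not in types.get(s, set()):
                violations.append((s, p, cls))
        for prop, cls in ranges:
            # The object of the property must be typed with the range class.
            if p == prop and cls not in types.get(o, set()):
                violations.append((o, p, cls))
    return violations


# The contents of test.ttl, written as tuples.
DATA = [
    (":Human", RDF_TYPE, "owl:Class"),
    (":knows", RDFS_DOMAIN, ":Human"),
    (":knows", RDFS_RANGE, ":Human"),
    (":Bob", RDF_TYPE, ":Human"),
    (":Bob", ":knows", ":Herbert"),
    (":Herbert", RDF_TYPE, ":Horse"),
]

print(check_domain_range(DATA))
```

On the data exactly as listed in test.ttl, this sketch flags only :Herbert, because :Bob is explicitly typed :Human there.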

afs commented 2 years ago

The SHACL community is at: https://discord.com/channels/911006583067144212/

TallTed commented 2 years ago

The SHACL WG was pushed rather strongly to not require any reasoning, including OWL and RDFS, of SHACL processors, and this led largely to not even mentioning it. Examples were pushed in a similar direction.

We (I was a participant for most of the WG, and co-chaired the last several months after some employer restructuring required our original chair to step down) did make significant efforts to allow use of existing ontologies as shape definitions with minimal adjustment.

Regrettably, I'm not really fluent in SHACL even with all that time spent, but others (primarily @HolgerKnublauch) do tend to respond helpfully to issues and other questions raised here.

HolgerKnublauch commented 2 years ago

Hi Konrad, yes to both statements from yourself and Ted. SHACL operates on RDF graphs only and has no formal dependency on RDFS or OWL semantics. This is the same approach that was also taken for SPARQL. All that SHACL (and SPARQL) engines "see" are the triples that happen to be in the data graph. It does not make any assumptions about whether these triples include the inferred triples, but applications can make those assumptions if they want to. Most APIs that I have seen provide a reasoning component that makes the inferred triples "visible" either by dynamically computing them when needed or by doing a batch process in the beginning. Once you pass this extended graph into the SHACL engine, you should see the desired results.
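
For the domain and range example at the top of this thread, it may help to see what materializing the inferred triples actually adds. The following is a minimal sketch of the RDFS entailment patterns rdfs2 (domain) and rdfs3 (range), assuming triples are plain Python tuples; `materialize_domain_range` is a hypothetical name, not an API of any library.

```python
# Minimal sketch of RDFS rules rdfs2 (domain) and rdfs3 (range).
# Assumption: triples are (subject, predicate, object) tuples of prefixed names.
RDF_TYPE = "rdf:type"


def materialize_domain_range(triples):
    """Return the input graph plus all type triples implied by domain/range."""
    graph = set(triples)
    while True:
        inferred = set()
        for s, p, o in graph:
            for prop, rel, cls in graph:
                if prop != p:
                    continue
                if rel == "rdfs:domain":
                    inferred.add((s, RDF_TYPE, cls))  # rdfs2
                elif rel == "rdfs:range":
                    inferred.add((o, RDF_TYPE, cls))  # rdfs3
        if inferred <= graph:
            return graph  # fixpoint reached, nothing new to add
        graph |= inferred


DATA = [
    (":knows", "rdfs:domain", ":Human"),
    (":knows", "rdfs:range", ":Human"),
    (":Bob", RDF_TYPE, ":Human"),
    (":Bob", ":knows", ":Herbert"),
    (":Herbert", RDF_TYPE, ":Horse"),
]

extended = materialize_domain_range(DATA)
print(sorted(o for s, p, o in extended if s == ":Herbert" and p == RDF_TYPE))
# -> [':Horse', ':Human']
```

In the extended graph :Herbert is typed both :Horse and :Human, so a `sh:class :Human` check on :Herbert would pass after inference rather than fail.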

The SHACL vocabulary has an annotation property sh:entailment to indicate to engines whether they should apply the inferences automatically, see https://www.w3.org/TR/shacl/#shacl-rdfs. However this is not mandatory - SHACL is intentionally decoupled from any sort of inferencing (except very basic rdfs:subClassOf walking at sh:class and similar places).

The broader topic, of whether RDFS/OWL really lead to an Easier RDF, is an entirely different question. I personally believe the complexity and non-intuitiveness of RDFS and OWL are an obstacle to RDF adoption, and don't make it easier for average IT professionals.

pchampin commented 2 years ago

> SHACL operates on RDF graphs only and has no formal dependency on RDFS or OWL semantics.

Well, SHACL redefines for its own purposes a subset of RDFS semantics (SHACL subclass [1]) based on the same vocabulary (rdf:type, rdfs:subClassOf), so to the casual user it may seem like there are some dependencies...

[1] https://www.w3.org/TR/shacl/#dfn-shacl-subclass
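
A rough sketch of what that subset amounts to: a node counts as an instance of a class if one of its asserted types is the class itself or reaches it by following rdfs:subClassOf triples in the data graph. The code below assumes triples as plain Python tuples; the function names, :Student, and :Alice are hypothetical and used only for illustration.

```python
# Minimal sketch of SHACL-style subclass walking over tuple triples.
# Assumption: triples are (subject, predicate, object) tuples of prefixed names.


def shacl_superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_shacl_instance(node, cls, triples):
    """True if node has an rdf:type that is cls or a subclass of cls."""
    return any(
        cls in shacl_superclasses(o, triples)
        for s, p, o in triples
        if s == node and p == "rdf:type"
    )


DATA = [
    (":Student", "rdfs:subClassOf", ":Human"),  # hypothetical class
    (":Alice", "rdf:type", ":Student"),         # hypothetical individual
    (":Herbert", "rdf:type", ":Horse"),
]

print(is_shacl_instance(":Alice", ":Human", DATA))    # -> True
print(is_shacl_instance(":Herbert", ":Human", DATA))  # -> False
```

Only asserted rdf:type and rdfs:subClassOf triples are consulted here; no domain, range, or other RDFS rule is applied.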

HolgerKnublauch commented 2 years ago

What choice did we have? We could have introduced sh:type and sh:subClassOf, but then nobody would be using SHACL now. RDF(S) was there first.

KonradHoeffner commented 2 years ago

Thanks for all the detailed answers! I understand that there are many good reasons why SHACL works the way it does. However, in the context of the EasierRDF initiative, I think there should be an easier, low-barrier way to validate data. In my experience, and maybe this is not the average case, an RDFS/OWL ontology and knowledge base already exists because it is the principal output of some research project, and the data may or may not contain all kinds of errors, but no one knows. In that case, using the existing RDFS/OWL data is "free", while not every project has someone who is capable and who has the time to create an additional SHACL shape file. RDFS and OWL inference was already activated in the example at the top with the -i both switch; however, it still didn't find the domain and range violation. Maybe all that is needed for that use case is a script that takes an ontology as input and generates a SHACL shape.
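
A minimal sketch of such a script, assuming the ontology's domain and range statements are already available as plain Python tuples (parsing Turtle is left out), and assuming every property and class uses the `:` prefix. `shapes_from_ontology` is a hypothetical helper, not an existing tool.

```python
# Minimal sketch: generate SHACL node shapes from rdfs:domain / rdfs:range.
# Assumption: triples are (subject, predicate, object) tuples of prefixed names.


def shapes_from_ontology(triples):
    """Return Turtle text with one sh:NodeShape per domain or range statement."""
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#>.",
        "@prefix : <http://example.org/>.",
        "",
    ]
    for s, p, o in triples:
        if p == "rdfs:domain":
            kind, target = "Domain", "sh:targetSubjectsOf"
        elif p == "rdfs:range":
            kind, target = "Range", "sh:targetObjectsOf"
        else:
            continue  # only domain and range statements are translated
        name = s.lstrip(":")
        lines += [
            f":{name}{kind}Shape a sh:NodeShape;",
            f"    {target} {s};",
            f"    sh:class {o}.",
            "",
        ]
    return "\n".join(lines)


ONTOLOGY = [
    (":Human", "rdf:type", "owl:Class"),
    (":knows", "rdfs:domain", ":Human"),
    (":knows", "rdfs:range", ":Human"),
]

print(shapes_from_ontology(ONTOLOGY))
```

For the example ontology this produces :knowsDomainShape and :knowsRangeShape, equivalent to the hand-written shapes. A real script would need more care, for instance a range that is a datatype would call for sh:datatype instead of sh:class.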

HolgerKnublauch commented 2 years ago

For the pyshacl question, you may want to follow up with the developer(s), e.g. on the SHACL Discord server.

Not sure about this ticket here, and whether you want to close it.

KonradHoeffner commented 2 years ago

OK, I will close it!