shexSpec / shex

ShEx language issues, including new features for e.g. ShEx2.1

Constraint weight annotation #76

Open jimkont opened 6 years ago

jimkont commented 6 years ago

As discussed in https://github.com/schemaorg/schemaorg/issues/1715, a possibly useful feature for ShEx would be a more consistent way to define constraint weight annotations, e.g.

"strongly recommended", "excellent to include if you have it", "would be nice", "optional", "permitted", "mandatory", "required", etc.

This could come directly in the constraint definition like

<S> {
    SHOULD <p1> . ;
    MAY (<p2> . ; 
              <p3> . );
    SUPERNICETOHAVE <p4> . ; 
}

or in the annotations part, with options like

<S> {
    <p1> . ; // sx:req my:SHOULD
    <p2> . ; // sx:req "MAY"
    <p3> . ; // NICETOHAVE
    <p4> . ; // ss:SUPERNICETOHAVE # => ss is a predefined namespace
}

gkellogg commented 6 years ago

I think it should be part of the main language, not annotations, as it relates to control flow. Anything other than "mandatory" should be treated something like ? or *, although we should consider how it interacts with cardinality.

What does it mean to say SHOULD <p1>+? Is this the same as SHOULD <p1>*?

Does SHOULD interact with anything else so that it is like MUST IF? (probably not).
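
To make the cardinality question concrete, here is a minimal Python sketch of one possible reading, in which any weight other than MUST relaxes the minimum cardinality to 0 for pass/fail purposes. The function name and the tuple representation of cardinalities are assumptions made for illustration; nothing in this thread settles the semantics.

```python
import math

# Cardinalities as (min, max) pairs; these mirror ShEx's + and * operators.
PLUS = (1, math.inf)
STAR = (0, math.inf)

def effective_cardinality(weight, min_count, max_count):
    """Hypothetical rule: only MUST keeps its declared minimum.

    Any other weight is treated as optional for pass/fail purposes,
    so its minimum is relaxed to 0 while the maximum is kept.
    """
    if weight == "MUST":
        return (min_count, max_count)
    return (0, max_count)

should_plus = effective_cardinality("SHOULD", *PLUS)
should_star = effective_cardinality("SHOULD", *STAR)
print(should_plus == should_star)
```

Under this reading, SHOULD <p1>+ and SHOULD <p1>* accept the same data. They could still differ in what gets reported, since only the + form has a minimum whose absence is worth a warning.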

jimkont commented 6 years ago

One idea that was discussed is to request validation with a desired "weight"/"severity". Then the schema is pre-processed and all constraints below the required level are ignored.

e.g., validate D1 against S1 with MUST

when original S1 is:

<S1> {
    MUST <p1> . ;
    SHOULD <p2> . ;
}

would be processed to:

<S1> {
    MUST <p1> . ;
}

and validation would then proceed with the existing workflow.

The preprocessing would need some special care with closed shapes, though.
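
The preprocessing step can be sketched roughly as follows in Python. The `Shape` and `Constraint` classes, the ranking MAY < SHOULD < MUST, and the `tolerated` set are all assumptions made for illustration and are not part of ShEx or any existing implementation. The `tolerated` set is one conceivable way to handle closed shapes: predicates whose constraints were pruned are remembered so that a closed shape does not reject data merely for using them.

```python
from dataclasses import dataclass, field

# Hypothetical ordering of weights; the thread does not fix one.
WEIGHT_RANK = {"MAY": 0, "SHOULD": 1, "MUST": 2}

@dataclass(frozen=True)
class Constraint:
    weight: str      # "MUST", "SHOULD" or "MAY"
    predicate: str   # e.g. "<p1>"

@dataclass
class Shape:
    constraints: list
    closed: bool = False
    # Predicates a closed shape should still accept after pruning (assumption).
    tolerated: set = field(default_factory=set)

def prune(shape, level):
    """Return a copy of `shape` keeping only constraints at or above `level`."""
    threshold = WEIGHT_RANK[level]
    kept = [c for c in shape.constraints if WEIGHT_RANK[c.weight] >= threshold]
    kept_predicates = {c.predicate for c in kept}
    dropped = {c.predicate for c in shape.constraints} - kept_predicates
    # Only closed shapes need to remember what was dropped.
    tolerated = set(shape.tolerated) | (dropped if shape.closed else set())
    return Shape(constraints=kept, closed=shape.closed, tolerated=tolerated)

s1 = Shape([Constraint("MUST", "<p1>"), Constraint("SHOULD", "<p2>")])
pruned = prune(s1, "MUST")
print([c.predicate for c in pruned.constraints])
```

Validating D1 against `pruned` would then use the existing workflow unchanged, apart from whatever treatment of `tolerated` predicates a closed shape ends up needing.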

ericprud commented 6 years ago

In SHACL, severities are treated as annotations, meaning they have no particular hierarchical semantics and are just copied into error reports. One of the drivers for severity levels was Google's RDF Validation Workshop submission. The SHACL severity model doesn't conveniently address schema.org's needs as they need separate severities for something being missing vs. something present but with the wrong datatype.

One solution to this would be to use @jimkont's approach and expand on the Validata header which declared their own severity strings with a global directive. If we supplement that with a standard set of violations, we could attractively address the schema.org use case:

NEED = *:2; # bad both if it's missing or has the wrong type.
WANT = missing:1, *:2; # lower warning if it's missing.
<IssueShape> {
  WANT :description xsd:string ;
  NEED :creator @<PersonShape> ;
             :assignedTo @<PersonShape>? ;
}
<PersonShape> {
  WANT foaf:name xsd:string ;
  NEED foaf:mbox IRI
}

One question with either strategy is what happens with a higher severity nested in a lower severity:

<issue1> :description ""; :creator <x> ; :assignedTo <y> .
<x> ...
<y> foaf:mbox <mailto:emily@example.org> .

<y> is missing a WANTed foaf:name but is only being validated for a presumably optional (from the cardinality) :assignedTo. One strategy would be to say that only the highest error level keeps something from validating, so <issue1> is fine even though the object of its :assignedTo had a warning.
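
A minimal Python sketch of that strategy, assuming the NEED and WANT declarations above are read as lookup tables from violation kind to severity level. The violation name "wrong-type" and the helper functions are hypothetical; only "missing" and the `*` wildcard appear in the declarations.

```python
# Severity tables mirroring the NEED / WANT directives above.
SEVERITIES = {
    "NEED": {"*": 2},                # bad whether missing or wrongly typed
    "WANT": {"missing": 1, "*": 2},  # lower warning if it is missing
}

# The highest level declared anywhere is the only one that blocks validation.
MAX_LEVEL = max(level for table in SEVERITIES.values() for level in table.values())

def severity(label, violation):
    """Look up a specific violation kind, falling back to the wildcard."""
    table = SEVERITIES[label]
    return table.get(violation, table["*"])

def conforms(violations):
    """A node conforms unless some violation reaches the highest level."""
    return all(severity(label, kind) < MAX_LEVEL for label, kind in violations)

# <y> lacks the WANTed foaf:name, which is only a level-1 warning.
y_violations = [("WANT", "missing")]
print(severity("WANT", "missing"))  # 1
print(conforms(y_violations))       # True
```

With these tables, a missing NEED property or a wrongly typed WANT property would both reach level 2 and fail, while the missing foaf:name on <y> stays a warning and does not affect <issue1>.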

jimkont commented 6 years ago

Here is the example we discussed yesterday, to explore how this can play out in the open and whether severities / constraint weights can be overridden. Assume <Person> is imported from an external location that we have no control over.

<Issue> {
    MUST :submitter @<Person> ;
    SHOULD :reviewer @<Person> ;
    MAY :merger @<Person>
}
<Person> {
    MUST :name . ;
    SHOULD :lastName . ;
    MAY :middleName .
}

There are many combinations of things that can fail with different severities here.
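
To give a sense of the space, the following Python sketch simply enumerates every pairing of a weighted <Issue> reference with a weighted <Person> property. The dictionaries restate the example above; the script takes no position on how the two weights should combine, which is the open question.

```python
from itertools import product

# Weights copied from the <Issue> and <Person> shapes above.
ISSUE = {":submitter": "MUST", ":reviewer": "SHOULD", ":merger": "MAY"}
PERSON = {":name": "MUST", ":lastName": "SHOULD", ":middleName": "MAY"}

# Every way a nested <Person> failure can be reached from <Issue>.
combinations = [
    (issue_pred, issue_weight, person_pred, person_weight)
    for (issue_pred, issue_weight), (person_pred, person_weight)
    in product(ISSUE.items(), PERSON.items())
]

for issue_pred, issue_weight, person_pred, person_weight in combinations:
    print(f"{issue_weight} {issue_pred} -> {person_weight} {person_pred}")
```

That is nine nested cases before counting failures on <Issue> itself, such as a missing :submitter, or different kinds of violation on the same property.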