w3c / data-shapes

RDF Data Shapes WG repo

non-redundant multiple sh:minInclusive #68

Closed pfps closed 7 years ago

pfps commented 7 years ago

In the following shape neither of the constraints is redundant.

ex:s1 a sh:PropertyShape ;
  sh:path ex:p ;
  sh:minInclusive "2002-10-10T12:00:00-05:00"^^xsd:dateTime ;
  sh:minInclusive "2002-10-10T12:00:00"^^xsd:dateTime .

This undercuts any rationale for excluding shapes with multiple comparisons.

HolgerKnublauch commented 7 years ago

As usual, the fact that there are corner cases doesn't mean it makes sense in general.

HolgerKnublauch commented 7 years ago

BTW even if somebody wanted to express multiple sh:minInclusives, there is already an easy work-around: create a nested sh:node shape or use sh:and.

TallTed commented 7 years ago

Multiple values for (for example) sh:minInclusive were not excluded only because of possible redundancy. There were multiple considerations, which I won't detail here.

That said, if you refer to the XML Schema spec definition of xsd:dateTime, you can see that your second value ("2002-10-10T12:00:00"^^xsd:dateTime) is significantly less restrictive than your first ("2002-10-10T12:00:00-05:00"^^xsd:dateTime).

For comparisons with other xsd:dateTime values, your second value is in fact treated as two values, "2002-10-10T12:00:00-14:00"^^xsd:dateTime and "2002-10-10T12:00:00+14:00"^^xsd:dateTime, and those comparisons would therefore produce conflicting results even against your first! (That is, your second value is both before and after, both lower and higher, than your first value.)

In other words -- your two constraints are not redundant, but conflicting; and the latter may have neither the desired nor the anticipated result.

pfps commented 7 years ago

xsd:dateTime is a somewhat unusual datatype because of the optional timezone. This means that the order relation on dateTime is partial (https://www.w3.org/TR/xmlschema-2/ section 3.2.7.4) and that certain dateTime values are incomparable (https://www.w3.org/TR/xmlschema11-2/#theSevenPropertyModel).

It turns out that "2002-10-10T12:00:01-05:00"^^xsd:dateTime is greater than "2002-10-10T12:00:00-05:00"^^xsd:dateTime but neither less than nor greater than "2002-10-10T12:00:00"^^xsd:dateTime (because "2002-10-10T12:00:00+14:00"^^xsd:dateTime is less than "2002-10-10T12:00:01-05:00"^^xsd:dateTime but "2002-10-10T12:00:00-14:00"^^xsd:dateTime is not less than "2002-10-10T12:00:01-05:00"^^xsd:dateTime). Similarly "2002-10-10T12:00:00"^^xsd:dateTime is less than "2002-10-10T12:00:01"^^xsd:dateTime but neither less than nor greater than "2002-10-10T12:00:01-05:00"^^xsd:dateTime. It also turns out that "2002-11-10T12:00:01-05:00"^^xsd:dateTime is greater than both "2002-10-10T12:00:00-05:00"^^xsd:dateTime and "2002-10-10T12:00:00"^^xsd:dateTime. So sh:minInclusive "2002-10-10T12:00:00-05:00"^^xsd:dateTime does not conflict with sh:minInclusive "2002-10-10T12:00:00"^^xsd:dateTime.
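
The comparisons above can be reproduced with a short Python sketch. It assumes the XML Schema 1.0 order relation (section 3.2.7.4), in which a value without a time zone is compared as if it could lie anywhere between its +14:00 and -14:00 readings. The helper names `parse` and `compare` are illustrative only and do not come from any SHACL or XSD library.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: a value without a time zone is bracketed by its +14:00 reading
# (earliest possible instant) and its -14:00 reading (latest possible instant).
EARLIEST = timezone(timedelta(hours=14))
LATEST = timezone(timedelta(hours=-14))


def parse(lexical: str) -> datetime:
    """Parse a dateTime lexical form with or without a numeric offset."""
    return datetime.fromisoformat(lexical)


def compare(p: datetime, q: datetime) -> Optional[int]:
    """Return -1, 0 or 1, or None when p and q are incomparable."""
    p_zoned = p.tzinfo is not None
    q_zoned = q.tzinfo is not None
    if p_zoned == q_zoned:
        # Both zoned or both unzoned: ordinary total order.
        return (p > q) - (p < q)
    if p_zoned:
        if p < q.replace(tzinfo=EARLIEST):
            return -1
        if p > q.replace(tzinfo=LATEST):
            return 1
        return None
    # p has no time zone but q does: swap the arguments and flip the sign.
    flipped = compare(q, p)
    return None if flipped is None else -flipped


if __name__ == "__main__":
    zoned = parse("2002-10-10T12:00:01-05:00")
    print(compare(zoned, parse("2002-10-10T12:00:00-05:00")))  # 1
    print(compare(zoned, parse("2002-10-10T12:00:00")))        # None
```

Returning `None` rather than raising keeps the "neither less than nor greater than" outcome explicit, which is the case the two constraints turn on.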

To exclude all dateTime values before "2002-10-10T12:00:00-05:00"^^xsd:dateTime and also before "2002-10-10T12:00:00"^^xsd:dateTime requires two uses of sh:minInclusive, as in

ex:s1 a sh:PropertyShape ;
  sh:path ex:p ;
  sh:minInclusive "2002-10-10T12:00:00-05:00"^^xsd:dateTime ;
  sh:minInclusive "2002-10-10T12:00:00"^^xsd:dateTime .

TallTed commented 7 years ago

You picked a fun one.

Conveniently, each of the xsd:dateTime values in play here may be converted to a single timezone for direct comparisons.

2002-10-10T12:00:00-14:00 == Thu, 10 Oct 2002 22:00:00 -0400
2002-10-10T12:00:00-05:00 == Thu, 10 Oct 2002 13:00:00 -0400
2002-10-10T12:00:00+14:00 == Wed, 09 Oct 2002 18:00:00 -0400
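
The conversions above can be reproduced in Python by shifting each value to a fixed -04:00 offset, the offset used in the lines above. This is only a sketch: `to_offset` is an illustrative name, and the function is meant for values that carry an explicit offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed -04:00 offset, matching the "-0400" used in the conversions above.
TARGET = timezone(timedelta(hours=-4))
RFC_STYLE = "%a, %d %b %Y %H:%M:%S %z"


def to_offset(lexical: str, tz: timezone = TARGET) -> datetime:
    """Re-express a zoned dateTime lexical form in the given fixed offset."""
    value = datetime.fromisoformat(lexical)
    if value.tzinfo is None:
        # A value without an offset has no single instant to convert.
        raise ValueError("value has no time zone offset")
    return value.astimezone(tz)


if __name__ == "__main__":
    for lexical in (
        "2002-10-10T12:00:00-14:00",
        "2002-10-10T12:00:00-05:00",
        "2002-10-10T12:00:00+14:00",
    ):
        print(lexical, "==", to_offset(lexical).strftime(RFC_STYLE))
```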

I believe that to exclude all dateTime values before "2002-10-10T12:00:00-05:00"^^xsd:dateTime and also before "2002-10-10T12:00:00"^^xsd:dateTime (which works out to be syntactic sugar for all dateTime values before "2002-10-10T12:00:00-14:00"^^xsd:dateTime and also before "2002-10-10T12:00:00+14:00"^^xsd:dateTime) requires just one use of sh:minInclusive, that being the earliest of these -- because to be before all, it must be before the earliest. In other words --

ex:s1 a sh:PropertyShape ;
  sh:path ex:p ;
  sh:minInclusive "2002-10-10T12:00:00+14:00"^^xsd:dateTime .

On the other side, to exclude all dateTime values after "2002-10-10T12:00:00-05:00"^^xsd:dateTime and also after "2002-10-10T12:00:00"^^xsd:dateTime (which works out to be syntactic sugar for all dateTime values after "2002-10-10T12:00:00-14:00"^^xsd:dateTime and also after "2002-10-10T12:00:00+14:00"^^xsd:dateTime) requires just one use of sh:maxInclusive, that being the latest of these -- because to be after all, it must be after the latest. In other words --

ex:s1 a sh:PropertyShape ;
  sh:path ex:p ;
  sh:maxInclusive "2002-10-10T12:00:00-14:00"^^xsd:dateTime .

pfps commented 7 years ago

The two values "2002-10-10T12:00:00-05:00"^^xsd:dateTime and "2002-10-10T12:00:00"^^xsd:dateTime are incomparable, i.e., neither is less than or greater than or equal to the other, so for starters neither of them is the earliest.

ex:s1 a sh:PropertyShape ;
  sh:path ex:p ;
  sh:minInclusive "2002-10-10T12:00:00+14:00"^^xsd:dateTime .

excludes all dateTime values with a time zone field whose time value is before 2002-10-10T12:00:00+14:00 and also all dateTime values without a time zone field whose time value is before 2002-10-10T12:00:00 (because 14:00 is the maximum offset between the timeline with time zone values and the timeline without time zone values).

This does not exclude dateTime values with a time zone whose time value is greater than 2002-10-10T12:00:00+14:00 but less than or equal to 2002-10-10T12:00:00-05:00, which are excluded by

ex:s1 a sh:PropertyShape ;
  sh:path ex:p ;
  sh:minInclusive "2002-10-10T12:00:00-05:00"^^xsd:dateTime ;
  sh:minInclusive "2002-10-10T12:00:00"^^xsd:dateTime .

Neither constraint is redundant.
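
The non-redundancy claimed here can be checked with a small Python sketch. It rests on two assumptions: the XML Schema 1.0 partial order on dateTime, and that a value satisfies sh:minInclusive only when the comparison is definitely true, so incomparable values fail. `definitely_le` and `conforms` are hypothetical helper names, not part of any SHACL implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: an unzoned value may denote any instant between its
# +14:00 reading (earliest) and its -14:00 reading (latest).
PLUS_14 = timezone(timedelta(hours=14))
MINUS_14 = timezone(timedelta(hours=-14))


def definitely_le(a: datetime, b: datetime) -> bool:
    """True only when a <= b is certain under the partial order."""
    if (a.tzinfo is None) == (b.tzinfo is None):
        return a <= b
    if a.tzinfo is None:
        # Even the latest reading of a must precede b.
        return a.replace(tzinfo=MINUS_14) < b
    # a must precede even the earliest reading of b.
    return a < b.replace(tzinfo=PLUS_14)


def conforms(value: str, minimums: list) -> bool:
    """Check a value against every minimum (an implicit conjunction)."""
    v = datetime.fromisoformat(value)
    return all(definitely_le(datetime.fromisoformat(m), v) for m in minimums)


MIN_ZONED = "2002-10-10T12:00:00-05:00"
MIN_UNZONED = "2002-10-10T12:00:00"

if __name__ == "__main__":
    # Passes the zoned minimum but not the unzoned one.
    print(conforms("2002-10-10T12:00:01-05:00", [MIN_ZONED]))    # True
    print(conforms("2002-10-10T12:00:01-05:00", [MIN_UNZONED]))  # False
    # Passes the unzoned minimum but not the zoned one.
    print(conforms("2002-10-10T12:00:01", [MIN_UNZONED]))        # True
    print(conforms("2002-10-10T12:00:01", [MIN_ZONED]))          # False
```

Because each minimum rejects a value that the other accepts, dropping either one changes which values conform under these assumptions.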

TallTed commented 7 years ago

As I said earlier, possible redundancy (which seems to be your focus) was not the only basis for limiting such predicates to one value.

But let me see if I understand your scenario correctly.

You're suggesting that someone's data may include xsd:dateTime values both with and without timezone, and they may want to permit that in their shape, and therefore need both of these constraints, which are not redundant in such a case.

I do not see how this scenario requires multiple values for these predicates, and I see many ways that allowing such could increase user confusion and potential for error.

I believe Holger has already provided two viable solutions for such a scenario -- either a nested sh:node shape or sh:and (or, I think, sh:or, depending on the precise need). It may be worth adding some such constructs to the samples collection, based on this issue.

pfps commented 7 years ago

I went back through the email records and meeting minutes, and found the following:

On 03/21/2017 06:11 PM, Holger Knublauch wrote:

Note that there are several other constraint components that really only should have one value, e.g. sh:datatype. I would argue that there is just a small number of the core components where multiple values make sense (sh:class, sh:property and sh:node and the logical operators). So for now I have added similar maxCount=1 rules to

  • sh:datatype
  • sh:nodeKind
  • sh:minCount
  • sh:maxCount
  • sh:min/max/in/exclusive
  • sh:minLength
  • sh:maxLength
  • sh:languageIn
  • sh:in

From https://www.w3.org/2017/03/22-shapes-minutes.html

hknublau: (all this is in the email) ... so I went through and found all these properties where having two values would be redundant. So I added syntax rules to make all these properties maxCount=1.

So the only criterion was redundancy.

HolgerKnublauch commented 7 years ago

The emails that you quote are zero evidence for your claim. There were lots of oral discussions too. And we spent almost half an hour on this topic yesterday.

I honestly really have no idea why you find this so important. We have given you simple work-arounds in case anybody really thinks they must state such a weird constraint.

simonstey commented 7 years ago

couldn't sh:and be used to address this corner case?

simonstey commented 7 years ago

e.g.:

ex:ExampleShape
  a sh:NodeShape ;
  sh:and (
      [
        sh:path ex:p ;
        sh:minInclusive "2002-10-10T12:00:00-05:00"^^xsd:dateTime ;
      ]
      [
        sh:path ex:p ;
        sh:minInclusive "2002-10-10T12:00:00"^^xsd:dateTime ;
      ]
   ) .

pfps commented 7 years ago

sh:and can be used to state any implicit conjunction.

The overall problem is that the SHACL Core syntax is so non-uniform, with no reason for whether a particular shape is legal or illegal. This then combines with the lack of required syntax checking to make it very hard for users to determine whether their shapes will work the same in any other SHACL implementation.

TallTed commented 7 years ago

Using the SHACL-for-SHACL to validate one's shapes is a good way to check whether those shapes conform to the spec. This does not need to be run every time the shapes graph is used, just as the W3 Validator (or similar) doesn't need to be run against an HTML document by every browser before it is rendered.

The spec states clearly, as the WG concluded, that only one value is allowed for each of these several predicates. Any shape that has multiple values for the listed predicates is "illegal," to use your terminology. The reason it is "illegal" is that it violates the spec, as that's the governing "law."

In situations such as you've described, one solution (of at least two, possibly many) is to use sh:and, as has been demonstrated.

I do not understand why you think there would be different results (validation reports) obtained from different SHACL processors. If you can put this concern more clearly, I would suggest raising it as a new issue, as I think this one has been answered.

irenetq commented 7 years ago

We will consider an issue addressed unless we hear back from the submitter within 5 days of the last WG response comment. The last WG response comment on this issue was 5 days ago. With this, we will give an extra 3 days to respond before we consider the issue to be addressed and the submitter assumed to be satisfied.

pfps commented 7 years ago

I am not satisfied by the current syntax of SHACL in many areas, and in particular in this one. Excluding non-redundant sh:minInclusive constraints is just the worst part of the current syntax. There is no reason to exclude non-redundant sh:minInclusive constraints.

HolgerKnublauch commented 7 years ago

Of course there are reasons to exclude them. If tools know in advance that there can be only one value, they can, for example, optimize the user interface or other algorithms.

If this is "the worst part of the current syntax" for you, then you seem to be very happy otherwise. That's good to hear.

In 99.9% of the practical cases there will be only one sensible value. AFAIK only xsd:dateTime is an exception because it has this weird handling of time zones.

Overall it sounds like we have to agree to disagree here. Nobody in the WG is sharing your viewpoint on this issue.

pfps commented 7 years ago

Excluding syntax for no good reason whatever when similar syntax is not excluded is just about the worst syntax sin in my books.

irenetq commented 7 years ago

Well, there are good reasons. Or, at least, reasons other people find good, but you do not find them to be good enough. One such reason was explained in the previous comment. So, obviously, the "goodness" of the reason is a matter of opinion.

I notice that you don't seem to be consistent in your opinions:

This is strange and inconsistent. I believe that it would be much more common/easy for users to create incorrect shapes by mistake and/or misinterpret such shapes in the situation we are discussing here (e.g., creating two minimums and not understanding which one applies) than it would be for them to throw some random statements into a field that is supposed to be a SHACL list and then think that this extra random statement is, in fact, the path.

simonstey commented 7 years ago

I believe that it would be much more common/easy for users to create incorrect shapes by mistake and/or misinterpret such shapes in the situation we are discussing here (e.g., creating two minimums and not understanding which one applies) than it would be for them to throw some random statements into a field that is supposed to be a SHACL list and then think that this extra random statement is, in fact, the path.

and if someone knows what he's doing and wants to check for >1 minimums anyway (are there scenarios other than the xsd:dateTime/timezone one where this would make sense?), one can do it as pointed out in https://github.com/w3c/data-shapes/issues/68#issuecomment-297610695:

ex:ExampleShape
  a sh:NodeShape ;
  sh:and (
      [
        sh:path ex:p ;
        sh:minInclusive "2002-10-10T12:00:00-05:00"^^xsd:dateTime ;
      ]
      [
        sh:path ex:p ;
        sh:minInclusive "2002-10-10T12:00:00"^^xsd:dateTime ;
      ]
   ) .

irenetq commented 7 years ago

Adding a link to the wiki page for the formal objection https://www.w3.org/2014/data-shapes/wiki/PRFO3