qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

XSLT patterns: intersect and except #402

Open michaelhkay opened 1 year ago

michaelhkay commented 1 year ago

I would like to propose making an incompatible change to the semantics of XSLT patterns using the "except" and "intersect" operators, so that they have their intuitive meaning.

Consider the pattern p except appendix//p. Anyone writing this probably imagines that this will match any p element that does not have an appendix as an ancestor. The intuitive meaning of A except B is to match anything that matches A unless it also matches B.

The actual meaning in the XSLT 3.1 specification is that it matches any node $N that has an ancestor $A such that the result of the XPath expression $A//(p except appendix//p) includes $N.

Consider the XML

<appendix>
  <div>
     <p>...</p>
  </div>
</appendix>

The <p> element here has an ancestor (the <div> element) where the result of $A//(p except appendix//p) includes the <p> element. So despite having an ancestor appendix this element matches the pattern p except appendix//p. This is not only a counter-intuitive result, it also makes such patterns useless in practice.
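The difference between the two readings can be checked with a small model. The following is only a sketch in Python, not XSLT: the helper names are made up for illustration, the XPath expression is hand-translated into set operations, and the document node that would sit above <appendix> in a real tree is left out (it would only add one more candidate for $A).

```python
import xml.etree.ElementTree as ET

# The example tree from the issue. ElementTree has no document node,
# so <appendix> is the topmost node here (an assumption of this sketch).
root = ET.fromstring("<appendix><div><p>...</p></div></appendix>")
parent = {child: node for node in root.iter() for child in node}

def ancestors(node):
    """Yield the element ancestors of node, nearest first."""
    while node in parent:
        node = parent[node]
        yield node

def children_named(node, name):
    return {c for c in node if c.tag == name}

def p_except_appendix_p(context):
    """Hand-written model of `p except appendix//p` at one context node."""
    left = children_named(context, "p")
    right = set()
    for a in children_named(context, "appendix"):
        right |= {d for d in a.iter("p") if d is not a}
    return left - right

def matches_xslt30(node):
    """Match if some ancestor A has node in A//(p except appendix//p)."""
    for a in ancestors(node):
        for context in a.iter():  # descendant-or-self of A
            if node in p_except_appendix_p(context):
                return True
    return False

def matches_intuitive(node):
    """Match a p unless it also matches appendix//p."""
    in_appendix = any(a.tag == "appendix" for a in ancestors(node))
    return node.tag == "p" and not in_appendix

p = root.find("div/p")
print(matches_xslt30(p), matches_intuitive(p))
```

In this model the <p> is matched under the ancestor-based rule (taking <div> as $A) but not under the intuitive reading.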

Patterns using intersect suffer the same problem, though it is much harder to construct a plausible example.

Patterns that only use the child or attribute axis, for example @* except @code, or * except note, don't suffer from this problem and will retain the same meaning as in 3.1.

The required effect can be achieved by writing p except p[ancestor::appendix]. Because the pattern p[ancestor::appendix] is equivalent to appendix//p, people are very likely to imagine that p except p[ancestor::appendix] is equivalent to p except appendix//p.
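As a rough check of the workaround, the same ancestor-based rule can be modelled for p except p[ancestor::appendix]. This is again a Python sketch with hypothetical helper names, and the <doc> and <section> wrappers are added only so that there is a <p> outside the appendix to compare against.

```python
import xml.etree.ElementTree as ET

# Illustrative tree: one <p> inside an appendix, one outside.
root = ET.fromstring(
    "<doc><appendix><div><p>inside</p></div></appendix>"
    "<section><p>outside</p></section></doc>"
)
parent = {child: node for node in root.iter() for child in node}

def has_ancestor(node, name):
    """True if some ancestor element of node has the given name."""
    while node in parent:
        node = parent[node]
        if node.tag == name:
            return True
    return False

def workaround_at(context):
    """Hand-written model of `p except p[ancestor::appendix]` at one context node."""
    ps = {c for c in context if c.tag == "p"}
    return ps - {c for c in ps if has_ancestor(c, "appendix")}

def matches(node):
    """Ancestor-based rule: node is in A//(pattern) for some ancestor A."""
    a = node
    while a in parent:
        a = parent[a]
        if any(node in workaround_at(ctx) for ctx in a.iter()):
            return True
    return False

inside = root.find("appendix/div/p")
outside = root.find("section/p")
print(matches(inside), matches(outside))
```

Because the predicate looks at the ancestors of the <p> itself rather than at the context node, the choice of $A no longer affects the outcome: the <p> inside the appendix is rejected and the one outside is accepted.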

Making any incompatible change to the language semantics should be done only with a very strong justification, but I believe that it is justified in this instance. The existing semantics are not only counter-intuitive, they are also sufficiently useless that it is extremely unlikely anyone has existing working code, other than artificial test cases, that relies on the current semantics.

dnovatchev commented 1 year ago

The actual meaning in the XSLT 3.1 specification is that it matches any node $N that has an ancestor $A such that the result of the XPath expression $A//(p except appendix//p) includes $N.

Isn't XSLT 3.0 the last official W3C XSLT specification? I searched for XSLT 3.1 but couldn't find such a specification. Any links would be very useful.

Consider the pattern p except appendix//p. Anyone writing this probably imagines that this will match any p element that does not have an appendix as an ancestor.

Isn't this equivalent to the pattern:

p except ./appendix//p

If these are equivalent, then for the second one it is clear that it may match a p (the left one in the pattern) that could itself have an ancestor appendix.

Maybe the pattern should have been specified as:

p except //appendix/p

Probably a more specific example and explanation would be helpful.

ChristianGruen commented 1 year ago

If I get it right, the semantics of except and intersect differ between XSLT and XQuery.

If the behavior in XSLT is changed, does that mean that the rules of the two languages are harmonized? Why does the problem not affect the union operator?

ndw commented 1 year ago

I think they have the same semantics in an XPath expression, it's specifically in the context of a match pattern where this problem arises.

michaelhkay commented 1 year ago

XSLT 3.1

Sorry, typo. I meant 3.0.

isn't this equivalent to the pattern p except ./appendix//p

I don't think that's actually a legal pattern.

Maybe the pattern should have been specified as: p except //appendix/p

Yes, that pattern would (I think) have the desired effect (provided the relevant tree is rooted at a document node).

Why does the problem not affect the union operator?

Very good question, and I wish I knew how to do the proof. I have managed to persuade myself that the combination of / and union is distributive: X / (Y union Z) is equivalent to (X / Y) union (X / Z) for all expressions X, Y, and Z; while the same is not true for the except and intersect operators. It's easy to show by counter-example that the equivalence doesn't apply for intersect and except, but I have never found a satisfactory proof that it does apply for union.

This is relevant because we're interpreting the pattern Y except Z as matching a node N by evaluating the expression A/(Y except Z) for some ancestor A, and this doesn't give the same result as evaluating A/Y except A/Z.
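A counter-example of the kind mentioned above can be sketched with plain sets. This is a simplified model in Python, not XPath: each "expression" is just a lookup from a context node to a set of result nodes, X / E is modelled as the union of E over every node in X, and the names x1, x2 and n are invented for the illustration. It shows one case and is not a proof about XPath expressions in general.

```python
def path(xs, step):
    """Model of X / step: apply step to each context node, union the results."""
    out = set()
    for x in xs:
        out |= step(x)
    return out

# Hypothetical steps: Y selects node "n" only from x1, Z selects it only from x2.
X = {"x1", "x2"}
Y = {"x1": {"n"}, "x2": set()}
Z = {"x1": set(), "x2": {"n"}}

# X / (Y op Z): the operator is applied separately at each context node.
inner_except = path(X, lambda x: Y[x] - Z[x])
inner_intersect = path(X, lambda x: Y[x] & Z[x])
inner_union = path(X, lambda x: Y[x] | Z[x])

# (X / Y) op (X / Z): the operator is applied once, to the combined results.
outer_except = path(X, Y.get) - path(X, Z.get)
outer_intersect = path(X, Y.get) & path(X, Z.get)
outer_union = path(X, Y.get) | path(X, Z.get)

print("except:   ", inner_except, outer_except)
print("intersect:", inner_intersect, outer_intersect)
print("union:    ", inner_union, outer_union)
```

Here except and intersect give different answers depending on where the operator is applied, while union gives the same set both ways. In this simplified model that is simply because a union of unions can be regrouped freely, whereas a difference or intersection taken per context node cannot see what the other context nodes contribute.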

michaelhkay commented 1 year ago

See also my blog article at https://blog.saxonica.com/mike/2022/05/except-patterns.html

This proposes that we create new operators and-also and but-not with the "intuitive" semantics, and deprecate intersect/except at the top level (noting however, that common usages like match="@* except @code" are not troublesome).

Arithmeticus commented 1 year ago

I can see the problem, but for my own coding I have learned that desired exceptions in a match pattern should motivate the creation of another template with higher priority to overshadow the one I'm currently writing. In other words, I have personally deprecated the except keyword in XSLT template matches. Not saying that should speak to any particular decision.

As for and-also/intersect, isn't that what a predicate expression is for?