Open michaelhkay opened 1 year ago
> The actual meaning in the XSLT 3.1 specification is that it matches any node `$N` that has an ancestor `$A` such that the result of the XPath expression `$A//(p except appendix//p)` includes `$N`.

Isn't XSLT 3.0 the last official W3C XSLT specification? I searched for XSLT 3.1 but couldn't find such a specification. Any links would be very useful.
> Consider the pattern `p except appendix//p`. Anyone writing this probably imagines that this will match any `p` element that does not have an `appendix` as an ancestor.

Isn't this equivalent to the pattern `p except ./appendix//p`? If the two are equivalent, then the second form makes it clear that the pattern may match a `p` (the left-hand one in the pattern) that could itself have an `appendix` ancestor.
Maybe the pattern should have been specified as `p except //appendix/p`. A more specific example and explanation would probably be helpful.
If I get it right, the semantics of `except` and `intersect` differ between XSLT and XQuery. If the behavior in XSLT is changed, does that mean that the rules of the two languages are harmonized? Why does the problem not affect the `union` operator?
I think they have the same semantics in an XPath expression; it's specifically in the context of a match pattern that this problem arises.
> XSLT 3.1

Sorry, typo. I meant 3.0.
> isn't this equivalent to the pattern `p except ./appendix//p`

I don't think that's actually a legal pattern.
> Maybe the pattern should have been specified as: `p except //appendix/p`

Yes, that pattern would (I think) have the desired effect (provided the relevant tree is rooted at a document node).
> Why does the problem not affect the union operator?

Very good question, and I wish I knew how to do the proof. I have managed to persuade myself that the combination of `/` and `union` is distributive: `X / (Y union Z)` is equivalent to `(X / Y) union (X / Z)` for all expressions X, Y, and Z, while the same is not true for the `except` and `intersect` operators. It's easy to show by counter-example that the equivalence doesn't apply for `intersect` and `except`, but I have never found a satisfactory proof that it does apply for `union`.
This is relevant because we're interpreting the pattern `Y except Z` as matching a node N by evaluating the expression `A/(Y except Z)` for some ancestor A, and this doesn't give the same result as evaluating `A/Y except A/Z`.
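The counter-example mentioned above can be sketched in Python by modelling node sequences as sets. This is only an illustration under stated assumptions: the toy document, the helper `path`, and the choice of `.//p` for Y and `appendix//p` for Z are all hypothetical, and set difference and set union stand in for `except` and `union`, since ElementTree's limited XPath subset does not provide those operators.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document: a <p> inside a <div> inside an <appendix>.
doc = ET.fromstring("<doc><appendix><div><p/></div></appendix></doc>")
div = doc.find("appendix/div")
p = div.find("p")

def path(contexts, step):
    """Model X / E: evaluate E once per context node and union the results."""
    result = set()
    for node in contexts:
        result |= step(node)
    return result

def Y(node):
    # Stands in for the relative path  .//p
    return set(node.findall(".//p"))

def Z(node):
    # Stands in for the relative path  appendix//p
    return set(node.findall("appendix//p"))

X = {doc, div}  # two context nodes, both ancestors of <p>

# X / (Y except Z): the difference is taken separately for each context node.
inside = path(X, lambda n: Y(n) - Z(n))
# (X / Y) except (X / Z): the difference is taken once, over the merged results.
outside = path(X, Y) - path(X, Z)

# The same comparison for union.
union_inside = path(X, lambda n: Y(n) | Z(n))
union_outside = path(X, Y) | path(X, Z)

print(inside == outside)              # the two except forms disagree here
print(union_inside == union_outside)  # the two union forms agree here
```

With `div` as the context node, `appendix//p` selects nothing, so the `<p>` survives the per-context difference. In the merged form it is removed, because `doc/appendix//p` does select it.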
See also my blog article at https://blog.saxonica.com/mike/2022/05/except-patterns.html
This proposes that we create new operators `and-also` and `but-not` with the "intuitive" semantics, and deprecate `intersect`/`except` at the top level (noting, however, that common usages like `match="@* except @code"` are not troublesome).
I can see the problem, but for my own coding I have learned that desired exceptions in a match pattern should motivate the creation of another template with higher priority to overshadow the one I'm currently writing. In other words, I have personally deprecated the `except` keyword in XSLT template matches. Not saying that should speak to any particular decision.
As for `and-also`/`intersect`, isn't that what a predicate expression is for?
I would like to propose making an incompatible change to the semantics of XSLT patterns using the "except" and "intersect" operators, so that they have their intuitive meaning.
Consider the pattern `p except appendix//p`. Anyone writing this probably imagines that this will match any `p` element that does not have an `appendix` as an ancestor. The intuitive meaning of `A except B` is to match anything that matches `A` unless it also matches `B`.

The actual meaning in the XSLT 3.1 specification is that it matches any node `$N` that has an ancestor `$A` such that the result of the XPath expression `$A//(p except appendix//p)` includes `$N`.

Consider the XML

The `<p>` element here has an ancestor (the `<div>` element) where the result of `$A//(p except appendix//p)` includes the `<p>` element. So despite having an ancestor `appendix`, this element matches the pattern `p except appendix//p`. This is not only a counter-intuitive result, it also makes such patterns useless in practice.

Patterns using `intersect` suffer the same problem, though it is much harder to construct a plausible example.

Patterns that only use the child or attribute axis, for example `@* except @code`, or `* except note`, don't suffer from this problem and will retain the same meaning as in 3.1.

The required effect can be achieved by writing `p except p[ancestor::appendix]`. Because the pattern `p[ancestor::appendix]` is equivalent to `appendix//p`, people are very likely to imagine that `p except p[ancestor::appendix]` is equivalent to `p except appendix//p`.

Making any incompatible change to the language semantics should be done only with a very strong justification, but I believe that it is justified in this instance. The existing semantics are not only counter-intuitive, they are also sufficiently useless that it is extremely unlikely anyone has existing working code, other than artificial test cases, that relies on the current semantics.
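The difference between the two readings can be sketched in Python. This is a rough model, not a definitive implementation. The sample XML is not preserved in this text, so the document below is an assumed reconstruction that merely fits the description (a `<p>` with a `<div>` ancestor and an `appendix` ancestor). The function names `spec_match` and `intuitive_match` are hypothetical, and set difference stands in for `except`, which ElementTree's XPath subset does not support.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the sample; the original XML is not shown in the text.
root = ET.fromstring("<appendix><div><p/></div></appendix>")
p = root.find("div/p")

# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def spec_match(n):
    """Rule as described above: some ancestor A of n exists such that
    A//(p except appendix//p) includes n."""
    a = parent_of.get(n)
    while a is not None:
        for d in a.iter():  # models A/descendant-or-self::node(), elements only
            selected = set(d.findall("p")) - set(d.findall("appendix//p"))
            if n in selected:
                return True
        a = parent_of.get(a)
    return False

def intuitive_match(n):
    """Intuitive reading: n matches `p` and does not match `appendix//p`."""
    if n.tag != "p":
        return False
    a = parent_of.get(n)
    while a is not None:
        if a.tag == "appendix":
            return False
        a = parent_of.get(a)
    return True

print(spec_match(p), intuitive_match(p))
```

In this model the `<p>` is selected as soon as the `<div>` is used as the context node, because `appendix//p` evaluated from the `<div>` finds no `appendix` child. The intuitive reading rejects the same element because of its `appendix` ancestor.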