qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Attribute priority for xsl:accumulator-rule #1224

Closed Arithmeticus closed 2 weeks ago

Arithmeticus commented 4 months ago

I propose that XSLT xsl:accumulator-rule be allowed to take a priority attribute, so that users can be more declarative in their accumulator rules. Even accumulators with two or three rules might require simple overshadowing: a default rule for the majority of nodes, with accommodation for certain exceptions. An explicitly declared priority, rather than document order, will allow users to better express their intentions, and processor-generated warnings about duplicate matches will be more meaningful.

Because the current rules stipulate that among multiple rules the last one in document order wins, I think that backward compatibility prevents us from using the default priority rules for templates (i.e., allotting -0.5, 0, 0.25 scores based on match pattern types). Rather, in this case, every accumulator rule is assumed to have priority 0, unless otherwise specified. If a node matches more than one rule of the same priority level, the last one wins. This simpler version of priority (assume zero, and if you know multiple matches will overlap, use @priority) is one that many developers have come to use for templates.
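To make the proposed resolution concrete, here is a minimal Python sketch of how a processor might choose among the rules that match a node. The Rule class and the winning_rule function are hypothetical names used only for illustration; they are not part of any spec or processor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    match: str              # pattern text, used only as a label in this sketch
    priority: float = 0.0   # proposed default: 0 unless @priority is given


def winning_rule(matching_rules):
    """Pick the winner among rules that have already matched a node.

    `matching_rules` must be in document order. The highest priority wins;
    among rules of equal priority, the last in document order wins.
    """
    if not matching_rules:
        return None  # no matching rule: the accumulator value is unchanged
    indexed = list(enumerate(matching_rules))
    _, rule = max(indexed, key=lambda pair: (pair[1].priority, pair[0]))
    return rule


# All priorities default to 0, so the last rule wins (existing behaviour).
print(winning_rule([Rule("*"), Rule("section")]).match)              # section
# An explicit priority overrides document order.
print(winning_rule([Rule("section", priority=1), Rule("*")]).match)  # section
```

Because every rule defaults to 0, a stylesheet that never uses @priority behaves exactly as it does under the current "last one wins" rule.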

michaelhkay commented 4 months ago

If we do this then I suggest that if any rule has an explicit priority, then all rules for that accumulator must have an explicit priority, and the priorities must be distinct.
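A minimal Python sketch of this all-or-nothing constraint as a static check might look like the following. The function name check_priorities and the message texts are assumptions made for illustration, not anything defined by the spec.

```python
def check_priorities(priorities):
    """Validate the priorities of one accumulator's rules.

    `priorities` has one entry per rule, in document order: a number if
    @priority was written, None if it was omitted.
    Returns a list of problem descriptions (an empty list means acceptable).
    """
    problems = []
    explicit = [p for p in priorities if p is not None]
    if not explicit:
        return problems  # no rule uses @priority: existing behaviour applies
    if len(explicit) != len(priorities):
        problems.append("some rules have a priority and others do not")
    if len(set(explicit)) != len(explicit):
        problems.append("two or more rules share the same priority")
    return problems


print(check_priorities([None, None]))  # []
print(check_priorities([1, 2, 3]))     # []
print(check_priorities([1, None]))     # ['some rules have a priority and others do not']
```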

johnlumley commented 2 months ago

After today's brief discussion, requiring distinct priorities on all rules to me just implies a strict ordering of a set of rules (easy to implement - sort in ascending order and use the 'last one wins' as already implemented). With a large set of rules (say above a dozen or two) keeping these in order and distinct may prove irksome. [How many of us have found ourselves with priorities of 1.5, 1.55, 1.57, 1.6... 2.. ?]

But Joel has perhaps an easier notion - assume a default priority of 0 and then for implementation sort the set into 0-priority in document order followed by defined-priority (>0) in ascending priority.

This means that you could do some 'more specific' case editing, keeping some locality of concern, somewhat more smoothly, rather than having to set a priority for every rule, or edit-alter the order of possibly many rules, e.g.

accumulator-rule match="foo"
accumulator-rule match="foo[@foo]" priority="1"
accumulator-rule match="*[@foo]"

where we keep the foo rules together, but let more specific cases be indicated before following possibly matching, but considered less specific, rules.
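As a rough illustration of that implementation idea, the Python sketch below sorts the three rules by priority with a stable sort and then applies "last one wins" to the sorted list. Modelling an element as a dict and a match pattern as a lambda is an assumption made purely for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    label: str
    matches: Callable[[dict], bool]  # stand-in for an XSLT match pattern
    priority: float = 0.0            # assumed default when @priority is absent


# An element is modelled as {"name": ..., "attrs": {...}}.
rules = [
    Rule("foo", lambda e: e["name"] == "foo"),
    Rule("foo[@foo]", lambda e: e["name"] == "foo" and "foo" in e["attrs"], priority=1),
    Rule("*[@foo]", lambda e: "foo" in e["attrs"]),
]

# sorted() is stable, so rules of equal priority keep their document order.
ordered = sorted(rules, key=lambda r: r.priority)


def pick(element):
    """Return the label of the last matching rule in the sorted list."""
    chosen = None
    for rule in ordered:
        if rule.matches(element):
            chosen = rule
    return chosen.label if chosen else None


print(pick({"name": "foo", "attrs": {"foo": "1"}}))  # foo[@foo]
print(pick({"name": "foo", "attrs": {}}))            # foo
print(pick({"name": "bar", "attrs": {"foo": "1"}}))  # *[@foo]
```

Without the priority on the second rule, a foo element carrying @foo would fall to *[@foo], since that rule comes last in document order.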

Arithmeticus commented 2 months ago

Here is the use case that motivated me to start this issue. I have a simple accumulator. I need to declare default general behavior, but provide for the exceptions, like the following:

<xsl:accumulator name="test" initial-value="0">
      <xsl:accumulator-rule match="*" select="$value + 1"/>
      <xsl:accumulator-rule match="section" select="$value - 1"/>
</xsl:accumulator>

This triggers multiple XTDE0540 warnings. Perhaps I can squelch these warnings, but that might prevent me from identifying places where I do want the warnings. Perhaps I should have written the first rule as <xsl:accumulator-rule match="*[not(self::section)]" select="$value + 1"/>, but that is a kludge and outside the spirit of XSLT. Sometimes I do this anyway, because I regularly ask myself whether it is the first one or the last one that wins, and if I ask that, I know other people coming across my code will ask the same (or be unaware that position matters).

As a developer, I would like to be able to tell the processor: I know what I'm doing, and yes, I really want one rule to overshadow the other.

One response might be to propose that warning XTDE0540 be dropped for accumulators. But I think that would obscure the opportunity we have to enable programmers to improve code readability and documentation. I regularly use @priority for my templates to make explicit my intentions (and to avoid the default template weighting rules, which, pace @cmsmcq, I find obfuscating). Here @priority seems the ideal answer to my use case.

johnlumley commented 2 months ago

I'm puzzled - the conflict resolution rules for accumulator rules (at least in XSLT3.0), clearly state that:

If there is a matching rule, then a new value is computed for the accumulator variable using the expression contained in that rule’s select attribute or the contained sequence constructor. If there is more than one matching rule, the last in document order is used. If there is no matching rule, the value of the accumulator variable does not change.

So in theory your accumulator should work as intended (decrement for section, increment otherwise). Moreover, XTDE0540 errors shouldn't be raised; they come from the conflict resolution policy for templates.
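To check that reading of the quoted rule, here is a small Python simulation of the two-rule accumulator. It is only a sketch: it models rules evaluated once per element in document order, and the sample document and the function name run_accumulator are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Rules in document order: (test for the element, update applied to $value).
rules = [
    (lambda el: True,                lambda value: value + 1),  # match="*"
    (lambda el: el.tag == "section", lambda value: value - 1),  # match="section"
]


def run_accumulator(xml_text, initial=0):
    """Walk the elements in document order and apply the last matching rule."""
    value = initial
    for el in ET.fromstring(xml_text).iter():
        matching = [update for test, update in rules if test(el)]
        if matching:
            value = matching[-1](value)  # last in document order wins
    return value


# doc +1, section -1, then three p elements +1 each.
print(run_accumulator("<doc><section><p/><p/></section><p/></doc>"))  # 3
```

Under "last in document order wins", the section element is handled only by the second rule, so no rewriting of the first pattern is needed to get the intended count.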

Is this a bug in the implementation? Or is your code more complex, perhaps using template application within the sequence constructor of an accumulator rule?

Arithmeticus commented 2 months ago

I was puzzled too. It happened in multiple environments, with very simple examples. But I didn't pursue that avenue and I became more interested in the question of enhancing accumulator rules with a better declarative mechanism.

michaelhkay commented 2 months ago

Saxon produces a warning, not an error, if there are two matching accumulator rules. Test case accumulator-081:

Warning at mode saxon:preDescent 
  XTDE0540  Ambiguous rule match for /element
Matches both "element" on line 14 of
  file:/Users/mike/GitHub/qt4cg/xslt40-test/tests/decl/accumulator/accumulator-081.xsl
and "element" on line 11 of
  file:/Users/mike/GitHub/qt4cg/xslt40-test/tests/decl/accumulator/accumulator-081.xsl

The spec doesn't require a warning here, but warnings are always permitted. It would be better if the warning were a bit clearer, for example referring to it as an accumulator rather than a mode (it gives away that the same code is reused internally).

johnlumley commented 2 months ago

The spec doesn't require a warning here, but warnings are always permitted.

Interesting - I've looked through the XSLT3 spec for every mention of warning and can't seem to find anything about warnings always being permitted... probably not looked deep enough.

But does that mean that in other places a processor can give a warning even if the code is perfectly valid? Of course in cases of implausible expressions such as @foo/bar we do have such, but in accumulator precedence?

I inadvertently used the term "XTDE0540 error" rather than "XTDE0540 warning"; sorry.

michaelhkay commented 2 months ago

The only conformance requirements are that a stylesheet produces the correct output. It can choose if it wishes to issue a running commentary on the quality of your code, or to play the soundtrack of Evita; that's completely outside the scope of the spec.

There are other cases where Saxon produces warnings, for example if you pass "true" as an argument to a function (when you probably intended "true()"). Whether we got the decision right in each individual case is a matter of opinion; generally warnings will be helpful to some users and annoying to others.

cmsmcq commented 2 months ago

Joel Kalvesmaki writes:

Here is the use case that motivated me to start this issue. I have a simple accumulator.

Thank you. I'm having a bit of trouble understanding why you're getting those errors, given the bits of the spec you and John Lumley have quoted. But I'll continue to think about it.

In the meantime, quite by accident, I have just come across a description of the compiler written by the computer scientist Corrado Böhm at ETH Zürich and described in his 1951 dissertation. A 1976 paper by Donald Knuth and Luis Trabb Pardo on "The early development of programming languages" [1] describes the compiler's handling of algebraic expressions this way:

Unlike Rutishauser, Böhm recognized operator precedence in his
language; for example, r:2+t was interpreted as (r:2)+t, the
division operator ":" taking precedence over addition.  However,
Böhm did not allow parentheses to be mixed with precedence
relations: if an expression began with a left parenthesis, the
expression had to be *fully* parenthesized even when associative
operators were present; on the other hand if an expression did *not*
begin with a left parenthesis, precedence was considered but no
parentheses were allowed within it.

[1] https://web.archive.org/web/20170912102014/http://bitsavers.org/pdf/stanford/cs_techReports/STAN-CS-76-562_EarlyDevelPgmgLang_Aug76.pdf

As can be seen, the authors make no value judgements. But my jaw dropped a bit, and my eyebrows rose a bit, when I read that description.

When I write things in formal languages, I hate keeping track of complicated levels of precedence (or over and? and over or? who can remember?!), so I use parentheses more heavily than most. But even I tend not to parenthesize every single subexpression (except of course in Lisp, where it has never bothered me) -- I suppose my rule, as a writer of expressions, is to parenthesize where it's made necessary either

The two cases are not directly parallel, but at some level, Böhm's approach is to say: either you accept the processor's default rules entirely, or you specify everything yourself, there is no middle path. I guess the reason it caught my eye just now is that it feels very parallel to the rule proposed for accumulator rules, and it seemed to make clearer to me why the proposed rule makes me uneasy. That approach feels so very different from how match patterns are handled that it looks like adding a stumbling block to the language.

Maybe design consistency is over-rated. But maybe not?

I hope we can find a good way to resolve this.

Michael

-- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com

michaelhkay commented 1 month ago

@cmsmcq Michael, if you have internet access up there, we will be so much poorer without your insightful contributions, of which this is probably one of the last examples.

michaelhkay commented 2 weeks ago

Closing this, because a PR was raised, discussed, and declined.