qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Local functions in XSLT #735

Open michaelhkay opened 9 months ago

michaelhkay commented 9 months ago

I propose that we should add local functions to XSLT: specifically, allowing an xsl:function declaration to appear within a sequence constructor, declaring a named function that is available for use only within the sequence constructor.

At present this can be achieved by declaring a local variable bound to an anonymous function, but it's clumsy to have to use completely different syntax for local and global functions, and functions defined in this way cannot be mutually recursive.

I propose that such functions should shadow any global functions with the same name, in the same way as happens with local variable declarations. I have an open mind as to whether shadowing of functions in reserved namespaces should be allowed.

The main difficulty is the scoping rules. We don't want the problems Javascript has with "hoisting". I propose that (a) all local function declarations must appear before any instructions (or local variable declarations, but not params) within the sequence constructor, and (b) these function declarations are in-scope throughout the sequence constructor including forwards references from the body of other functions declared earlier within the same sequence constructor.

ndw commented 9 months ago

Is this something you've actually wanted to do? I can imagine why you might, but I don't think I ever have. Or maybe I'm just ok with private functions declared outside the sequence constructor.

Requiring them to come before variable declarations and other instructions seems sensible.

michaelhkay commented 9 months ago

I've found myself either (a) writing a global function declaration that I would prefer to be local, or (b) writing an anonymous function and binding it to a variable; both solutions work, but I've felt that locally-declared XSLT functions would be cleaner.

It's a feature I still find myself wanting in Java. They're available in Pascal, C#, Javascript, etc etc.

dnovatchev commented 9 months ago

At present this can be achieved by declaring a local variable bound to an anonymous function, but it's clumsy to have to use completely different syntax for local and global functions, and functions defined in this way cannot be mutually recursive.

This is not true!

Here is an example in pure XPath 3.1, solving the problem specified here: https://en.wikipedia.org/wiki/Mutual_recursion#Basic_examples

This expression:

let $isEvenInner := function($N, $self, $isOddInner)
    {
       if($N eq 0) then true()
         else $isOddInner($N -1, $isOddInner, $self)
    },

   $isOddInner := function($N, $self, $isEvenInner)
    {
       if($N eq 0) then false()
         else $isEvenInner($N -1, $isEvenInner, $self)  
    },

    $isEven := function($N)
    {
      $isEvenInner($N, $isEvenInner, $isOddInner)
    },

    $isOdd := function($N)
    {
      $isOddInner($N, $isOddInner, $isEvenInner)
    }    
 return
   ($isEven(255), $isOdd(255) )

produces the correct result: false(), true()

About the so called "clumsiness":

When the user consumes the two functions from an XPath function library, they don't know anything about the internals of the functions, and just use the shortest and most readable expression they need:

($isEven(255), $isOdd(255) )

ndw commented 9 months ago

Your solution is very clever, but I'm not confident that most users are going to think of using a pair of HOF in addition to the two functions desired to accomplish what would be utterly straightforward with two named functions.

There's a difference between "it is actually possible to do this thing if you understand precisely how to arrange all of the pieces" and "it's practical to do this thing using mechanisms that will feel familiar".

dnovatchev commented 9 months ago

Your solution is very clever, but I'm not confident that most users are going to think of using a pair of HOF in addition to the two functions desired to accomplish what would be utterly straightforward with two named functions.

@ndw,

To have a way to do something now can sometimes be a matter of life or death.

As the proverb says: "Give a man a fish and you feed him for a day. Teach him how to fish and you feed him for a lifetime".

Imagine that you could teach the man how to catch a fish and save him from starvation, or tell him condescendingly: "This is way too complicated for you - wait for a fancy service (that is just being contemplated) to arrive in the next N years".

In the case of our technologies, N is 6-8 years and can be even longer than 10 years.

Instead of leaving the man to years of starvation, it is our duty to give him the knowledge to live in a more human way right now.

This is not just theoretical (fancy talk). Remember FXSL, which was there for anyone even in 2002. The "official solution" for HOFs in XPath only arrived in 2014. Not giving people FXSL would leave them hungry (in the dark) for a l o o o n g 12 years.

One could still argue that only a few would be able to use the FXSL library. However the facts tell us otherwise: just the FXSL page was visited 308K (thousand) times and was downloaded many tens of thousands of times.

There's a difference between "it is actually possible to do this thing if you understand precisely how to arrange all of the pieces" and "it's practical to do this thing using mechanisms that will feel familiar".

Yes, and the difference is between giving people light now or leaving them in prolonged darkness and intellectual hunger.

pgfearo commented 9 months ago

It's just a thought, but perhaps, as a more flexible solution, we could consider restricting the scope of functions (and variables I guess) declared within a new dedicated instruction, perhaps named xsl:scope?

This would provide similar behaviour to other languages that allow the use of curly-brace pairs that have no purpose other than to restrict the scope of what lies within. An example would be the JavaScript block statement.
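For illustration, here is a minimal sketch of a JavaScript block statement used purely to limit scope. The names `blockScopeDemo` and `twice` are hypothetical and chosen only for this example.

```javascript
// Sketch: a bare { ... } block whose only job is to limit scope.
function blockScopeDemo() {
  const seen = [];
  {
    // `twice` exists only inside this block.
    const twice = (n) => 2 * n;
    seen.push(twice(5));
  }
  // Outside the block the name is not declared at all,
  // so `typeof` reports "undefined" instead of throwing.
  seen.push(typeof twice);
  return seen;
}

console.log(blockScopeDemo()); // 10, then "undefined"
```

Note that this relies on `const` (or `let`); a `var` declared inside the block would still be visible in the whole enclosing function.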

The xsl:scope element could be a child of the root xsl:stylesheet element or within any sequence constructor. It must always contain at least one sequence constructor.


On a side-note, it's quite commonplace to create functions from existing code using an 'extract to function' refactor command available in the XSLT editor. This could be extended to provide options to the user on how the function would be scoped.

michaelhkay commented 9 months ago

Within a sequence constructor, you can partition the scope by using xsl:sequence.

For global declarations, you can use packages and visibility.

michaelhkay commented 9 months ago

@dnovatchev It's good to know that there's a workaround like this for mutually-recursive local functions, but it's very much a workaround; and it doesn't address the point that it's desirable to be able to define local functions and global functions using the same syntax.

dnovatchev commented 9 months ago

@dnovatchev It's good to know that there's a workaround like this for mutually-recursive local functions, but it's very much a workaround; and it doesn't address the point that it's desirable to be able to define local functions and global functions using the same syntax.

@michaelhkay This is not a "workaround" but a straightforward solution that is available immediately for anyone needing it.

Another lesson to learn is that many absolute statements like "<Something> is impossible" often turn out to be false, and the opposite is true.

dnovatchev commented 8 months ago

@dnovatchev It's good to know that there's a workaround like this for mutually-recursive local functions, but it's very much a workaround; and it doesn't address the point that it's desirable to be able to define local functions and global functions using the same syntax.

@michaelhkay In XSLT 3.0 it is currently possible to define both local and global functions using the same syntax - here is a minimal example:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:variable name="vGlobal" select="function($n) { 2 * $n }"/>

  <xsl:template match="/">
    <xsl:variable name="vLocal" select="function($n) { 2 * $n }"/>
    <xsl:sequence select="$vGlobal(5), $vLocal(5)"/>
  </xsl:template>
</xsl:stylesheet>

Anyone can run this on a dummy/not-used xml document, as I did with Saxon 12.3HE to get the expected, correct result:

(10, 10)

Why add even more syntactic and semantic luggage to the language, when people actually do have the capability of having local functions in the current, official version of XSLT 3.0?

And as an aside: I personally would recommend refraining from using global/static-scope variables, as this is considered in most cases to be an anti-pattern.

To be clear: I am not against this proposal, but simply don't understand why there is a need for such a feature that duplicates what is already a current capability of XSLT 3.0.

michaelhkay commented 8 months ago

I think I have already explained that I see two main benefits:

(a) you can use exactly the same syntax for local functions as for global functions (that is, the xsl:function syntax which pretty well all XSLT users use in preference to writing the function as an anonymous function bound to a variable)

(b) you don't have to use messy workarounds (like defining additional helper functions) to achieve recursion.

dnovatchev commented 8 months ago

I think I have already explained that I see two main benefits:

Yes, the initial proposal was justified by what was stated as fact: (a) that "it's clumsy to have to use completely different syntax for local and global functions" and (b) that "functions defined in this way cannot be mutually recursive."

In this discussion it was shown that both these statements were not true:

(a) you can use exactly the same syntax for local functions as for global functions (that is, the xsl:function syntax which pretty well all XSLT users use in preference to writing the function as an anonymous function bound to a variable)

(a) it was shown how the same syntax is used to define both global and local functions.

(b) you don't have to use messy workarounds (like defining additional helper functions) to achieve recursion.

(b) a specific example was provided of mutual recursion with such functions, contrary to the claim that this cannot be done.

So, the proposed new syntax and semantics for local functions now seem to be based more on personal preference than on the necessity of solving an unsolvable problem (as was suggested for the problem of expressing mutual recursion, but as demonstrated, such a solution exists even in XSLT 3.0).

Once again, I am not opposed to this, however the facts provided in this discussion seem to de-emphasize the immediate need for this.

Add to this the well-explained new complexity that will be added - scoping rules, forwards-references, etc...:

The main difficulty is the scoping rules. We don't want the problems Javascript has with "hoisting". I propose that (a) all local function declarations must appear before any instructions (or local variable declarations, but not params) within the sequence constructor, and (b) these function declarations are in-scope throughout the sequence constructor including forwards references from the body of other functions declared earlier within the same sequence constructor.

Isn't a language that is already quite complex going to become even more complicated, and is such further complication desperately needed?

michaelhkay commented 8 months ago

See also issue #745 which takes a slightly different approach - it makes the locally-defined functions anonymous.

MarkNicholls commented 8 months ago

my comments on #745 are really about this proposal.

#745 allows this

    <xsl:variable name="vLocal" select="function($n) { 2 * $n }"/>

to be expressed by this

    <xsl:variable name="vLocal">
       <xsl:function>
          <xsl:param name="n"/>
          <xsl:sequence select="2*$n"/>
       </xsl:function>
    </xsl:variable>

I'm also not a fan of strange scoping rules, I don't see functions as particularly special in any scope.

To solve the recursive issue, I'd introduce a new attribute 'rec' which is by default true for function and false for variables.

So for example using an anonymous function

   <xsl:template ...>
      <xsl:variable name="factorial" rec="true">
         <xsl:function>
            <xsl:param name="n"/>
            <xsl:choose>
               <xsl:when test="$n > 1">
                  <xsl:sequence select="$n * $factorial($n - 1)"/>
               </xsl:when>
               <xsl:otherwise>
                  <xsl:sequence select="$n"/>
               </xsl:otherwise>
            </xsl:choose>
         </xsl:function>
      </xsl:variable>
      <xsl:sequence select="$factorial(5)"/>
   </xsl:template>

which could be written as (with no special scoping rules..it IS a variable)

   <xsl:template ...>
      <xsl:function name="factorial">
         <xsl:param name="n"/>
         <xsl:choose>
            <xsl:when test="$n > 1">
               <xsl:sequence select="$n * $factorial($n - 1)"/>
            </xsl:when>
            <xsl:otherwise>
               <xsl:sequence select="$n"/>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:function>
      <xsl:sequence select="$factorial(5)"/>
   </xsl:template>

as for the corecursive thing (I don't really understand the xpath example above), but I could potentially now write it like this

   <xsl:template...>
      <xsl:variable name="isEvenOdd" rec="true">
         <xsl:function name="isEven" rec="false">
            <xsl:param name="n"/>
            <xsl:choose>
               <xsl:when test="$n = 0">
                  <xsl:sequence select="true()"/>
               </xsl:when>
               <xsl:otherwise>
                  <xsl:sequence select="$isEvenOdd[2]($n - 1)"/>
               </xsl:otherwise>
            </xsl:choose>
         </xsl:function>
         <xsl:function name="isOdd" rec="false">
            <xsl:param name="n"/>
            <xsl:choose>
               <xsl:when test="$n = 0">
                  <xsl:sequence select="false()"/>
               </xsl:when>
               <xsl:otherwise>
                  <xsl:sequence select="$isEvenOdd[1]($n - 1)"/>
               </xsl:otherwise>
            </xsl:choose>
         </xsl:function>
         <xsl:sequence select="($isEven#1,$isOdd#1)"/>         
      </xsl:variable>
      <xsl:value-of select="$isEvenOdd[1](5)"/>
   </xsl:template>

i.e. declare both functions in the context of a recursive variable (you don't need to name the functions inside this variable or the 'rec' attributes, they could be anonymous, but just for clarity).

this would

the only wrinkle for me is that the scoping rules at the global level are different than local...but tbh, they are already.

P.S. you can sidestep the 'rec' issue by introducing objects (which are inherently recursive fixed points (?)) and using 'this' as the recursive variable instead of isEvenOdd...personally I prefer to enhance and expose the atomic pieces of the language rather than introduce a heavyweight concept like objects see #720, but "objects" may well be easier for most programmers to understand (ironically)

michaelhkay commented 8 months ago

The trouble if you go down this route is that you get into all the problems Javascript has with "hoisting", where you have a construct like

{ function f() { return 2*h(); }
  var g;
  function h() { return 2*g; }
}

where you can call a function whose closure contains variables that haven't yet been evaluated.
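A minimal runnable sketch of that hazard follows. The wrapper name `hoistingDemo` and the value `21` are hypothetical, chosen only to make the effect visible.

```javascript
// Sketch: function declarations are hoisted with their bodies,
// but a `var` is hoisted only as an uninitialized (undefined) binding.
function hoistingDemo() {
  function f() { return 2 * h(); }

  // f and h are both callable here, yet g has not been assigned,
  // so h computes 2 * undefined, which is NaN.
  const early = f();

  var g = 21;
  function h() { return 2 * g; }

  // The same call after g is assigned gives a number.
  const late = f();

  return { early, late };
}

console.log(hoistingDemo()); // early is NaN, late is 84
```

The rule proposed at the top of this issue (local function declarations first, before any variable declarations) would rule out the `early` call pattern, since no function body could see a variable declared later in the same sequence constructor.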

We could do either or both of:

(a) defining a syntax that allows XSLT named function definitions with local scope, where the scope rules allow recursion

(b) defining a syntax that allows XSLT to construct anonymous function items and bind them to variables; the scoping rules for variables make recursion difficult in this scenario.

But mixing the two capabilities by blurring the scope rules for variables seems to be a recipe for the kind of problems Javascript has got into.

MarkNicholls commented 8 months ago

I don't do javascript...so I'm a bit lost.

when you say evaluated...do you mean declared? I wouldn't allow local functions to reference variables that aren't in scope, i.e. variables that aren't declared before or inside the local function (or globally).

In my suggested version your local functions are really just variables with anonymous functions in them, thus they follow the normal rules of local variables and you can't ever reference something declared after it, and they can only reference themselves if they are 'rec'.

a) 'rec' (or #720) b) I've demonstrated this with a 'rec' variable (or #720 does it)

confused