qt4cg / qtspecs

QT4 specifications

https://qt4cg.org/

Support unbounded variadic functions on sequence parameters #161

Closed rhdunn closed 6 months ago

rhdunn commented 2 years ago

Where a parameter is defined as a 0-or-more sequence (T*) or a 1-or-more sequence (T+) and that parameter is indicated as behaving variadically, then positional (non-keyword) arguments from that parameter onward are bound to that sequence.

Motivation

The XQFO specification supports the fn:concat function which takes 2 or more arguments. The "or more" part is defined as "..." with a short sentence stating that this is the only function that supports two or more arguments. As such, the behaviour of this function is loosely defined.

Other implementors and specifications (BaseX, EXQuery RESTXQ, MarkLogic) make use of these variadic sequence types in several functions.

In all of these cases (and for user-defined variadic sequence parameters) the semantics of the functions should be well defined.

Note

In the case where the 0-or-more sequence is the last parameter (or the parameter receiving the unbounded argument values), that parameter would be an optional value as if it was defined as $param as T* := ().

One possible interaction with named arguments is to let a keyword argument separate two sequence parameters that behave variadically. For example, given f($a as xs:int*, $b as xs:string, $c as xs:int*), it may be possible to call it as f(1, 2, 3, b: "4", 5, 6, 7), which would be equivalent to the current call f((1, 2, 3), "4", (5, 6, 7)).

This should be defined such that, when the function is called with the same number of arguments as it has parameters, none of the parameters behaves variadically. The intention is to avoid breaking existing code.

dnovatchev commented 2 years ago

+1, with the following observation and suggestion:

Observation: Using a sequence to pass an unlimited number of arguments doesn't work when one or more of the values are the empty sequence. It also doesn't work when the value of one or more of the arguments may itself be a sequence.

Solution: This can easily be fixed by passing the unbounded argument values in an array.

michaelhkay commented 2 years ago

I believe that once we have defined the proposed specification for optional and keyword parameters, "sequence variadic" functions following the pattern of concat() can be defined by a fairly easy extension as follows:

First, we extend the syntax of parameters in a function declaration to indicate that they are repeated. Perhaps "..." after the parameter name in XQuery, xsl:param/@repeated=yes in XSLT. Then we define that this is equivalent to an unbounded sequence of optional parameter declarations, so

$arg... as xs:string* := ()

is defined to expand to

$arg1 as xs:string* := (),
$arg2 as xs:string* := (),
$arg3 as xs:string* := (),
$arg4 as xs:string* := (),
...

The top end of the "arity range" of the "declared function" becomes unbounded.

No other changes are needed.

Note that this formulation allows concat() and other functions using the same pattern to define their arguments in any number of ways:

concat("a", "b", "c")
concat(("a", "b", "c"))
concat(("a", "b"), "c")
concat(arg1:="a", arg2:="b", arg3:="c")
concat(arg22:="c", arg15:="b", arg3:="a")
concat(arg1:=("a", "b"), arg2:="c")

All these calls produce the result "abc".
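The expansion described above can be modelled outside XPath. The following is only a sketch in Python, under the assumption that positional arguments fill the slots arg1, arg2, ... in order, that keyword arguments name their slot explicitly, and that unsupplied slots default to the empty sequence. The helper name as_sequence and the use of tuples to stand in for XDM sequences are illustrative choices, not part of the proposal.

```python
import re


def as_sequence(value):
    """Treat a bare string as a one-item sequence; anything else as a flat sequence."""
    if isinstance(value, str):
        return [value]
    return list(value)


def concat(*positional, **keyword):
    """Hypothetical model of a sequence-variadic concat()."""
    # Positional arguments occupy the slots arg1, arg2, ...
    slots = {index: value for index, value in enumerate(positional, start=1)}
    # Keyword arguments name their slot explicitly, e.g. arg22="c".
    for name, value in keyword.items():
        match = re.fullmatch(r"arg([1-9][0-9]*)", name)
        if match is None:
            raise TypeError(f"unexpected keyword argument {name!r}")
        index = int(match.group(1))
        if index in slots:
            raise TypeError(f"arg{index} supplied twice")
        slots[index] = value
    # Unsupplied slots default to the empty sequence, so they are skipped.
    items = []
    for index in sorted(slots):
        items.extend(as_sequence(slots[index]))
    return "".join(items)


print(concat("a", "b", "c"))                 # abc
print(concat(("a", "b"), "c"))               # abc
print(concat(arg22="c", arg15="b", arg3="a"))  # abc
```

Under these assumptions all six call shapes listed above yield the same string, because the slots are concatenated in slot order and flattened before joining.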

rhdunn commented 2 years ago

That syntax should work. One thing to note is that the ... must be preceded by whitespace, because NCNames may contain '.', so the "Terminal Delimitation" section should be updated accordingly.

dnovatchev commented 2 years ago

@michaelhkay , @rhdunn :

I believe that once we have defined the proposed specification for optional and keyword parameters, "sequence variadic" functions following the pattern of concat() can be defined by a fairly easy extension as follows:

No, not in general!

Any function that doesn't ignore an argument whose value is the empty sequence (or, in fact, any other sequence) will not receive the value (() or the sequence) specified for that argument.

We need a mechanism to represent the argument values in a lossless way, but using a sequence for this:

Loses any values that are the empty sequence: if we pass N arguments, 3 of which have the value (), the function receives a sequence of N - 3 values.
Cannot have subsequences: if we pass N arguments, one of which has the value (1, 2, 3), the function receives N + 2 values.

I have been mentioning and repeating this issue many times and no one is paying attention!

The proposed solution of passing the unbounded arguments in sequence has this irreparable flaw and we should clearly see the problem.

There is at least one solution to this problem, that is very simple and obvious: Pass each of the unbounded arguments as a member of an array.

Thus this code:

let $ar := [1, 2, (), 3]
 return
  for $i in 1 to array:size($ar)
    return (concat($i, ': ',   $ar($i), '&#xA;'))

correctly produces:

and this:

let $ar := [1, 2, (4, 5, 6), 3],  
    $size := array:size($ar) 
 return
   for $i in 1 to $size
     return ('==============', $ar($i))

correctly produces:

==============
1
==============
2
==============
4
5
6
==============
3
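The difference between the two containers can also be illustrated outside XPath. The sketch below is a Python model only: a tuple stands in for an XDM sequence (which flattens on construction) and a list stands in for an XDM array (which keeps each member intact). The function names to_sequence and to_array are hypothetical.

```python
def to_sequence(*args):
    """Model XDM sequence construction: nested sequences flatten and () vanishes."""
    flat = []
    for arg in args:
        if isinstance(arg, tuple):       # a tuple stands in for an XDM sequence
            flat.extend(to_sequence(*arg))
        else:
            flat.append(arg)
    return tuple(flat)


def to_array(*args):
    """Model an XDM array: every argument is kept as exactly one member."""
    return list(args)


# Four arguments, one of which is the empty sequence.
print(len(to_sequence(1, 2, (), 3)), len(to_array(1, 2, (), 3)))                # 3 4

# Four arguments, one of which is a three-item sequence.
print(len(to_sequence(1, 2, (4, 5, 6), 3)), len(to_array(1, 2, (4, 5, 6), 3)))  # 6 4
```

In this model the sequence form reports 3 and 6 values for what were 4 arguments in both calls, while the array form reports 4 members each time.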

Thanks, Dimitre

michaelhkay commented 2 years ago

I made it clear that my proposal provides a solution for functions "following the pattern of concat()", where empty sequences can safely be ignored because they contribute nothing to the result, and where f(x, y, z) is expected to produce the same result as f((x, y, z)). Most of the functions I've seen that are candidates for "unbounded variadic" follow this pattern.

My main reason for wanting to provide this capability at all is so that fn:concat() no longer has to be treated as a special case. It doesn't concern me greatly if people wanting something more complex than this have to pass an array as an argument.

dnovatchev commented 2 years ago

My main reason for wanting to provide this capability at all is so that fn:concat() no longer has to be treated as a special case

So do everything for the case of a single function, and ignore the multitude of cases for functions whose arguments may be more than a single item?

Such an approach seems really subjective, biased and unfounded.

When at last will we admit that the flat sequence is not the best container for unbounded arguments of any type?

The people who added the array type to the XPath DM did this for a purpose: not just to mimic the arrays from JSON, but to provide a data type that doesn't suffer from the flaws of the flat sequence and is a good container for anything.

When will we start properly using the better data structure?

rhdunn commented 2 years ago

The existing APIs (fn:concat and others) all use sequence types, and sequence types are the most common type in XPath/XQuery. As such, supporting that use case here (and getting rid of the special case logic) makes sense.

I can see how supporting arrays would be useful, but that shouldn't come at the cost of not supporting sequences as both types are useful in different contexts.

With the proposal Michael has suggested, I can see $arg ... as array(*) and $arg ... as array(*) := array {} working for arrays, where argN will bind to the Nth array value. The tricky case is where you don't specify all values, e.g. only arg3 : 6 -- in that case, it could default the other unspecified values to empty sequences, giving in this case array { (), (), 6 }. Note: that (with maps binding to named/keyword arguments) would make this feature available across all the types, and it is then up to the function/API designer whether they want the sequence or array variant.

I think it makes sense to restrict this to the sequence case and have a separate issue for the array case, like how support for maps is a separate proposal.

dnovatchev commented 2 years ago

The existing APIs (fn:concat and others) all use sequence types, and sequence types are the most common type in XPath/XQuery.

This proposal is not just to cover the existing/standard functions in the FO specification, but any functions the users write, or already have, or will write in the future.

In many other languages there is no restriction about the types of the variadic arguments, thus their users don't suffer from any such artificial problems that the current proposal has. Take for example C#. Here we can pass any object, of any type as the value of the variadic parameter, including null and collections:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        Variadic("String1", null, new List<int> { 1, 2, 3 });
    }

    static void Variadic(params object[] args)
    {
        foreach (var arg in args)
        {
            if (arg == null)
                Console.WriteLine("null");
            else
            {
                var isEnumerable = (arg is IEnumerable<int>);
                if (!isEnumerable)
                    Console.WriteLine(arg);
                else
                    Console.WriteLine(string.Join(", ", ((IEnumerable<int>)arg).Select(v => v)));
            }
        }
    }
}

All variadic parameters are contained in an array, and can have any type and value, including null and collections.

Running the above correctly produces:

String1
null
1, 2, 3

ChristianGruen commented 2 years ago

There are good reasons to complain about sequences. Personally, I’ve learned to love them. I’m happy about implicit flattening; I appreciate the absence of null references.

I believe we should trust the power of sequences and take full advantage of them whenever possible. There are good reasons to also support arrays and maps, but they will always be second-class citizens in XPath, and we should avoid simulating other languages that have a different history.

dnovatchev commented 2 years ago

and sequence types are the most common type in XPath/XQuery

Really?

How do we know what code any XPath developer has written / is writing / intends to write ? Restricting them to not being able to pass any types of values, including sequences and the empty sequence, affects their creativity, productivity and even their ways of thinking. For me the latter is especially bad.

Our users will suffer because of such subjective statements.

I can see how supporting arrays would be useful, but that shouldn't come at the cost of not supporting sequences as both types are useful in different contexts.

We need to see an example of how containing all variadic arguments in an array is "not supporting sequences".

Any function that needs a sequence of variadic arguments gets it from the array, as in this example:

let $f := function($arg1 as item()*, $arg2 as item()*)
          { array:flatten([$arg1, $arg2]) (:  Do whatever necessary with this sequence :) }
 return
   $f("Good ", ["morning", ["friends"]])

and this correctly produces the sequence:

Good 
morning
friends

dnovatchev commented 2 years ago

I made it clear that my proposal provides a solution for functions "following the pattern of concat()", where empty sequences can safely be ignored because they contribute nothing to the result, and where f(x, y, z) is expected to produce the same result as f((x, y, z)).

Ironically, with fn:concat() this produces an error:

concat((1,2,3))

"Cannot find a 1-argument function named Q{http://www.w3.org/2005/xpath-functions}concat(). The namespace URI and local name are recognized, but the number of arguments is wrong"

liamquin commented 2 years ago

There are multiple kinds of variadic functions in different programming languages. For example, C gives us the idea of declaring a number (possibly zero, although in some implementations at least one, I think) of required positional arguments followed by a single argument that's used as a handle into what's effectively an array. This lets you implement printf-like functions - printf("%s, %d", name, number) - as well as XView-style null-terminated property lists:

Window(owner, W_WDTH, 24, W_HEIGHT, 34, W_TITLE, "Hello", NULL).

If there can be at most one repeated parameter, and others simply take sequences, doesn't that satisfy both concat() and most other needs? It also allows the empty sequence to appear anywhere. The arguments could be accessed by a function,

function concatenate($strings as xs:string? *) as xs:string {
  let $values := get-args($strings)
  return
  string-join(for $i in 1 to array:size($values) return $values($i), '')
}

concatenate(a, b, (), c)

This allows empty sequences as parameters, does not create implicit variable names like $arg4, and is simple to explain. If there's a use case for extending other than at the end, a * or + syntax would work but maybe keyword arguments could be required in that case to show the start - concatenate(a: 1, 2, 3, b: 4, 5, 6, 7) or something.

dnovatchev commented 2 years ago

As per @michaelhkay :

Note that this formulation allows concat() and other functions using the same pattern to define their arguments in any number of ways:
concat("a", "b", "c")
concat(("a", "b", "c"))
concat(("a", "b"), "c")
concat(arg1:="a", arg2:="b", arg3:="c")
concat(arg22:="c", arg15:="b", arg3:="a")
concat(arg1:=("a", "b"), arg2:="c")
All these calls produce the result "abc".

Not true. concat(("a", "b", "c")) produces an error.

So does concat(("a", "b"), "c") (produces an error again).

Let us finally stop being obsessed with the weirdest beast in the zoo (fn:concat()) and provide the best variadic functionality for all XPath users - one that allows the value of a variadic argument to be the empty sequence, or any sequence, one that doesn't lose empty-sequence values and doesn't convert a single argument value, that is a sequence of N items, into N arguments' values.

michaelhkay commented 2 years ago

All these calls produce the result "abc".

I meant, of course, that this is what I expected the behaviour to be if sequence-variadic functions were introduced in 4.0, not the behaviour under the existing 3.1 specification. Sorry for not making this clear.

dnovatchev commented 2 years ago

All these calls produce the result "abc".

I meant, of course, that this is what I expected the behaviour to be if sequence-variadic functions were introduced in 4.0, not the behaviour under the existing 3.1 specification. Sorry for not making this clear.

@michaelhkay ,

I am sorry for not grasping this at first glance :(

Still, this seems quite underspecified.

For example, for this function call:

concat(("a", "b"), "c")

What would be passed as argument to the implementation of the function?

If it is a single sequence: "a", "b", "c", then how would the function implementation know that these actually represent 2 arguments, the 1st: "a", "b" , and the 2nd: "c" ?

michaelhkay commented 2 years ago

Note that my statement "No other changes are needed" in https://github.com/qt4cg/qtspecs/issues/161#issuecomment-1264289628 was not correct, because I failed to discuss how the implementation of a function whose last parameter is declared with "..." would reference the value(s) assigned to that parameter.

I assumed, but did not actually say, that the values would be concatenated into a single sequence, so that in a function declared as

declare function f:product($input ... as xs:numeric*) {...}

and called as f:product(1,2,3), the function implementation would be presented with a variable $input set to the sequence (1,2,3), and therefore as far as the function implementation is concerned, f:product(1,2,3) would be indistinguishable from f:product((1,2,3)).

But this is not actually a necessary consequence of what I wrote. The values could equally well be presented to the function implementation as an array, which would allow the function implementation, if it chose, to produce a different result for f:product(1,2,3) and f:product((1,2,3)). A function that chose to treat the two calls as equivalent could still do so.

I have an open mind on whether this is a good idea: by and large, I think I would want to design variadic functions in which x(a,b,c) and x((a,b,c)) have the same effect, but if someone can produce a good use case for a function that doesn't follow this model, I could be persuaded.

dnovatchev commented 2 years ago

The values could equally well be presented to the function implementation as an array, which would allow the function implementation, if it chose, to produce a different result for f:product(1,2,3) and f:product((1,2,3)). A function that chose to treat the two calls as equivalent could still do so.

Yes, I believe this is good progress and we are nearing a common understanding and agreement here.

And @rhdunn wanted to extend this idea even further: If the values are presented as a map, then the function call can reference each variadic argument by (standard) name individually, such as:

myFun($arg999 := 33, $arg2 :=2)

Compared to presenting the variadic arguments with an array, this has the benefit that the map doesn't need to store any other intermediate arg-values (assuming they all have a default, say 0), because a map corresponds to a sparse array. If the arg-values are presented as an array, however, that array must have 999 members, which wastes space.
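The space argument can be made concrete with a small model. This is only a sketch in Python, assuming keyword names of the form argN and a default value of 0 for unsupplied slots. The function names bind_as_map and bind_as_array are hypothetical and do not correspond to anything in the proposal.

```python
def bind_as_map(**keyword):
    """Keep only the supplied slots, keyed by slot number (names assumed to be argN)."""
    return {int(name[3:]): value for name, value in keyword.items()}


def bind_as_array(default=0, **keyword):
    """Materialise every slot up to the highest supplied one, filling gaps with the default."""
    slots = bind_as_map(**keyword)
    size = max(slots, default=0)
    return [slots.get(index, default) for index in range(1, size + 1)]


as_map = bind_as_map(arg999=33, arg2=2)
as_array = bind_as_array(arg999=33, arg2=2)
print(len(as_map), len(as_array))   # 2 999
```

In this model the map holds only the two supplied entries, while the array has to carry 997 default members to keep slot 999 addressable by position.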

dnovatchev commented 2 years ago

@michaelhkay ,

I have an open mind on whether this is a good idea: by and large, I think I would want to design variadic functions in which x(a,b,c) and x((a,b,c)) have the same effect, but if someone can produce a good use case for a function that doesn't follow this model, I could be persuaded.

The first example that came to my mind (and there are infinitely many similar problems) is:

A function averageForPeriod is declared like this:

let $averageForPeriod := function($dayMeasurements...  as xs:integer* = 0)
   {
       avg(array:for-each($dayMeasurements, avg#1))
    }

For additional clarity, suppose that several temperature measurements are made each day over a number of days, and that each argument to the function is the sequence of measurements for one specific day.

This function calculates the average temperature for each day, then calculates the average of these daily averages.

Thus, $averageForPeriod((30, 15), 10) produces 16.25

but $averageForPeriod((30, 15, 10)) produces 18.333333333333333333

This can be verified with the following executable XPath 3.1 expression:

let $averageForPeriod := function($dayMeasurements  as array(xs:integer*))
   {
       avg(array:for-each($dayMeasurements, avg#1))
    }
 return
  ( 'Average for period (30, 15), 10 : ' || $averageForPeriod([(30, 15), 10]),
    'Average for period (30, 15, 10) : ' || $averageForPeriod([(30, 15, 10)]))

And, as requested, we can see that the above two function calls produce different results:

Average for period (30, 15), 10 : 16.25
Average for period (30, 15, 10) : 18.333333333333333333
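The same arithmetic can be checked in another language. The sketch below is a Python model, assuming each argument is either one day's tuple of readings or a bare number standing for a single-reading day. The name average_for_period simply mirrors the example above.

```python
from statistics import mean


def average_for_period(*day_measurements):
    """Average the per-day averages; each argument is one day's readings."""
    daily_averages = [
        mean(day) if isinstance(day, (list, tuple)) else day
        for day in day_measurements
    ]
    return mean(daily_averages)


# Two days: (30, 15) averages to 22.5, and the average of 22.5 and 10 is 16.25.
print(average_for_period((30, 15), 10))   # 16.25

# One day with three readings: 55 / 3, roughly 18.33.
print(average_for_period((30, 15, 10)))
```

Because each argument stays a separate member, the grouping of the readings survives the call, which is exactly what a flattened sequence would lose.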

dnovatchev commented 1 year ago

The existing APIs (fn:concat and others) all use sequence types, and sequence types are the most common type in XPath/XQuery. As such, supporting that use case here (and getting rid of the special case logic) makes sense.

@rhdunn Not true!

See, for example, this (really crucial) standard XPath 3.1 function: fn:apply().

Guess what is the type of its 2nd argument which is the container for the parameters? If you said: "array" - yes, you were right.

So the truth about the correct way to hold arguments for a dynamically to-be-executed function was known many years ago, and I am not its inventor...

I believe we should trust the power of sequences and take full advantage of them whenever possible. There are good reasons to also support arrays and maps, but they will always be second-class citizens in XPath, and we should avoid simulating other languages that have a different history.

@ChristianGruen Please, see above 👍

Let's not make a step-back from what has already been achieved in XPath 3.1 !!!!

michaelhkay commented 1 year ago

Here's a new attempt - it often helps to stand back, especially if the discussion has been heated.

1. We introduce XSLT and XQuery syntax to declare a function [declaration] as variadic. Perhaps %variadic as a function annotation or variadic="yes" on xsl:function, perhaps "..." on the last argument, it doesn't really matter.
2. If a function is variadic then it must not have any optional parameters, and a static function call must not use keyword arguments. (We might relax these constraints later, but for the moment, it keeps things simpler).
3. The arity range of a variadic function is from N-1 to positive infinity, where N is the number of declared parameters. A consequence of this rule is that you can't have two variadic functions with the same name, because their arity ranges would overlap.
4. The variable representing the last parameter is bound in the function body to an array, which contains the supplied argument values as its members.
5. In a static function call, the arguments must be supplied individually, they cannot be supplied collectively as an array. This is to avoid ambiguity in the case where f([x]) might either be supplying [x] as a singleton argument, or as a composite array wrapping a sequence of arguments.
6. A named function reference in the form f#N returns a function in which the arguments are supplied individually.
7. A named function reference in the form f#* returns a function in which the arguments are supplied collectively, as an array.

Example: let's consider a function that computes the product of a set of numbers (returning 1 if the set is empty).

It might be declared in XQuery as

declare function f:product($value ... as xs:double) {
   array:fold-left($value, 1, ->($x, $y){$x * $y})
}

The function can be called as f:product(2,3,4) which returns 24. The call f:product(2) returns 2, while f:product() returns 1.

let $f3 := f:product#3 returns an arity 3 function accepting 3 xs:double values, it might be called as $f3(2,3,4) returning 24.

let $fN := f:product#* returns an arity 1 function accepting an array of xs:double values, it might be called as $fN([2,3,4]) (though the real purpose would be to accept a variable-length array). To process a sequence of xs:double values, you would need to write $fN(array{$sequence}).

There is no direct way of supplying an array (or sequence) of values to a static function call, but writing

f:product#*(array{$sequence})

achieves the required effect.

The 3.1 specification of concat, generalised to accept 0 or 1 arguments as well as 2+, but still constrained to accept single strings rather than sequences of strings, could be written as

declare function fn:concat($value ... as xs:string?) {
    fn:string-join($value?*)
}

Existing calls (both static and dynamic) would continue to work.
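As a rough analogy, the rules above resemble how Python binds *args. The sketch below models f:product, the collective form f:product#*, and the generalised concat. It is an illustration only, and the names product_star and concat, plus the use of None to stand in for the empty sequence, are assumptions made for the model.

```python
from math import prod


def product(*value):
    """Model of the variadic f:product: `value` holds the arguments as members."""
    return prod(value)              # the product of no members is 1


def product_star(members):
    """Model of f:product#*: one parameter carrying all members collectively."""
    return product(*members)


def concat(*value):
    """Model of the generalised concat: each member is a string or None (for ())."""
    return "".join(member for member in value if member is not None)


print(product(2, 3, 4))          # 24
print(product(2))                # 2
print(product())                 # 1
print(product_star([2, 3, 4]))   # 24
print(concat("a", None, "b"))    # ab
```

The split between product and product_star mirrors the distinction between f#N and f#*: the first takes its arguments individually, the second takes them as one collection, so a call such as product_star([x]) is never ambiguous.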

dnovatchev commented 1 year ago

I read the "new attempt" again and it seems a good step forward.

I would like there to be a way to specify that there must be 1 or more variadic arguments provided (the current default is zero or more). Maybe something like (note the '+' in '...+'):

let $product := function($value ...+ as xs:double) {
   array:fold-left($value, 1, ->($x, $y){$x * $y})
}
  return
    (:  code using $product here  :)
    $product(2, 3, 4)

Also, '...' is not very visible and may require additional mental effort to understand (don't we also use "..." in other places, too?)

Why not have something similar to the C# params keyword syntax:

let $product := function($input params+ as xs:numeric)
                {  array:fold-left($input, 1, fn($x, $y){$x * $y}) }
 return
   $product(3,  5, 8)

If $product is defined as above, its arity is 1 to positive infinity, and calling

$product()

will produce an error, and in many cases this is better than returning an artificial value.
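One way to picture the "one or more" constraint is a signature with one required parameter followed by the variadic tail. The following Python sketch is an analogy only. It does not model the proposed '...+' or params+ syntax, and splitting the parameters into first and rest is an assumption of the model.

```python
from functools import reduce
from operator import mul


def product(first, *rest):
    """Requires at least one argument: `first` is mandatory, `rest` may be empty."""
    return reduce(mul, rest, first)


print(product(3, 5, 8))   # 120

try:
    product()             # no arguments: rejected rather than returning a made-up value
except TypeError as error:
    print("error:", error)
```

With this shape the zero-argument call fails at binding time, so the function body never has to invent a result for an empty argument list.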

Finally, we have already agreed that the variadic argument values will be held in an array, so let us call this array-variadic rather than sequence-variadic, which no longer accurately describes this kind of variadicity.

As for :

5. In a static function call, the arguments must be supplied individually, they cannot be supplied collectively as an array. This is to avoid ambiguity in the case where f([x]) might either be supplying [x] as a singleton argument, or as a composite array wrapping a sequence of arguments.

I don't see this as a problem. If we want to pass a single argument, which is an array itself, we would write:

$f( [ [$x] ] )