qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Allow function keyword inline functions without parameters #53

Closed: rhdunn closed this issue 1 year ago

rhdunn commented 3 years ago

The current draft InlineFunctionExpr adds -> as a shorthand. This shorthand allows optional parameter lists (e.g. -> { true() }), but the function keyword version of this requires a parameter list. For consistency, the function keyword version should also have an optional parameter list.

This means that the syntax for InlineFunctionExpr can be simplified to:

InlineFunctionExpr ::= ("function" | "->")  FunctionSignature?  FunctionBody
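To make the effect of the optional FunctionSignature concrete, here is a minimal recognizer sketch in Python. It assumes the input is already tokenized, treats a type as a single token, and only brace-matches the body rather than parsing it. The function name parse_inline_function is hypothetical. It returns None for the parameters when no signature is present, which is the case the simplified rule adds.

```python
# Minimal recognizer for the proposed rule
#   InlineFunctionExpr ::= ("function" | "->") FunctionSignature? FunctionBody
# Assumptions (not from the spec): the input is already tokenized, a type is a
# single token, and the body is only brace-matched, not parsed.

def parse_inline_function(tokens):
    """Return (params, return_type); params is None when no signature is given.
    Raise ValueError if the tokens do not match the rule."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r} at token {pos}, got {peek()!r}")
        pos += 1

    if peek() not in ("function", "->"):
        raise ValueError("expected 'function' or '->'")
    pos += 1

    params, return_type = None, None
    if peek() == "(":                  # the FunctionSignature is now optional
        pos += 1
        params = []
        while peek() != ")":
            if params:
                expect(",")
            name = peek()
            if name is None or not name.startswith("$"):
                raise ValueError(f"expected a parameter name at token {pos}")
            pos += 1
            if peek() == "as":         # optional parameter type: skip 'as' and the type
                pos += 2
            params.append(name)
        expect(")")
        if peek() == "as":             # optional return type
            pos += 1
            return_type = peek()
            pos += 1

    expect("{")                        # FunctionBody: balanced { ... }
    depth = 1
    while depth:
        tok = peek()
        if tok is None:
            raise ValueError("unterminated function body")
        depth += {"{": 1, "}": -1}.get(tok, 0)
        pos += 1
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the function body")
    return params, return_type


# Usage: both forms are accepted; only the signature-less one yields None.
print(parse_inline_function(["function", "{", "true()", "}"]))             # (None, None)
print(parse_inline_function(["function", "(", "$a", ")", "{", "$a", "}"]))  # (['$a'], None)
```

Note that function $x { } is still rejected by this rule; accepting it needs the ParamList alternative discussed further down the thread.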

Update: Recent discussions suggest that using -> both as the thin arrow operator and to define inline functions is confusing. A replacement for -> in the inline function context should therefore be identified.

For the variant without a parameter list (e.g. when used with arrow operators), the question is how it should work. I suggest:

  1. it should be a 0- and 1-arity function, with the argument defaulting to ();
  2. if the argument is a single item, it should bind to both . (the context item) and ~ (the context value -- https://github.com/qt4cg/qtspecs/issues/129);
  3. if the argument is an empty sequence or a sequence of more than one item, it should bind only to ~ (the context value -- https://github.com/qt4cg/qtspecs/issues/129).

This way, it will be usable in multiple contexts.
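The three rules can be modelled in a few lines of Python. This is a sketch under stated assumptions: XDM sequences are modelled as tuples, and the body is a callable that receives its focus as a dict keyed by "." and "~". The names parameterless_function and describe are hypothetical.

```python
# Hypothetical model of the proposed semantics for a parameterless inline
# function such as  function { ... }  or  -> { ... }.
# Assumptions: XDM sequences are modelled as Python tuples, and the body is a
# Python callable that receives its focus as a dict with the keys
# "." (context item) and "~" (context value).

EMPTY = ()  # models the empty sequence ()


def parameterless_function(body):
    """Wrap body so that it can be called with zero or one argument."""
    def call(arg=EMPTY):                       # rule 1: arity 0 or 1, default ()
        seq = arg if isinstance(arg, tuple) else (arg,)
        focus = {"~": seq}                     # rules 2 and 3: ~ is always bound
        if len(seq) == 1:
            focus["."] = seq[0]                # rule 2: a single item is also bound to .
        return body(focus)
    return call


# Usage: report whether "." was bound, and what "~" holds.
describe = parameterless_function(lambda focus: ("." in focus, focus["~"]))
print(describe())           # (False, ())
print(describe(5))          # (True, (5,))
print(describe((1, 2, 3)))  # (False, (1, 2, 3))
```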

ChristianGruen commented 3 years ago

It would definitely be nice if function and -> could be used 100% as aliases. Otherwise, people might tend to mix up the syntax.

rhdunn commented 3 years ago

A suggestion from a Slack thread [1] was to support constructs like function $x { }, function $x as xs:string { }, -> $x { }, and -> $x as xs:string { }. This could be achieved using:

InlineFunctionExpr ::= ("function" | "->")  (FunctionSignature | ParamList)?  FunctionBody

Liam Quin suggested being able to place the return type after the function body in the form without parentheses around the parameters. This would make the new syntax:

InlineFunctionExpr ::= ("function" | "->")  ((FunctionSignature FunctionBody) | (ParamList FunctionBody TypeDeclaration?))

[1] https://xmlcom.slack.com/archives/C01GVC3JLHE/p1615086157038500

ChristianGruen commented 2 years ago

If we add => as an inline function operator for binding sequences (as proposed in #129, Example 3), it may be reasonable not to equate function and ->. Otherwise, it would be unclear whether function { } binds a single item or a sequence.

=> is the more general operator: -> is comparable to =>, but it triggers an error if zero items, or more than one item, are supplied as the argument. Hence, we could also treat function and => as full aliases.

ChristianGruen commented 1 year ago

Edit (2023-04-17): Alternative D added


Building on the proposal in https://github.com/qt4cg/qtspecs/issues/435#issuecomment-1508228624, I’m summarizing variants we’ve been discussing as slicker alternatives for function($a, $b) { }:

A. Arrow syntax: ->($a, $b) { $a, $b }, -> { . }

Defined in the current specification. If the parameters are omitted, the single argument is bound to the context item.

Motivation:

Possible concerns:

B. Alias for function keyword: fn($a, $b) { $a + $b }, fn { . }

Introduction of a short alias for the function keyword (mentioned in https://github.com/qt4cg/qtspecs/issues/435#issuecomment-1504985791).

Motivation:

C. Syntax without argument lists: { $1, $2 }, { . }

Presented by Michael Kay: https://github.com/qt4cg/qtspecs/issues/435#issuecomment-1504810495.

D. Java arrow syntax: ($a, $b) -> { $a, $b }, $a -> { $a }

Possible concerns: see https://github.com/qt4cg/qtspecs/issues/53#issuecomment-1509736761 and https://github.com/qt4cg/qtspecs/issues/53#issuecomment-1511118975.

Examples

Variant 0 is what we already have.

Built-in function, function item with two arguments

Variant Code
0 map:for-each($map, function($key, $value) { $key * $value })
A map:for-each($map, ->($key, $value) { $key * $value })
B map:for-each($map, fn($key, $value) { $key * $value })
C map:for-each($map, { $1 * $2 })
D map:for-each($map, ($key, $value) -> { $key * $value })

Built-in function, function item with single arguments

Variant Code
0 sort($strings, (), function($n) { number($n) })
A sort($strings, (), -> { number(.) })
B sort($strings, (), fn { number(.) })
C sort($strings, (), { number(.) })
D sort($strings, (), $n -> { number($n) })

Function item, assignment to variable

Variant Code
0 let $shift := function($n, $s) { $n * math:pow(2, $s) }
return $shift(5, 2)
A let $shift := ->($n, $s) { $n * math:pow(2, $s) }
return $shift(5, 2)
B let $shift := fn($n, $s) { $n * math:pow(2, $s) }
return $shift(5, 2)
C let $shift := { $1 * math:pow(2, $2) }
return $shift(5, 2)
D let $shift := ($n, $s) -> { $n * math:pow(2, $s) }
return $shift(5, 2)

Fat arrow operator, single argument

Variant Code
0 $value => (function($n) { $n = 0 })()
A $value => -> { . = 0 }()
$value => { . = 0 } (specific variant for the arrow operator)
B $value => fn { . = 0 }()
C $value => { . = 0 }()
D $value => $n -> { $n = 0 }()

Thin arrow operator, single argument

Variant Code
0 for $n in $numbers
return $n => (function($n) { $n + 1 })()
$numbers ! function($n) { $n + 1 }(.)
A $numbers -> -> { . + 1 }()
$numbers -> { . + 1 } (specific variant for the arrow operator)
B $numbers -> fn { . + 1 }()
C $numbers -> { . + 1 }()
D $numbers -> $n -> { $n + 1 }() (conflicting, needs to be sorted out)

Fat arrow operator, two arguments

Variant Code
0 $value => (function($n, $base) { $base + $n })(1000)
A $value => ->($n, $base) { $base + $n }(1000)
B $value => fn($n, $base) { $base + $n }(1000)
C $value => { $2 + $1 }(1000)
D $value => ($n, $base) -> { $base + $n }(1000)

Summary

rhdunn commented 1 year ago

Note that the idea behind A and B is the same -- to introduce a short inline function specifier. In the case of B, the keyword need not be fn; it could be something different.

Using something like /($n, $base) { $base + $n } doesn't work as / is the step expression and /($n, $base) is a syntactically valid expression due to steps allowing parenthesized expressions, and a variable reference being a valid single expression.

ndw commented 1 year ago

Speaking as a guy who just wandered in off the street...

  1. Why do we need a more compact syntax?
  2. Using fn instead of function to save 6 characters seems just silly. And I'd say that the fact that fn is often used as the prefix for the default function namespace is an argument against using fn as a shortcut for the word function. It just invites confusion.
  3. Option C doesn't appeal to me because I think introducing a function with just a bare { seems grammatically troublesome and having to name the arguments $1, $2, etc. doesn't make me feel any better about it.

I note that JavaScript uses (arg1, arg2, ...) => { function-body } as a shortcut. But I doubt reusing the fat arrow there is going to work for us.

rhdunn commented 1 year ago

Language Study -- Kotlin

In Kotlin (https://kotlinlang.org/docs/lambdas.html#lambda-expressions-and-anonymous-functions) you can write:

fun f0(f: () -> Int) {} // $f as function() as xs:int
fun f1(f: (a: Int) -> Int) {} // $f as function(xs:int) as xs:int
fun f2(f: (a: Int, b: Int) -> Int) {} // $f as function(xs:int, xs:int) as xs:int

fun g() {
    // 0 arity
    f0(fun (): Int { return 1 }) // [1]
    f0({ 1 }) // [2], [5]
    f0() { 1 } // [2], [3], [5]
    f0 { 1 } // [2], [3], [4], [5]

    // 1 arity
    f1(fun (x: Int): Int { return x + 1 }) // [1]
    f1(fun (x): Int { return x + 1 }) // [1], [6]
    f1({ x: Int -> x + 1 }) // [2]
    f1({ x -> x + 1 }) // [2], [6]
    f1({ it + 1 }) // [2], [5], [7]
    f1 { it + 1 } // [2], [3], [4], [5], [7]

    // 2 or more arity
    f2(fun (x: Int, y: Int): Int { return x + y }) // [1]
    f2(fun (x, y): Int { return x + y }) // [1], [6]
    f2({ x: Int, y: Int -> x + y }) // [2]
    f2({ x, y -> x + y }) // [2], [6]
    f2 { x, y -> x + y } // [2], [3], [4], [6]
}

[1] Inline function expression, equivalent to the current XPath/XQuery 3.1 InlineFunctionExpr syntax.

[2] Lambda expressions for more concise inline function expression definitions, akin to what we are exploring here and with the draft 4.0 -> syntax.

[3] When the last parameter has a function type, the lambda expression can be moved outside the parentheses of the call. This is used in creating DSLs (Domain Specific Languages), e.g. in the Kotlin variant of Gradle scripts.

[4] If the lambda is the only argument of the call, the parentheses can be omitted entirely. This is used in creating DSLs (Domain Specific Languages), e.g. in the Kotlin variant of Gradle scripts.

[5] The { 1 } lambda expression without any parameters binds to 0 and 1 arity function references.

[6] Parameter types are optional.

[7] For 1-arity functions, it is the implicit name of the single parameter. For functions of any other arity it is not defined, so for arity 2 or more you need to name the parameters.

rhdunn commented 1 year ago

While something like { ($x, $y) -> ... } could be possible, the use of the -> operator in this context does not feel like XPath/XQuery to me and would be confusing syntax, especially if we keep the -> as the thin arrow operator. Likewise, it breaks the current model of being able to disambiguate the context using just a pair of tokens.

rhdunn commented 1 year ago

So let's look at what we can do.

It should be possible to omit the parentheses in 2 cases:

  1. when the result is a 0-arity function -- function { 1 }.
  2. when the result is a 1-arity function -- function { . + 1 }. -- Note that we can use the context item here. If we don't want to set the context item here (e.g. to keep/inherit it from the context outside of the inline function), we can define some other name for this.

This would mean that the function { ... } syntax has the type function ($arg as item()* := ()) as item()*.

ChristianGruen commented 1 year ago
  1. Why do we need a more compact syntax?

@ndw For the same reason, I’d say, that many other languages have introduced functional shortcuts in recent versions (using functions is becoming more and more common – and XQuery has always been a functional language). Besides, we have had a compact one- or two-character syntax for numerous other operations since the very beginning.

Note that the idea behind A and B are the same -- to introduce a short inline function specifier.

@rhdunn I agree. With B, subsequent occurrences of arrows (=> ->, etc.) could be avoided.

The Kotlin summary is helpful, thanks.

dnovatchev commented 1 year ago

I largely agree with @ndw.

All of these cases result in expressions that require additional mental labor in order to be understood.

The gain in briefness is minimal and clearly doesn't justify the additional effort needed to understand the expression.

Of the six tables, only one shows the use of the simple mapping operator !, and there it is obvious that using ! produces an expression comparable in length to (not much longer than) the newly proposed abbreviated syntax.

Of all the variants 0, A, B, and C, only C (using $1, $2, etc. to denote the 1st, 2nd, etc. argument) seems quite natural and understandable. But even in this case, if the function has more than 3 arguments, one would have to strain one's memory to recall "what was $4?".

My preference is not to introduce any of these new syntaxes at all, and to stop investing a significant portion of our time in this.

I would prefer the simple case when we are using well-named variables, and the names tell us at a glance everything that each of the variables in the expression means.

If we offer the user a way to support unintelligible scribbling, this may save them a few seconds in writing the expression, but just a month later even the code author might be puzzled what that cryptic expression was intended to mean.

Imagine such brevities spread all over the place and even deeply nested ... 😱

From the standpoint of readability, understandability and maintainability, it seems that we have already reached the limit of briefness, surpassing which brings more problems than advantages.

michaelhkay commented 1 year ago

Why do we need a more compact syntax?

Experience with other languages suggests that providing a simplified syntax for anonymous functions leads to an explosion in their use, as in many cases this suddenly becomes the most concise way of expressing yourself.

sort(//employee, (), {@salary})

just feels so much simpler than

sort(//employee, (), function($emp){$emp/@salary})

that it makes a real difference to people's willingness to use the construct.

dnovatchev commented 1 year ago

@michaelhkay,

Compare

(1) sort(//employee, (), {@salary})

to:

(2) sort(//employee, (), $getSalary)

With maintainability and understandability in mind, it is easy to see that (2) would not need to be modified even if the document schema changed significantly, for example if the salary were no longer an attribute but an element named "baseSalary".

In the case of (1), one must manually edit all such expressions to replace @salary with whatever the new expression giving the salary must be. And this may have to be done in many places, depending on how prolific the developer was in using this writing style.

On the other hand, in the case of (2), nothing in this expression (or in any of the possibly many other expressions containing $getSalary) needs to be changed. There will be just one (single) change -- only in the code of the $getSalary function.

Add to this the fact that with (2) our code handles, without any modification, many differently structured source XML documents, in each of which the salary must be retrieved in a different way. What differs in each case is $getSalary, and this will be injected as a parameter.

Now tell me: which of the techniques (1) or (2) would you recommend to be used, having maintainability and understandability, and most importantly reusability in mind?

For me the answer is clear: Obviously/unanimously/overwhelmingly: (2)

ChristianGruen commented 1 year ago

I agree with Michael that a more concise syntax (however it will look like) can motivate people to write functional code (even without noticing).

For me the answer is clear: Obviously/unanimously/overwhelmingly: (2)

(2) by itself is incomplete. It would either need to be…

sort(//employee, (), function($key) { $key/@salary })

…or:

let $getSalary := function($key) { $key/@salary }
return sort(//employee, (), $getSalary)

If you prefer to bind the comparison function to a variable, you can always do that, no matter what the syntax is:

let $getSalary := function($key) { $key/@salary }
let $getSalary := function { @salary }
let $getSalary := ->($key) { $key/@salary }
let $getSalary := -> { @salary }
let $getSalary := fn($key) { $key/@salary }
let $getSalary := fn { @salary }
let $getSalary := { @salary }
return sort(//employee, (), $getSalary)

liamquin commented 1 year ago

Tools to help intermediate users are very important. Two such tools are the type system and googlability.

So i prefer using f to fn, because searching for fn in XPath is useless.

f($a as my:socks, $b as css:colour) as element(sock) { ...} isn't much terser than using "function", though.

As XPath gets more and more like a general functional programming language it gets harder and harder to teach to the primary audience.

So either we need to think about changing the primary audience, or we need to keep those intermediate non-programmer people in mind.

Next time a journal publisher needs to generate HTML from JATS in their house style, they're already using an external consultant because XSLT 1 is too much like programming, so now they can hire an undergraduate who knows Python, as we lose the argument that we are a domain-specific language. The undergrad of course will use CHAT-GPT 7, and it won’t work, so they’ll come to me with broken code i can’t repair :)

//fr ! { @id } ! string() => distinct-values() => string-join(', ')

can of course be rewritten using //fr/@id but given that

<xsl:attribute name="class">
  <xsl:value-of>fnref</xsl:value-of>
</xsl:attribute>

is already an idiom i see, i’m nervous.

Although i understand the motivation for more concise inline functions, it feels similar to the motivation for </> to end an arbitrary element, and the problems we know that caused in SGML.

dnovatchev commented 1 year ago

(2) by itself is incomplete. It would either need to be…

sort(//employee, (), function($key) { $key/@salary })

…or:

let $getSalary := function($key) { $key/@salary }
return sort(//employee, (), $getSalary)

Yes, and when I said "injected", this means passed as a parameter or used as a call to a function defined in a commonly accessible library.

Either way, compared to the literal inline syntax, this abstraction gives us:

If I had to choose between two developers, where Dev1 uses:

(1) sort(//employee, (), {@salary})

and Dev2 uses:

(2) sort(//employee, (), $getSalary)

then, everything else equal, I definitely would prefer Dev2.

ChristianGruen commented 1 year ago

As XPath gets more and more like a general functional programming language it gets harder and harder to teach to the primary audience.

@liamquin A good point. From my personal perspective, it would certainly be reasonable if such enhancements would not be available in XPath, but limited to XQuery. When I teach XPath, my solution is to only present XPath 1.0 features, and I don’t confront people with things like the let clause, quantifiers or error handling.

From a technical perspective, it makes sense to regard XPath as a subset of more complex languages, but it’s hard to justify that it does more than processing paths. Of course, people could ask why XQuery is called XQuery if you can create and modify JSON, CSV and binary data with it as well…

And it’s hard to assess who our primary audience is today. Inspired by Norm’s comment on Slack, I wonder if we should really invest more time to find out.

dnovatchev commented 1 year ago

If you prefer to bind the comparison function to a variable, you can always do that, no matter what the syntax is:

let $getSalary := function($key) { $key/@salary }
let $getSalary := function { @salary }
let $getSalary := ->($key) { $key/@salary }
let $getSalary := -> { @salary }
let $getSalary := fn($key) { $key/@salary }
let $getSalary := fn { @salary }
let $getSalary := { @salary }
return sort(//employee, (), $getSalary)

When we bind (or inject) a function to a variable just once it doesn't really matter so much what syntax to use, and in the case when this is injected (as in: passed as parameter) we don't have to do any binding at all -- we just reference that variable.

Thus we define $getSalary once (or zero times, if it is injected) and use it everywhere in the currently supported 3.1 form:

(2) sort(//employee, (), $getSalary)

Because it is virtually the same length as:

(1) sort(//employee, (), {@salary})

and the latter eliminates all the benefits we get from using (2), there is no justification at all to use the (1) syntax.

And using this gives us no added benefit:

(1) sort(//employee, (), {$getSalary(.)})

On the contrary, it results in longer and unnecessarily more complex and more difficult to understand expression than simply:

(2) sort(//employee, (), $getSalary)

benibela commented 1 year ago

C | map:for-each($map, { $1 * $2 })

That makes it really hard to see the arity. Or what do you write when you need a 3-arity function with the third parameter being unused? Or nested functions {$1 * (let $b := $2 return { $1 + $b } )(7) }
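The arity concern can be made concrete with a small Python sketch. It assumes a hypothetical rule, not defined anywhere in this thread, that a variant C function's arity is the highest $N in its body, and compares a flat text scan with one that skips nested { ... } bodies. Both helpers are illustrative only.

```python
import re

# Hypothetical arity rule for variant C ({ $1 * $2 }): the arity is the highest
# $N referenced in the body. This rule is an assumption for illustration only.
PLACEHOLDER = re.compile(r"\$(\d+)")


def flat_arity(body):
    """Highest $N anywhere in the text, including inside nested { ... } bodies."""
    return max((int(n) for n in PLACEHOLDER.findall(body)), default=0)


def scoped_arity(body):
    """Highest $N outside nested { ... } bodies.

    Simplification: any brace pair (even in a string literal or a map
    constructor) is treated as a nested function body.
    """
    depth, outer = 0, []
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            outer.append(ch)
    return flat_arity("".join(outer))


# Usage: an arity-1 function whose body calls an arity-2 inline function.
body = "$1 => { $2 + $1 }(100)"
print(flat_arity(body))    # 2 (the inner $2 is counted)
print(scoped_arity(body))  # 1
```

Under such a rule, a body that mentions only $1 and $2 always yields arity 2, so a 3-arity function with an unused third parameter cannot be expressed at all.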

Using something like /($n, $base) { $base + $n } doesn't work as / is the step expression and /($n, $base) is a syntactically valid expression due to steps allowing parenthesized expressions, and a variable reference being a valid single expression.

\($n, $base) { $base + $n } might work. But it is hard to write when embedding queries in strings of other programming languages that use \ for escaping.

Why do we need a more compact syntax?

Higher order functions are useless if the code using them is longer than the code not using them

I note that JavaScript uses (arg1, arg2, ...) => { function-body } as a shortcut. But I doubt reusing the fat arrow there is going to work for us.

That might actually work great

On the right-hand side of the current =>, there has to be a function (in 3.1, a function name, a variable reference or a parenthesized expression). None of these starts with {, so it should parse without issues.
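One way a parser could handle the JavaScript-style form is the "cover grammar" approach used by the ECMAScript specification (CoverParenthesizedExpressionAndArrowParameterList): parse ( ... ) as an ordinary parenthesized expression, and reinterpret it as a parameter list only if => { follows. A minimal Python sketch, assuming pre-tokenized input and an ASCII-only approximation of variable names. The helper reinterpret_as_params is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of a variable reference such as $n or $first-name.
VAR_REF = re.compile(r"\$[A-Za-z_][A-Za-z0-9_.\-]*")


def reinterpret_as_params(paren_items, following):
    """Decide what an already-parsed parenthesized expression was.

    paren_items: the comma-separated items inside ( ... ), as source strings.
    following:   the tokens after the closing ')'.
    Returns the parameter names if '=> {' follows, None for an ordinary
    arrow expression, and raises ValueError if '=> {' follows items that
    cannot be parameter names.
    """
    if following[:2] != ["=>", "{"]:
        return None            # e.g. ($n, $base) => sum(): keep the 3.1 meaning
    if not all(VAR_REF.fullmatch(item) for item in paren_items):
        raise ValueError("'=> {' must follow a list of variable names")
    return list(paren_items)


print(reinterpret_as_params(["$n", "$base"], ["=>", "{", "$base", "+", "$n", "}"]))  # ['$n', '$base']
print(reinterpret_as_params(["$n", "$base"], ["=>", "sum", "(", ")"]))             # None
```

The decision needs only one token after =>, at the cost of revisiting the already-parsed left-hand side.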

As XPath gets more and more like a general functional programming language it gets harder and harder to teach to the primary audience.

Everything would be much more comfortable if there had never been any anonymous or higher order functions put in XPath. It could have been an XQuery only feature

dnovatchev commented 1 year ago

Everything would be much more comfortable if there had never been any anonymous or higher order functions put in XPath. It could have been an XQuery only feature

Too late for this.

As I said 10 years ago, "the cat is out of the bag", either live with this or get yourself another cat.

If the language moved in this direction, it did because people wanted it. Maybe there is a parallel universe in which XPath was permanently frozen in version 1.0. I am glad this isn't my universe.

michaelhkay commented 1 year ago

sort(//employee, (), $getSalary)

Of course I know that if I want to use the same function in more than one place, then it's best to declare it globally rather than using an anonymous inline function. But my Java and Javascript and C# code have become much more readable by using anonymous compact lambda expressions, and I think that will also be true for XPath.

dnovatchev commented 1 year ago

Even when a user doesn't have access to the source code of a product, they are judging this product's quality by the number/frequency of bugs and the mean period to fix them.

Using lambda functions reduces the understandability and maintainability of an application. Reduced maintainability means longer time for fixing a bug. It is not too difficult to correlate finding bugs in a product with hearing its Devs boast of using lambda functions.

Here is what Trey Hunner, a former director of the Python Software Foundation, and a frequent speaker at Python conferences says about using lambda functions in Python: https://treyhunner.com/2018/09/stop-writing-lambda-expressions/#Should_you_ever_use_lambda_expressions?

Here are excerpts from this article:

"Lambda is both misused and overused

When I see a lambda expression in unfamiliar code I immediately become skeptical. When I encounter a lambda expression in the wild, I often find that removing it improves code readability.

Sometimes the issue is that lambda expressions are being misused, meaning they’re used in a way that is nearly always unideal. Other times lambda expressions are simply being overused, meaning they’re acceptable but I’d personally prefer to see the code written a different way.

Let’s take a look at the various ways lambda expressions are misused and overused. "


"If a function is important, it deserves a name. You could argue that most functions that are used in a lambda expression are so trivial that they don’t deserve a name, but there’s often little downside to naming functions and I find it usually makes my code more readable overall.

Naming functions often makes code more readable, the same way using tuple unpacking to name variables instead of using arbitrary index-lookups often makes code more readable."


michaelhkay commented 1 year ago

That's an opinion piece, it doesn't contain a single piece of factual evidence that use of lambda expressions increases the number of bugs in your code. In fact it doesn't even make that suggestion, which would indeed be highly implausible. Most factual evidence suggests that if you make your code more concise, within reason, you will reduce the number of bugs.

We're dealing with a generation of programmers now who are familiar with jQuery, LINQ and the streams API in Java. Those APIs make the liberal use of lambda expressions the most natural thing in the world, and the resulting code can be highly concise, readable, and expressive. You seem to be trying to turn back the clock.

liamquin commented 1 year ago

On Sun, 2023-04-16 at 22:38 -0700, Michael Kay wrote:

That's an opinion piece, it doesn't contain a single piece of factual evidence that use of lambda expressions increases the number of bugs in your code.

We already have inline functions and i don't think anyone is suggesting removing them. At no point did i say it increases the number of bugs.

The proposal is not to add lambda expressions. It’s to make a more concise (and more cryptic) syntactic sugar for them in the case they have no parameters.

We're dealing with a generation of programmers

Are we? You are saying the primary users of XPath and XSLT are experienced programmers. That's not been my experience, although i know others may well see a different set of users.

If i wanted significantly more concise code i’m not sure i’d be using XSLT at all. The motivation isn’t conciseness, it’s applicability.

The balance is that people who use XPath once or twice a month, maybe once a week, but not all day every day, aren’t going to be able to work out what’s going on, and likely will wander off to Python or JavaScript. The use of keywords has helped those people to do searches and figure out what’s going on.

rhdunn commented 1 year ago

@dnovatchev Different people have a diversity of coding styles, experience levels, and what they find intuitive.

Some people (like you) prefer to write mainly in XPath, and so bind inline functions/lambda expressions to map keys. -- This was a common pattern in JavaScript until it introduced classes to the language. (In XSLT we have xsl:function, and in XQuery declare function, as alternatives that are more natural fits for the host languages.)

Some people write functions and libraries in XSLT (Michael, Liam). Some people do the same in XQuery (Christian, me).

Within XSLT, some people prefer to use xsl:templates, others xsl:for-each, others higher-order functions.

What we are exploring here is a way to make things easier for the people who use higher-order functions -- which include people like me with e.g. https://docs.marklogic.com/xdmp:invoke-function. Many other languages have built support into the language to make inline functions easier to use. In some of those languages (e.g. JavaScript and Kotlin), the use of lambda expressions has become idiomatic.

As happened in JavaScript when it introduced lambda expressions, this would also help you/others when writing map-based XPath function libraries, e.g.:

let $lib := map {
    "f": -> ($x) { $x + 1 },
    "g": -> { 123 }
}

You can always teach/use a different subset of the language (e.g. don't use/teach classes in C++ or JavaScript) if you want. No-one is forced to use a feature. For example, you can teach XSLT without the packages functionality or xsl:for-each.

michaelhkay commented 1 year ago

@liamquin It was @dnovatchev who suggested that a more concise syntax for inline functions would make code less reliable, it wasn't you.

You say: "You are saying the primary users of XPath and XSLT are experienced programmers". No, that's not what I'm saying. I'm advocating that we should allow constructs along the lines of sort(->{@salary}) in preference to sort(function($x as element(employee)) as xs:string {string($x/attribute::salary)} because I think it's easier to read and easier to write whatever your level of programming experience, and I'm pointing to the change in the way people write Java, C#, and JavaScript as evidence of this.

michaelhkay commented 1 year ago

@ChristianGruen wrote: "From my personal perspective, it would certainly be reasonable if such enhancements would not be available in XPath, but limited to XQuery."

The main test I apply when considering whether something is useful in XPath, is whether it's going to be useful in XPath-within-XSLT. Of course that's not the only scenario to consider, but I think it's usually likely to give the right answer.

We're increasingly taking advantage of higher-order functions when designing new XSLT capabilities. Consider xsl:map/@on-duplicates. Using a function call-back here is extremely powerful, but simple cases (as given in the current examples) are pretty cumbersome:

<xsl:map on-duplicates="function($a, $b){$b}">...</xsl:map>
<xsl:map on-duplicates="function($a, $b){$a, $b}">...</xsl:map>

There's just too much boilerplate here. Under one of my proposals, this would become:

<xsl:map on-duplicates="{$2}">...</xsl:map>
<xsl:map on-duplicates="{$1, $2}">...</xsl:map>

which in my view cuts out all the noise. There are pros and cons for that particular syntax, I know, but I do think "cutting out the noise" is important.

ndw commented 1 year ago

Setting aside for the moment the question of what other uses we might want to make of ->, if it was reserved for this purpose, could we make this work:

<xsl:map on-duplicates="($a, $b) -> {$b}">...</xsl:map>
<xsl:map on-duplicates="($a, $b) -> {$a, $b}">...</xsl:map>

That has syntactic similarity to Java and JavaScript for similar cases and avoids having to use $1 and $2 as argument names which I worry about. If this facility exists, people will (ab)use it and write long function bodies with it. They'll get $1 and $4 backwards in one place, and it'll take ages to work out what's wrong.

I'd expect a zero argument function to be written () -> { ... } so it would always be ( followed by zero or more argument names followed by ) -> { ... }.

ChristianGruen commented 1 year ago

Using lambda functions reduces the understandability and maintainability of an application.

@dnovatchev Do you think that applies to languages in general? What about Haskell?

The ongoing discussion is interesting, and it’s so diverse because we discuss very general topics, among others: teaching; our target group; language comprehension.

I largely agree with Reece that we all have gathered different experiences, are working with different user groups, and have individual preferences. There’s always some danger that features will be pushed to the limit, but we’d need to apply the same concerns to XPath 1.0, both syntactically – think of $_[.=.*./*] – and semantically – think of * = '' vs. * != ''. Still, it doesn’t seem to be a common belief that the potentially cryptic syntax prevented users from using XPath. Similarly, code can get illegible if every snippet of code is rewritten to use anonymous functions (with or without a more compact syntax). We can all find examples for a compact syntax that is better or harder to read. I believe it’s up to teachers/project leaders/developers to recommend/define/follow conventions that result in digestible results.

What we can easily observe is that compact enhancements for functions in contemporary languages have been widely adopted by the developers.

That said, my stance is:

  1. If a function has multiple arguments, I would love to have a compact syntax that still allows us to name the parameters (with the suggestions ordered by personal preference):
fn($a, $b) { }   (: with 'fn' being a plain alias for 'function', or just 'f' :)
($a, $b) -> { }  (: questionable; see prev. discussions and Reece’s note above on the implications :)
->($a, $b) { }   (: overlaps with arrow operator(s) :)

Numeric references may be sufficient when an anonymous function is passed as an argument to a built-in function, but I think the syntax is too cryptic when being applied more generally. Next, it contrasts with the recently added keyword arguments, which could in principle be also applied to dynamic functions; and it makes refactoring more complicated:

fn($int, $string) { string($int) = $string }  →  fn($string, $int) { string($int) = $string }
{ string($1) = $2 }                           →  { string($2) = $1 }  or  { $1 = string($2) }
  2. If a function has a single argument, and if we allow the . to reference a value, the supplied value could be bound to the context (as proposed in #129 and, as Michael mentioned, analogous to the underscore in Scala):
fn { . }  (: with 'fn' being a plain alias for 'function', or just 'f' :)
{ . }
function { . }

If we believe that’s one step too far, I’d still love to have a compact alias for function, because the function($item) { } pattern is just too wordy when it’s used again and again (and again)*.

michaelhkay commented 1 year ago

Could we make this work: ($a, $b) -> {$b}

It's messy because of the unbounded lookahead, but other languages had the same problem. Basically, I think the answer is that some parser technologies might have problems with it, but I'm not sure that should put us off. There's certainly a strong case for doing something that's the same as (or very similar to) Java, C#, and Javascript.

Note that C#, Java, and JS don't require the braces if the body is a simple expression, and they don't require the parentheses if there's exactly one argument. So we could aim to allow $a -> $a + 1.
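A minimal Python sketch of how a recognizer might handle the lambda header being discussed (a single `$x`, or a parenthesized list of variable references, followed by `->`). The tokenizer is a simplified, hypothetical stand-in for the real XPath lexer, and the body is treated as opaque. A bare `$x` needs only one token of lookahead to find `->`. After `(`, however, the parser cannot tell a parameter list from an ordinary parenthesized expression such as `($x, $y)` until it has scanned to the closing `)` and checked for `->`. That scan over an arbitrarily long list is the unbounded lookahead mentioned above.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, simplified tokenizer: variable references, the "->" arrow,
# single punctuation characters, and any other run of non-space characters.
TOKEN = re.compile(r"\$[A-Za-z_]\w*|->|[(),{}]|[^\s$(),{}]+")


def tokenize(src: str) -> List[str]:
    return TOKEN.findall(src)


def lambda_params(tokens: List[str], i: int = 0) -> Optional[Tuple[List[str], int]]:
    """Try to read lambda parameters followed by "->" starting at tokens[i].

    Returns (parameter names, index of the first body token), or None if
    the tokens do not start a lambda expression.
    """
    if i < len(tokens) and tokens[i].startswith("$"):
        # Unparenthesized single parameter: one token of lookahead suffices.
        params, j = [tokens[i][1:]], i + 1
    elif i < len(tokens) and tokens[i] == "(":
        # Parenthesized list: we cannot commit until we reach ")" and see "->".
        params, j = [], i + 1
        if j < len(tokens) and tokens[j] != ")":
            while True:
                if j >= len(tokens) or not tokens[j].startswith("$"):
                    return None  # e.g. ($x + 1): an ordinary parenthesized expression
                params.append(tokens[j][1:])
                j += 1
                if j < len(tokens) and tokens[j] == ",":
                    j += 1
                    continue
                break
        if j >= len(tokens) or tokens[j] != ")":
            return None
        j += 1
    else:
        return None
    # The deciding token: only "->" makes this a lambda.
    if j < len(tokens) and tokens[j] == "->":
        return params, j + 1
    return None


if __name__ == "__main__":
    for src in ["() -> math:pi()", "$x -> $x + 1", "($x, $y) -> $x + $y", "($x, $y)"]:
        print(src, "=>", lambda_params(tokenize(src)))
```

This recognizer also accepts `$x -> f($x)` as a lambda with parameter `x`. That is the clash with a thin arrow operator spelled `->`.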

It probably poses more problems for syntax-directed editors than for actual parsers. Typically if you type $ at the start of an expression you get prompted with a list of variables that are in scope, which isn't what you want.

Of course, if we adopt this for inline function expressions, we can't use -> for pipelines. We would have to use something else like ~> or =!>.

ChristianGruen commented 1 year ago

@ndw, everyone: I have added the Java syntax to my summary as alternative D: https://github.com/qt4cg/qtspecs/issues/53#issuecomment-1509719729.

innovimax commented 1 year ago

As XPath gets more and more like a general functional programming language it gets harder and harder to teach to the primary audience.

@liamquin A good point. From my personal perspective, it would certainly be reasonable if such enhancements were not available in XPath but limited to XQuery. When I teach XPath, my solution is to only present XPath 1.0 features, and I don’t confront people with things like the let clause, quantifiers or error handling.

I think this can also be counted as an argument for having something between XPath and XQuery, as in #425

ChristianGruen commented 1 year ago

I think this can also be counted as an argument for having something between XPath and XQuery, as in #425

@innovimax Wouldn’t we rather need a subset of XPath 3.0 instead of an additional layer between XPath and XQuery/XSLT/XForms/…? Or did you have XPath 1.0 in mind? – And of course, we’d need someone who’s going to do all the work.

dnovatchev commented 1 year ago

That's an opinion piece,

Yes, one opinion contrary to another opinion. This is what we have here: opinions. Are we seriously planning to add constructs to the language that do not represent new features but merely support one particular writing style/opinion, disregarding another writing style/opinion?

Give to the kids a new XMas present, new power toys so that they will be swayed by the feeling of power, that at the same time may further aggravate the long-term problems of the ecological landscape of the programming environment?

it doesn't contain a single piece of factual evidence that use of lambda expressions increases the number of bugs in your code. In fact it doesn't even make that suggestion, which would indeed be highly implausible.

Nobody is saying that this writing style directly "increases the number of bugs in your code". Something clear, which no one contested, is that using shorthands (I also called these "scribblings"), that is undocumented functions, all over the place, reduces the understandability and maintainability of the code base. Reduced maintainability causes difficulty and delays in fixing bugs, and may lead to introducing new bugs in such fixes.

Most factual evidence suggests that if you make your code more concise, within reason, you will reduce the number of bugs.

We're dealing with a generation of programmers now who are familiar with JQuery, LINQ and the streams API in Java. Those APIs make liberal use of lambda expressions the most natural thing in the world, and the resulting code can be highly concise, readable, and expressive.

We see an opinion from a representative of such a language, so even with such programming languages there are people who clearly grok this writing style for what it is and sound the warning.

Do note that the author "helps teams to write better Python code". This is "programming in the large" vs. "programming in the small".

You seem to be trying to turn back the clock.

These are truths that do not depend on time.

ChristianGruen commented 1 year ago

Give to the kids a new XMas present, new power toys so that they will be swayed by the feeling of power, that at the same time may further aggravate the long-term problems of the ecological landscape of the programming environment?

Good news: Kids don’t use XML anymore ;·)

These are truths that do not depend on time.

Truths or opinions?

dnovatchev commented 1 year ago

Truths or opinions?

If different people, not knowing each other and having different programming-language backgrounds, independently come to the same conclusion, this would be too much of a coincidence, wouldn't it?

So, this seems more than just "opinions" and I believe that this is a truth, at least at the scale of "programming in the large".

Not seeing the forest for the trees happens so often that there is even a proverb using exactly this phrase.

dnovatchev commented 1 year ago

@liamquin 👍

The balance is that people who use XPath once or twice a month, maybe once a week, but not all day every day, aren’t going to be able to work out what’s going on, and likely will wander off to Python or JavaScript. The use of keywords has helped those people to do searches and figure out what’s going on.

👍

If i wanted significantly more concise code i’m not sure i’d be using XSLT at all. The motivation isn’t conciseness, it’s applicability.

👍

Thanks for sharing your opinion, Liam.

As a whole: +💯

rhdunn commented 1 year ago

The audience Liam is talking about will never (or most likely never) use inline functions or lambda functions in the first place, so they are not the target audience for this. The target audience is the people (in XSLT or XQuery) who make extensive use of higher-order functions and who, at a minimum, have to write something like function ($a, $b) { $a + $b } when passing an inline expression to one of the higher-order functions.

liamquin commented 1 year ago

On Mon, 2023-04-17 at 10:33 -0700, Reece H. Dunn wrote:

The audience Liam is talking about will never (or almost likely never) use inline functions or lambda functions in the first place,

But, they will encounter them in XSLT others have written.

Yes, i’d probably use the proposed shortcut myself; Mike’s examples are fairly compelling. (Sorry, i saw Mike's comment before Dimitre's last night in email).

I worry slightly about confusion in XQuery with enclosed expressions but probably it'll be OK in context.

But i do think it needs to be possible to supply argument and return types: i'd really like to see a static (lexical) expression type in which all statically referenced, called, or defined functions must be typed, and in XSLT a mode where all declared templates and variables must have as= and maybe also xsl:context-item (sort of by analogy with streaming, where you can call templates in another mode and those templates don't have that restriction).

-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org

dnovatchev commented 1 year ago

Seems this is about / onto our topic:

https://belaycpp.com/2021/11/24/is-my-cat-turing-complete/

From the above article:

"About “cat-computing”

Jokes aside, cat-computing is the name I give to this generalized practice. In my experience, it happens quite often that when someone discovers a new feature of a language, they begin to use it everywhere, just because they can and they want to. However, just like you can execute code using a cat but shouldn’t, it’s not because you can use a feature that you should." "Also, cat-computing is animal abuse, so don’t do it 😖"


dnovatchev commented 1 year ago

On Mon, 2023-04-17 at 10:33 -0700, Reece H. Dunn wrote: The audience Liam is talking about will never (or almost likely never) use inline functions or lambda functions in the first place,

But, they will encounter them in XSLT others have written.

👍

michaelhkay commented 1 year ago

There's also dog-computing, in which people continue to use old features even when the new ones are clearly a better fit for the task, for example writing sum(for $p in //product return $p/@price - $p/@discount) rather than sum(//product ! (@price - @discount)).

ndw commented 1 year ago

I’m trying to figure out how to frame this issue such that we might be able to have a productive discussion about it this afternoon. We’re trying to pick a point solution in a large, multi-dimensional problem space. Depending on a bunch of highly-individual factors, we have opinions about what point or points are best.

Dimension: verbosity.

A mechanism that is more verbose is harder to use than one that is compact. At the same time, a solution that is so compact that it’s inscrutable to the reader is also hard.

Dimension: locality.

It’s always possible to define a function and then refer to that definition. This can be done by giving the function a name and then referring to it by name or by putting the function in a variable and using that variable.

Like verbosity, on the one hand, having to put the definition “out of band” makes it harder to use. On the other hand, readers unfamiliar with higher-order functions and lambda expressions may find inline uses unfamiliar.

Dimension: audience.

In the beginning, XSLT 1.0 had a very clear and narrow scope: what was necessary to transform XML into XSL FO and HTML. By the end, it had a somewhat wider scope, but it’s worth remembering that there was debate in the XSLT 1.0 WG about whether or not the language needed the features that would be required to write an identity transform. That was viewed as out-of-scope for some members.

Professional programmers writing libraries and applications in XPath, XQuery, and XSLT have different experience with and tolerance for “syntactic complexity” or perhaps varying forms of syntactic sugar.

On the one hand, we can say that it’s OK to add new features that some users will find off-putting, because they don’t have to use those features; on the other hand, the reality is that they will end up reading code that uses those features, and it’s not quite fair to say it “doesn’t matter”.

I think the point that was made about the strength and value of XPath as a domain specific language as opposed to a more general, fully featured functional programming language is worth remembering.

Framing

I think arguments that try to negate one perspective by asserting that an alternative formulation delivers the same value (“it’s okay to add lambdas because non-programmers don’t have to use them”, or “it’s unnecessary to have an inline syntax because you can put the function in a variable”) are probably not going to help us find a way forward.

My impression is that a plurality of members want a more compact syntax that can be used locally and that while some concern has been raised about making the language less approachable for casual users, that’s not persuasive to most of the group.

I think we can break the discussion into two pieces: is there consensus that my impression is correct? If so, we have to work out which compact syntax we like best and think works in the grammar. If not, then the question is, I think, to what extent we are going to continue to pursue this issue.

michaelhkay commented 1 year ago

I would add: We thought we had consensus on the syntax

->{expr}
->($a, $b) {expr}

but we decided to revisit this primarily because the syntax doesn't work as nicely as we would like in "arrow pipelines".

I am absolutely convinced that we need a compact syntax for inline functions. It should support:

  • any arity
  • named arguments

It does not need to support type declarations for the arguments or the result (people can use the existing syntax if they want that).

We can manage without the special syntax for "focus functions" (->{expr}) in the interests of keeping things simple.

I'm now inclined to go for syntax that closely resembles lambda expressions in Java, Javascript, and C#, despite the fact that they cause some parsing lookahead difficulty. Specifically:

ExprSingle   ::= ... | LambdaExpr
LambdaExpr   ::= LambdaParams "->" ExprSingle
LambdaParams ::= VariableReference | "(" ((VariableReference ",")* VariableReference)? ")"

Examples:

() -> math:pi()
$x -> $x + 1
($x, $y) -> $x + $y
($x, $y) -> ($x, $y)

Semantics:

($x, $y) -> Expr

is equivalent to

function($x as item()*, $y as item()*) as item()* { Expr }

Michael Kay Saxonica


rhdunn commented 1 year ago

@michaelhkay Wouldn't that have an ambiguity if we keep -> for thin arrow operators:

$x -> f($x)

IIUC, that would parse both as a lambda and a thin arrow expression.

If we confine the RHS (the function body) to an EnclosedExpr, then it wouldn't be ambiguous, as { isn't allowed following an arrow in an arrow expression (provided that that is removed from the current 4.0 draft).
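A small Python sketch of the disambiguation rule suggested above. It makes two assumptions: a lambda body must be an enclosed expression starting with `{`, and `{` may not directly follow a thin arrow. The tokenizer and the `readings` function are hypothetical, and only a single-variable left-hand side is handled. Without the brace rule, `$x -> f($x)` has two readings. With the rule, the single token after `->` decides.

```python
import re
from typing import Set

# Hypothetical, simplified tokenizer (not the real XPath lexer).
TOKEN = re.compile(r"\$[A-Za-z_]\w*|->|[(),{}]|[^\s$(),{}]+")


def readings(src: str, lambda_needs_braces: bool) -> Set[str]:
    """Return the possible parses of '$var -> rest'.

    "thin arrow": rest is a function call such as f($x); under the assumed
                  rule it never starts with "{".
    "lambda":     $var is a parameter and rest is the body; if
                  lambda_needs_braces is True, the body must start with "{".
    """
    tokens = TOKEN.findall(src)
    if len(tokens) < 3 or not tokens[0].startswith("$") or tokens[1] != "->":
        return set()
    first = tokens[2]  # the only token needed to decide
    found = set()
    if first != "{":
        found.add("thin arrow")
    if first == "{" or not lambda_needs_braces:
        found.add("lambda")
    return found


if __name__ == "__main__":
    for src in ["$x -> f($x)", "$x -> { f($x) }"]:
        print(src, readings(src, False), readings(src, True))
```

Mandatory braces trade a little brevity for a one-token decision. This matches the later argument in the thread that mandatory braces also reduce the risk of future grammar conflicts.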

ndw commented 1 year ago

My 2p is that we shouldn't use -> for two different things irrespective of whether or not we can disambiguate them syntactically.

ChristianGruen commented 1 year ago

I am absolutely convinced that we need a compact syntax for inline functions. It should support

  • any arity
  • named arguments

I strongly agree.

() -> math:pi()
$x -> $x + 1
($x, $y) -> $x + $y
($x, $y) -> ($x, $y)

I think we should make the curly braces mandatory. They make the code more readable, in particular if the function body is longer than a few characters. Also (as could be observed for computed node constructors à la element name {}, or if/then with mandatory else), braces reduce the danger of future grammar conflicts.

And, personally, I’d definitely love to keep the thin arrow operator alive.

michaelhkay commented 1 year ago

@michaelhkay Wouldn't that have an ambiguity if we keep -> for thin arrow operators:

Yes, we would need to find a different notation for the thin arrow operator.

Michael Kay Saxonica

dnovatchev commented 1 year ago

Reading about the history of mathematical symbols here, it once again seems that being too eager to add new symbols is most likely going to pollute this nice space.

Will any of the proposed operators be known 100 years from now?
