qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Reinstate focus functions #503

Closed michaelhkay closed 1 year ago

michaelhkay commented 1 year ago

As a result of accepting PR #447, we have lost the ability to write simple "focus functions" that take the context item as an implicit argument, for example sort(//emp, (), ->{@salary}).

The new status quo is that people have to write sort(//emp, (), $e->{$e/@salary}) which feels clumsy in comparison.

This issue examines options for reinstating such a capability, and perhaps making it more powerful.

A reason for dropping the syntax was that it didn't play well with the "thin arrow" operator in pipelines, but we have now changed the symbol for that to =!> so the objection no longer applies so strongly.

Ideally we want something that not only replaces focus functions (arity one arguments accepting an argument of type item()), but also meets some or all of the following additional requirements:

This becomes a lot easier if we can solve issue #129 which generalises the context item to a context value. Let's assume we do that, and keep an open mind for the moment as to whether the generalized context value is referenced as . or ~. I'll use ~ for now. So we want a compact notation for functions of arity one in which the function body refers to the argument value as ~. For aesthetic reasons, because it's going to be used on the RHS of an arrow operator, we really don't want to introduce it with a leading arrow like the previous syntax ->{.+1}. Use of "bare braces" (simply {~+1}) is very tempting, but I think there is a good argument for leaving that part of the syntactic space unused, for extensibility and for diagnostics. I think my preference is for fn{~+1}. Using a keyword (such as map, array, validate) before a braced expression is a uniform device and keeps the grammar coherent.

So in a callback such as fn:sort, we can write sort(//emp, (), fn{@salary}), and in a pipeline we can write $list =!> fn{.+1}(). (To allow this, all we need to do is to generalise what's allowed as an ArrowDynamicFunction).

A separate question is whether we can (and should) allow the empty argument list to be omitted. I think I'm persuaded by the arguments that it's better to keep it, as a visual signal that the function is being applied, not just returned.
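For comparison, here is a rough Python analogue of the two sort calls above. It illustrates the idea only and does not reproduce XPath semantics; the employee records are made-up data. A sort key is always a one-argument function. The only difference between the two forms is whether that argument has to be named.

```python
from operator import itemgetter

# Made-up records standing in for //emp elements;
# the "salary" key plays the role of the @salary attribute.
emps = [
    {"name": "Ada", "salary": 5200},
    {"name": "Bob", "salary": 4100},
    {"name": "Cy", "salary": 6100},
]

# Explicit parameter, like sort(//emp, (), $e -> {$e/@salary})
by_lambda = sorted(emps, key=lambda e: e["salary"])

# No named parameter, closer in spirit to sort(//emp, (), fn{@salary})
by_getter = sorted(emps, key=itemgetter("salary"))

print([e["name"] for e in by_getter])  # ['Bob', 'Ada', 'Cy']
```

Here `itemgetter` plays roughly the role a focus function would: the key is stated without declaring a variable.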

dnovatchev commented 1 year ago

I can live with:

->{.+1} (1)

but find this:

fn{~+1} (2)

immensely confusing and ugly.

Feeling completely lost about any possible meaning (2) might have... 😭

Also, is there any difference between (1) and (2), and if there is, what is it?


A separate question is whether we can (and should) allow the empty argument list to be omitted. I think I'm persuaded by the arguments that it's better to keep it, as a visual signal that the function is being applied, not just returned.

👍

Yes, even (1) above:

->{.+1} (1)

presents a challenge to read and understand.

What is wrong with using:

() -> {.+1} (3)

or even:

(.) -> {.+1} (4)

Between (3) and (4) I would definitely prefer (4)

However I look at any abbreviation experiment like (2), I am always left with the insuppressible feeling that this is a bad service to the future XPath users.

michaelhkay commented 1 year ago

My main concern is the use of a focus function within an "arrow" pipeline. I feel that

$list =!> ->{.+1}()

is very difficult to read, and

$list =!> (.) -> {.+1}()

is even worse. It's logical, but the logic doesn't shine out.

So I'm looking for a notation for focus functions that doesn't involve arrows.

In the old syntax, we used something like

$list =!> {.+1}

which is nice and readable, but it's custom syntax for use in the pipeline and it would be nicer to have something that reflects the semantics that what we are really doing here is applying an inline function. I feel that

$list =!> fn{.+1}()

does this acceptably well.

ChristianGruen commented 1 year ago

I agree we still need a compact syntax for anonymous functions, in particular for one-liners. The variant without keyword would still be my favorite, I believe it’s easy enough to read:

sort(//emp, (), { @salary })
for-each(1 to 10, { . + 2 })
index-where($names, { . = '緑' })

And I agree we should keep the parentheses after the function to indicate that the function will be invoked:

'123'
=> { if(matches(., '^\d+$')) then 'digits' else 'no digits' }()

If we use a keyword, I would prefer to stick with function:

sort(//emp, (), function { @salary })

…or introduce fn as a plain alias/synonym:

sort(//emp, (), function($emp) { $emp/@salary })
sort(//emp, (), function { @salary })

sort(//emp, (), fn($emp) { $emp/@salary })
sort(//emp, (), fn { @salary })
dnovatchev commented 1 year ago

My main concern is the use of a focus function within an "arrow" pipeline. I feel that

$list =!> ->{.+1}()

is very difficult to read, and

$list =!> (.) -> {.+1}()

is even worse. It's logical, but the logic doesn't shine out.

Yes, and these are extremely convoluted, longer, and artificially made unreadable, compared side by side to a better-represented function:

$list =!> (.) -> {.+1}() (1)

$list =!> op('+')(?,1) (2)

Even if someone offers to pay me money, I would never use (1) when I can already use, right now and without the need for any additional "features", the shorter and less cryptic (2).

Certainly, one can improve even (2) by defining a function:

let $incr := function($n) { op('+')(?, $n) }

and then rewrite (2) into:

$list =!> ($incr(1))() (3)

Did you notice that there is no reference to . in (3)? Nor even in the definition of $incr?

This means that having . in all of the above was artificial and unneeded!

Now we have an expression that is context-independent, and every one of its subexpressions is context-independent too.

Why artificially clog the reader's mind with context that doesn't really exist? Just to further distract them from the nature of the problem they are trying to solve?

And finally, if brevity is needed, why wouldn't I write even in XPath 3.1 this:

$list ! op('+')(?,1)(.)

ndw commented 1 year ago

I appreciate that experts (myself included sometimes) value brevity, but man, this is starting to look like line noise. Haven't we historically used keywords rather than Perlesque syntax?

dnovatchev commented 1 year ago

I appreciate that experts (myself included sometimes) value brevity, but man, this is starting to look like line noise. Haven't we historically used keywords rather than Perlesque syntax?

Exactly, Norm.

To repeat what I said in the thread:

I am left with the insuppressible feeling that this is a bad service to the future XPath users.

ChristianGruen commented 1 year ago

Haven't we historically used keywords rather than Perlesque syntax?

_//at/least[*+.>-1]/partially/..
michaelhkay commented 1 year ago

Clearly I chose my example badly: {.+1} was just an example, and using op('+') might well work better here.

But in the general case, pipelines built using the => operator have proved very popular in XPath 3.1, and it's a real inconvenience that it's difficult to inject local inline functions into the pipeline to perform simple operations.

Haven't we historically used keywords rather than Perlesque syntax?

We have an amazing mix of terse punctuated syntax (a//b/@c[.=1]), Cobol-like pseudo-English (every $person in $america satisfies young($person) or old($person)), and XML template syntax. One of the challenges is keeping things readable when these different styles are composed together.

ChristianGruen commented 1 year ago

As we introduce more and more built-in functions that expect functions as arguments, I believe it's helpful to have a syntax that's more concise than what we have now. Otherwise, there'll often be no motivation to use the new functions at all.

At the same time, op('+')(?,1) or {.+1} may certainly appear cryptic to non-programmers. They’ll just use 1+... and resort to code that doesn't use HOF (and it's completely justified to do so).

michaelhkay commented 1 year ago

I think the reason the arrow has achieved popularity is that it relates to the way people think in terms of decomposing a task into a sequence of steps: strip off the currency symbol, round it to a multiple of 10, format it as a string, then convert it to upper case. Each of these steps takes input from the previous step and contains an implicit reference to the result of the previous step. So an expression like . => substring-after('$') => round(1) => format-number('999.999') => upper-case() is something people can easily relate to. But then you hit a stumbling block when you want to do something in the middle of the pipeline for which there is no handy function. For example, what if the rounding step needs to be conditional? It then becomes very natural to write

.  => substring-after('$') 
   => {if (. gt 1000) then round(., 1) else .}
   => format-number('999.999') 
   => upper-case()

Someone writing that isn't going to think of that step in the braces as a function, they are going to think of it as the next processing step to be performed in the sequence of tasks. We don't want it to look complicated or forbidding, and we don't want to force them to switch paradigm when the processing pipeline becomes just a little bit more complicated.
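Here is a minimal sketch of the pipeline idea in Python. It assumes a hypothetical `pipe` helper, which is not part of Python's standard library. The helper threads a value through one-argument steps from left to right, as `=>` does. The inline conditional step is just another anonymous function in the list. Python also needs an explicit string-to-number step that the XPath version leaves implicit.

```python
from functools import reduce

def pipe(value, *steps):
    """Thread value through one-argument steps, left to right, like =>."""
    return reduce(lambda acc, step: step(acc), steps, value)

def substring_after(s, marker):
    """Rough stand-in for fn:substring-after ("" if marker is absent)."""
    i = s.find(marker)
    return s[i + len(marker):] if i >= 0 else ""

result = pipe(
    "$1234.567",
    lambda s: substring_after(s, "$"),
    float,  # explicit string-to-number step that Python requires
    # The inline conditional step: round only values above 1000
    lambda x: round(x, 1) if x > 1000 else x,
    lambda x: f"{x:.3f}",
    str.upper,
)
print(result)  # 1234.600
```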

dnovatchev commented 1 year ago

Someone writing that isn't going to think of that step in the braces as a function, they are going to think of it as the next processing step to be performed in the sequence of tasks. We don't want it to look complicated or forbidding, and we don't want to force them to switch paradigm when the processing pipeline becomes just a little bit more complicated.

Yes, and once we have a standard function multiCompose no one will need anything cryptic and baffling.

Here, briefly, is the definition and usage of this function once again:

let $apply := function($f, $x) {$f($x)},
    $multComp := function($funs as function(*)*, $x)
                  {
                    fold-right($funs, $x, $apply)
                  }
 return
   $multComp((op('*')(?,5), op('+')(?,1)), 2)

And this produces the correct result (increment 2 and then multiply the result (3) by 5):

15

This is the simplest way of chaining steps, no artificial combining operators are used!

The user always uses a single, standard function and provides to it the sequence of functions (steps) that must be chained.

Absolutely no Mumbo jumbo ...
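The evaluation order of `$multComp` can be checked with a small Python transcription. This is a sketch; `fold_right`, `apply` and `mult_comp` are hypothetical names that mirror the XPath ones. Because `fold-right` applies the last function in the sequence first, `op('+')(?,1)` runs before `op('*')(?,5)`.

```python
from functools import reduce

def fold_right(seq, zero, f):
    """fold-right semantics: f(seq[0], f(seq[1], ... f(seq[-1], zero)))."""
    return reduce(lambda acc, item: f(item, acc), reversed(seq), zero)

def apply(f, x):
    return f(x)

def mult_comp(funs, x):
    # The rightmost function is applied first, as in the $multComp definition
    return fold_right(funs, x, apply)

def times5(x):  # op('*')(?, 5)
    return x * 5

def plus1(x):   # op('+')(?, 1)
    return x + 1

print(mult_comp([times5, plus1], 2))  # 15
```

Reversing the list gives `(2 * 5) + 1 = 11`, so the order of the function sequence matters.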

dnovatchev commented 1 year ago

As we introduce more and more built-in functions that expect functions as arguments, I believe it's helpful to have a syntax that's more concise than what we have now. Otherwise, there'll often be no motivation to use the new functions at all.

Nope.

No new syntax!

Just one needed standard function: fn:multiCompose.

ChristianGruen commented 1 year ago

Just one needed standard function: fn:multiCompose.

@dnovatchev So how would fn:multiCompose need to be used to e.g. sort a sequence of emp elements by their salary attribute values?

michaelhkay commented 1 year ago

$multComp((op('*')(?,5), op('+')(?,1)), 2)

Absolutely no [Mumbo jumbo]

I fear there are some users who might have a different perspective.

Historically I think one of the reasons for the success of XPath has been that its functional semantics are hidden behind friendly syntax. One of the earliest examples was the humble path expression: we allow users to write doc/chapter/section rather than having to write children(children(children($root, "doc"), "chapter"), "section"). The arrow expression has been popular for similar reasons; it has clean functional semantics behind a friendly intuitive syntax.

You might find the above expression more readable than

2 => {. + 5} => {. + 1}

or perhaps even than 2 + 5 + 1

but I'm not sure how many users would agree with you.

dnovatchev commented 1 year ago

You might find the above expression more readable than

2 => {. + 5} => {. + 1}

or perhaps even than 2 + 5 + 1

but I'm not sure how many users would agree with you.

@michaelhkay,

Actually nothing as terrible as you describe. Just:

$multComp(($incr(1), $incr(5)), 2)

Isn't it obvious that incr(5) is simpler and much easier to understand than => {. + 5}

In your example you are trying to encourage users to have:

2 => {. + 5} => {. + 1}

And this is bad advice, because whoever reads this has to strain his brain not once but twice in order to see that both subexpressions are actually the same operation (adding some value to the argument).

Even if we do allow lambda expressions, it would be good to advise users never to repeat the same or similar operations in more than one lambda expression. The best practice here is to factor repeating functionality into a single function definition.

dnovatchev commented 1 year ago

Just one needed standard function: fn:multiCompose.

@dnovatchev So how would fn:multiCompose need to be used to e.g. sort a sequence of emp elements by their salary attribute values?

@ChristianGruen In this particular case there is just a single call to fn:sort, thus there is no need for chaining function calls.

ChristianGruen commented 1 year ago

@dnovatchev So how would fn:multiCompose need to be used to e.g. sort a sequence of emp elements by their salary attribute values?

@ChristianGruen In this particular case there is just a single call to fn:sort, thus there is no need for chaining function calls.

Thanks. As far as I understand it, the basic topic of this issue is to find a short syntax for anonymous functions with one argument. Expressions such as…

filter($seq, function($item) { $item (: ... :) })
sort($seq, (), function($item) { $item (: ... :) })

…could be shortened to…

filter($seq, { . (: ... :) })
sort($seq, (), { . (: ... :) })

Chained function calls are just one case for which the new syntax could also be used, simply because the grammar will permit it.

dnovatchev commented 1 year ago

Chained function calls are just one case for which the new syntax could also be used, simply because the grammar will permit it.

Yes, and I was responding to @michaelhkay who wrote that one of the main reasons for having arrows was to provide the expressive capability for chaining:

Someone writing that isn't going to think of that step in the braces as a function, they are going to think of it as the next processing step to be performed in the sequence of tasks. We don't want it to look complicated or forbidding, and we don't want to force them to switch paradigm when the processing pipeline becomes just a little bit more complicated.

And as I showed, no arrows, or actually no connectors at all are needed in order to perform a sequence of actions, more precisely a sequence of functions, each of which is applied on the result of the previous function application.

ndw commented 1 year ago

I'm torn. On the one hand, I'm skeptical of all of the new syntax proposals because they add considerable complexity to the language in ways that professional programmers familiar with lambda expressions will find convenient but that I fear casual users and users who don't self-identify as programmers will find bewildering. On the other hand, the example that uses $multiComp is very clever, but is deep black magic. Fold left, fold right, partial function application, and recursive functions are powerful tools, but they're also very, very hard for many folks to understand. Not infrequently, in my experience, so hard that users give up before learning them.

I don't need xsl:iterate in XSLT because I'm perfectly comfortable writing tail-recursive functions. But boy do I appreciate the training wheels and guard rails that it provides which allow me to do it without thinking very hard.

Similarly, if we're going to add lambda functions so that we can conveniently write chained arrow expressions with anonymous code fragments in the middle, I think we'll reach a larger audience if we provide some syntactic forms that guide users towards expressions that work correctly and are, to some eyes at least, more readable than applying nested functions to partially applied functions.

The challenge is syntactic guides that don't look like line noise (an analogy that I realize only the olds among us are likely to have experienced in real life) to the uninitiated.

ChristianGruen commented 1 year ago

The challenge is syntactic guides that don't look like line noise (an analogy that I realize only the olds among us are likely to have experienced in real life) to the uninitiated.

It’s interesting to see how the compact notation of anonymous functions changed things in JavaScript: Whereas the older audience still tends to prefer to use the full keyword notation, it has become more obvious to younger developers to go for $x => X, as can be observed when looking at more recent JS projects.

Rust may serve as a good inspiration. It’s no language for novices, but that makes it all the more relevant if we are not only targeting beginners. It has been voted the most loved language for several years now, and it provides a compact syntax for HOF operations. For example, it gives you seq.map(|n| n * n).

My feeling is that the code history is more like a pendulum, moving from one extreme to the other (so it’s an analogy with Western history in general). Depending on when you started, you may find constructs more or less verbose. I believe it’s mainly the general background (natural science vs. humanities) that determines if you tend to express yourself concisely or verbosely (and both approaches have their full justification).

dnovatchev commented 1 year ago

I'm torn. On the one hand, I'm skeptical of all of the new syntax proposals because they add considerable complexity to the language in ways that professional programmers familiar with lambda expressions will find convenient but that I fear casual users and users who don't self-identify as programmers will find bewildering. On the other hand, the example that uses $multiComp is very clever, but is deep black magic. Fold left, fold right, partial function application, and recursive functions are powerful tools, but they're also very, very hard for many folks to understand. Not infrequently, in my experience, so hard that users give up before learning them.

@ndw Norm, I understand that you are concerned that with a standard function fn:multi-compose the users will need experience with functions such as fold-left, fold-right, etc.

Not in this case, because the implementation of the standard fn:multi-compose is of no concern to the users - they just call this function.

We can also spare them the need to make partial applications for at least some of the most common functions, by providing standard functions:

fn:incr($additive)

fn:times($factor)

fn:on-condition($predicate, $x, $y, $funOnTrue, $funOnFalse)
dnovatchev commented 1 year ago

On the other hand, the example that uses $multiComp is very clever, but is deep black magic.

@ndw There is this proverb:

"To iterate is human, to recurse is divine"

And in this discussion, people who think like you, me, or Michael Sperberg-McQueen are the "angels" here 😃

(Quote attributed to L Peter Deutsch: “To iterate is human, to recurse divine.”)
ChristianGruen commented 1 year ago

To iterate is human, to recurse is divine

To recurse or curse? ;)

dnovatchev commented 1 year ago

To iterate is human, to recurse is divine

To recurse or curse? ;)

@ChristianGruen Please, see the update to the previous comment: We are not the devils here 😃

benibela commented 1 year ago
* With XPath 3, it was `for-each($seq, function($n) { $n * $n })`

* With XPath 4, it will be `for-each($seq, $n -> { $n * $n})`

* With XPath 4, it could additionally be `for-each($seq, { . * . })`.

With XPath 2, it was $seq / (. * .) in many cases. It has been getting worse!

With XPath 4, it could additionally be for-each($seq, { . * . }).

Standalone { is the worst. It would block too many useful syntaxes, such as an alternative to map{ or some scripting extensions.

One would need to investigate whether more users use maps or anonymous functions.

(.) -> {.+1} (4) is logically consistent

Or \{.+1} could be shorter

ChristianGruen commented 1 year ago

With XPath 2, it was $seq / (. * .) in many cases. It has been getting worse!

True, a standalone for-each function call may not be the most convincing example to demonstrate the general usefulness of higher-order functions.

Standalone { is the worst. It would block too many useful syntaxes, such as an alternative to map{ or some scripting extensions.

Yes, there’s something to that. It would be interesting to know why the map keyword was introduced for map structures. Without it (as with JSONiq), many JSON data structures could have been pasted into XPath unchanged. Does anyone who was involved in the decision remember whether it was a grammar issue or something else? Is it documented somewhere online?

Here, again, are some variants that have been proposed in this issue and elsewhere:

(: currently available :)
sort($persons, key := function($person) { $person/@age })
sort($persons, key := $person -> { $person/@age })

(: proposed, with 'f'/'fn'/… possibly being synonyms for 'function' :)
sort($persons, key := fn($person) { $person/@age })
sort($persons, key := function { @age })
sort($persons, key := . -> { @age })
sort($persons, key := fn { @age })
sort($persons, key := f { @age })
sort($persons, key := λ { @age })
sort($persons, key := \{ @age })
sort($persons, key := .{ @age })
sort($persons, key := { @age })
michaelhkay commented 1 year ago

It would be interesting to know why the map keyword was introduced for map structures. Without it (as with JSONiq), many JSON data structures could have been pasted into XPath unchanged. Does anyone who was involved in the decision remember whether it was a grammar issue or something else? Is it documented somewhere online?

The JSONiq people wanted bare braces for maps, the scripting people wanted them for sequential blocks (ordered execution), and the judgement of Solomon was that if they couldn't agree, they wouldn't be used for either, in order to keep options open for the future.

dnovatchev commented 1 year ago
* With XPath 3, it was `for-each($seq, function($n) { $n * $n })`

* With XPath 4, it will be `for-each($seq, $n -> { $n * $n})`

* With XPath 4, it could additionally be `for-each($seq, { . * . })`.

With XPath 2, it was $seq / (. * .) in many cases. It has been getting worse!

With XPath 4, it could additionally be for-each($seq, { . * . }).

Standalone { is the worst. It would block too many useful syntaxes, such as an alternative to map{ or some scripting extensions.

One would need to investigate whether more users use maps or anonymous functions.

(.) -> {.+1} (4) is logically consistent

Or \{.+1} could be shorter

Thanks @benibela,

It seems that there is a significant group of people even in this CG who see the proposed abbreviated writing (I call it "scribblings") for what it really is: confusing, less readable and less maintainable.

And why do we have to invent, for the (N+1)th time, something that can already be expressed in N known ways?

For me, one of the worst effects of this is that we as a group are investing our time in this, when we could be doing better things.

ChristianGruen commented 1 year ago

For me, one of the worst effects of this is that we as a group are investing our time in this, when we could be doing better things.

One good approach to setting other priorities is to develop proposals and send pull requests for features you believe are more important.

michaelhkay commented 1 year ago

we could be doing better things.

Sorry, but I regard this as important.

XPath has always offered higher-order functional capability without requiring users to understand higher-order functions. It achieved this originally using custom operators such as a/b and a[b]. But you can't keep doing this, you run out of operators. We haven't been able to provide a similar range of operators for arrays, for example. XSLT provides custom syntax for operations such as sorting and merging, but that too runs out of steam eventually. To extend the functionality of the language without constant invention of new syntax, we need higher-order functions. But constructs like sorting should be as easy to use as filtering and mapping, and that can be achieved by using the same device of an implicit variable, ".", that does not need to be explicitly named and declared.

I believe that using the syntax sort(//employee, \{@salary}), or any of the other syntactic variations that have been proposed, is much more accessible to our typical users than writing sort(//employee, $e -> {$e/@salary}). Users don't have to understand that \{@salary} is defining a function, any more than they think of a/b or a[b] as defining a function; it's just a notation for declaring a sort key, and no deep conceptual understanding of its semantics is needed. It's also more readable, avoiding the clutter of an unnecessary variable $e that adds no value.

In addition, it plugs what is clearly a gap in the capability of arrow expressions. As identified in the XBow paper at XML Prague 2020 (Juri Leino), things currently get very clumsy when you have to put a user-defined function into an arrow pipeline such as

$x => substring-after('[') => substring-before(']') => (function($x){if (XXX) then upper-case($x) else lower-case($x)})()

Lambda expressions don't help that much:

$x => substring-after('[') => substring-before(']') => ($x -> {if (XXX) then upper-case($x) else lower-case($x)})()

in fact, the use of the arrow in the lambda expression arguably makes it worse. How much simpler to allow a simple expression without extraneous variables:

$x => substring-after('[') => substring-before(']') => {if (XXX) then upper-case(.) else lower-case(.)}

Again, the key thing is that to the typical user, it's just an easily-learned way of chaining expressions together, it doesn't involve any understanding of advanced computer science.

dnovatchev commented 1 year ago

I believe that using the syntax sort(//employee, \{@salary}), or any of the other syntactic variations that have been proposed, is much more accessible to our typical users than writing sort(//employee, $e -> {$e/@salary}). Users don't have to understand that \{@salary} is defining a function, any more than they think of a/b or a[b] as defining a function; it's just a notation for declaring a sort key, and no deep conceptual understanding of its semantics is needed. It's also more readable, avoiding the clutter of an unnecessary variable $e that adds no value.

sort(//employee, \{@salary}) is less readable than:

sort(//employee, $getSalary)

If you have in N places in the code:

sort(//employee, \{@salary}) , or even any expression containing @salary

and later the document schema is changed so that now an <employee> has a child-element <salary> (maybe multiple salaries), instead of an attribute, then the code must be updated in all N places.

This is much worse, more time-consuming and more error-prone than having to make just a single update in the $getSalary function.
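Here is a minimal Python sketch of the factoring argued for here, using made-up records. The salary lookup lives in one hypothetical `get_salary` function, so a change in data shape touches only that definition, not each call site.

```python
# Made-up records; "salary" stands in for the @salary attribute.
emps = [
    {"name": "Ada", "salary": 5200},
    {"name": "Bob", "salary": 4100},
    {"name": "Cy", "salary": 6100},
]

def get_salary(emp):
    # The one place to update if salary moves, e.g. to a nested structure.
    return emp["salary"]

# Several call sites share the same key function instead of repeating the lookup.
by_salary = sorted(emps, key=get_salary)
top_earner = max(emps, key=get_salary)
payroll = sum(map(get_salary, emps))
```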

michaelhkay commented 1 year ago

This is much worse, more time-consuming and more error-prone than having to make just a single update in the $getSalary function.

Users scatter path expressions like employee/@salary all over their code all the time without encapsulating this in a function. Are you seriously suggesting they are wrong to do so?

dnovatchev commented 1 year ago

This is much worse, more time-consuming and more error-prone than having to make just a single update in the $getSalary function.

Users scatter path expressions like employee/@salary all over their code all the time without encapsulating this in a function. Are you seriously suggesting they are wrong to do so?

It is one thing to have a separate path expression, and totally different to have a mixture of function-calls and their arguments being path expressions. The latter in most cases is not the clearest and most understandable way to express a function call, especially if the argument(s) is/are complex path expressions or even a mixture of path expressions and other function calls.

And even when we are writing separate (unmixed) path expressions, it is always a good practice to factor the common leading path into a variable with meaningful name.

Not only might this help optimize the evaluation but, even more importantly, it brings meaning and at the same time shortens all path expressions that share this common leftmost path.

michaelhkay commented 1 year ago

I think it is perfectly reasonable to write

<xsl:for-each select="employee">
  <xsl:sort select="@salary"/>
  ...
</xsl:for-each>

and I think it is equally reasonable to write

sort(employee, \{@salary})

I simply don't accept your argument that using an inline expression/function here is inappropriate.

dnovatchev commented 1 year ago

I think it is perfectly reasonable to write

<xsl:for-each select="employee">
  <xsl:sort select="@salary"/>
  ...
</xsl:for-each>

and I think it is equally reasonable to write

sort(employee, \{@salary})

I simply don't accept your argument that using an inline expression/function here is inappropriate.

What I said was that if we use (any) expression more than once, it is reasonable to factor it into a function, or, if it is a constant, to assign it to a variable that we then reuse in all such expressions.

Thus having each of the above expressions, by itself is not problematic, but having many expressions that have repeated/common subexpressions may be improved (both in readability and time-complexity) by factoring the common sub-expression into a variable or function with meaningful name.

The variable will be assigned (evaluated) only once and then reused N times. Also, the user will know exactly what is the meaning of this variable. The function will concentrate in a single place the code that calculates the factored subexpression, will be the single, sole, central thing that needs to be updated in case the subexpression is changed to something else, and again, will provide a short and meaningful name for the subexpression.

ChristianGruen commented 1 year ago

it is always a good practice to factor the common leading path into a variable with meaningful name.

So my impression is that this discussion is not about lambda functions; it's about general code style and conventions. You can always motivate users to do everything only once and bind repeatedly used code to variables or functions. No matter what syntax we offer, we cannot prevent people from doing all kinds of things that we would possibly not do ourselves.

Personally, I frequently bind anonymous functions to variables (with XQuery as well as with other languages). This doesn't change anything about the belief that a compact syntax will be beneficial.

dnovatchev commented 1 year ago

Personally, I frequently bind anonymous functions to variables (with XQuery as well as with other languages). This doesn't change anything about the belief that a compact syntax will be beneficial.

This is not about a "belief".

This is about readability, understandability and maintainability.

Using a unique (not repeated, used only once) expression within a lambda expression probably does not directly affect maintainability, as there are no bad effects due to redundancy, but it still may affect readability and understandability, and thus, maybe more indirectly, maintainability.

All three of these properties reflect complexity. While throwing an occasional lambda expression into the code may feel empowering, the fact is that users, and a significant portion of people even in this group, voice concerns about the understandability of such code.

In a few months, even the author of a "slick" lambda expression may have trouble understanding what he himself meant by this expression.

If we are writing an insignificant, throwaway piece of code, then nobody cares. But when programming in the large, the code often has to be maintained for decades, and then the one quality (probably the only one) that people who come to maintain the product after the main heroes have moved on or retired value most is its maintainability, which reflects the readability and understandability of the code.

Let us not be proud that we shortened an expression by 3 characters and made it hard to read, even for our future selves.

Instead, let us feel proud when we have written code that even someone unfamiliar with it grasps naturally, without problems.

ChristianGruen commented 1 year ago

This is not about a "belief". This is about readability, understandability and maintainability.

If you believe it’s not about belief, you will be left to prove it, and you may need to prove that it was the wrong decision to introduce this syntax in so many other modern languages that are much more popular than XPath.

I still cannot see why the arguments you provide would not apply to existing XPath 1.0, such as //x[.>3][.<6], which only looks readable to us (and thus understandable and maintainable) because we’ve become used to it. It could also be written as something like…

let $cmp := function($op, $n) { op($op)(?, $n) }
return filter(filter(//x, $cmp('>', 3)), $cmp('<', 6))

…but who would ever do that?

If either $x[ ... ] or $x -> { ... } does not feel convincing, it's always possible to use other constructs, or to bind parts of it to variables whenever you feel like it. Again, you can bind any repeated construct to variables; it simply does not matter whether it’s predicates or inline functions of whatever syntax.
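The predicate form and the higher-order form compared above can be run side by side in XQuery 3.1. Since fn:op is only proposed for 4.0, this sketch replaces $cmp with two hypothetical helpers, $greaterThan and $lessThan, built from nested inline functions; the sample data is also hypothetical:

```xquery
xquery version "3.1";

(: Hypothetical sample data. :)
let $doc := document { <r><x>2</x><x>4</x><x>5</x><x>7</x></r> }
(: Predicate form, as in XPath 1.0. :)
let $viaPredicates := $doc//x[. > 3][. < 6]
(: Higher-order form: each helper returns a one-argument test function.
   They stand in for op($op)(?, $n), because fn:op does not exist in 3.1. :)
let $greaterThan := function($n) { function($x) { $x > $n } }
let $lessThan := function($n) { function($x) { $x < $n } }
let $viaFilter := filter(filter($doc//x, $greaterThan(3)), $lessThan(6))
(: Both select the same two nodes, <x>4</x> and <x>5</x>: returns (2, true()) :)
return (count($viaPredicates), deep-equal($viaPredicates, $viaFilter))
```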

dnovatchev commented 1 year ago

still cannot see why the arguments you provide would not apply to existing XPath 1.0, such as //x[.>3][.<6], which only looks readable to us (and thus understandable and maintainable) because we’ve become used to it

@ChristianGruen

We have been taught polynomial factorization in school math, and this is one example of a common and really foundational principle. Whenever there is a repetition and redundancy, capturing the repeating part into a separate component results in a breakdown of the initial expression into an equivalent expression that contains a number of simpler and not redundant subexpressions. In the case of identifying and removing redundancy from expressions in code, one can capture the common subexpressions in variables with meaningful names, which aids understandability.

This is a common principle that applies to complex expressions of any kind.

One XPath example may be the following. We have two or more expressions all starting with the same path:

(/a/b/c,
 /a/b/c/*[. gt 5],
 /a/b/c/*[. instance of xs:decimal],
 /a/b/c/*[*])

it is natural to write:

let $myPath := /a/b/c  (: or any meaningful name :)
return (
  $myPath, $myPath/*[. gt 5], $myPath/*[. instance of xs:decimal], $myPath/*[*]
)

Not only is this more readable and easier to understand, but if the document schema changes in the future (say, an element child <d> is added to <c> and the common path becomes /a/b/c/d), then only one code change is necessary, in the definition of $myPath, instead of the 4 changes that the unfactored code requires.

Again, the point here is not that people know how to write complicated code of any kind. The point is how to write code that is simplified and factorized, has meaningful variable names, no or minimal redundancy and that is safer (less error-prone) and easier (faster) to change and maintain.

ChristianGruen commented 1 year ago

The point is how to write code that is simplified and factorized, has meaningful variable names, no or minimal redundancy and that is safer (less error-prone) and easier (faster) to change and maintain.

All I can read from this is that this also applies to existing XPath 1.0 (see above), and that you would like to prevent users from creating redundancy. Which can be fine. Maybe it’s better to create a new issue for that and provide a suggestion for a mode (e.g. via a pragma, or anything else) that disallows the same construct more than once in the code?

ChristianGruen commented 1 year ago

It’s a pity. I had yet another look at the original solutions we started from…

$data -> { ./whatever }
let $f := { ./whatever  } return $f(...)

…and this is what we might get now:

$data =!> \{ ./whatever }()
let $f := \{ ./whatever  } return $f(...)

Please don’t be scared: I’m NOT trying to get us back to the original version again – we had good reasons to revise it, mostly the ambiguity of the thin arrow. Instead, I’m just observing that the syntax is getting more and more twisted, and I can understand that there’s growing resistance to where the next steps will lead us.

I believe we should not introduce new special characters that will make the syntax more noisy. But I agree with Michael that a focus function would be a valuable and easy-to-understand addition. The least controversial choice would be to stick with the existing function keyword. We already have map {}, array {}, and function {} looks intuitive enough to me to introduce it to bind the context item/value. This would give us:

sort($persons, (), function($person) { $person/@age })
sort($persons, (), function { @age })

Personally, I’d still be in favor of having function and fn as synonyms. Then we could also write…

sort($persons, (), fn { @age })

…as Michael suggested in the initial comment.
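Pending a focus-function syntax such as function { @age }, something close can already be written in XQuery 3.1: bind the argument to a variable, then make it the context item with the simple map operator !, so that the body can use a relative path. This is a sketch assuming each sort key is computed from a single item; the sample persons are hypothetical:

```xquery
xquery version "3.1";

(: Hypothetical sample data. :)
let $persons := (
  <person name="Ann" age="41"/>,
  <person name="Bob" age="29"/>,
  <person name="Cy" age="35"/>
)
return (
  (: Explicit parameter, as in function($person) { $person/@age } :)
  sort($persons, (), function($person) { $person/@age }) ! string(@name),
  (: The argument becomes the context item, so the body can say just @age :)
  sort($persons, (), function($p) { $p ! @age }) ! string(@name)
)
```

Both calls should return Bob, Cy, Ann. Since no schema is involved, the @age keys are untyped; for a guaranteed numeric ordering one would wrap them in number() or xs:decimal().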

dnovatchev commented 1 year ago

This is much worse, time consuming and error-prone than having to do just a single update in the $getSalary function.

Users scatter path expressions like employee/@salary all over their code all the time without encapsulating this in a function. Are you seriously suggesting they are wrong to do so?

It is one thing to have a separate path expression, and quite another to have a mixture of function calls whose arguments are path expressions. The latter is in most cases not the clearest and most understandable way to express a function call, especially if the argument(s) are complex path expressions or even a mixture of path expressions and other function calls.

In addition to the picture from @benibela:

Fantasia(Disney, 1940) https://www.youtube.com/watch?v=oPDSoFgivPA (Clip from the third segment "The Sorcerer's Apprentice", music by Paul Dukas. Distributed by RKO Pictures and The Walt Disney Company)


dnovatchev commented 1 year ago

The point is how to write code that is simplified and factorized, has meaningful variable names, no or minimal redundancy and that is safer (less error-prone) and easier (faster) to change and maintain.

All I can read from this is that this also applies to existing XPath 1.0 (see above), and that you would like to prevent users from creating redundancy.

Not "to prevent", but not to provide them with ample possibilities to do so.

And right, this is a very common principle that applies to almost anything: just common sense.

Which can be fine. Maybe it’s better to create a new issue for that and provide a suggestion for a mode (e.g. via a pragma, or anything else) that disallows the same construct more than once in the code?

Perhaps a tool, such as an issues checker, that flags certain text as potentially containing a known flaw, warns the user and suggests an alternative best practice?

michaelhkay commented 1 year ago

@ChristianGruen wrote:

I believe we should not introduce new special characters that will make the syntax more noisy. But I agree with Michael that a focus function would be a valuable and easy-to-understand addition. The least controversial choice would be to stick with the existing function keyword. We already have map {}, array {}, and function {} looks intuitive enough to me to introduce it to bind the context item/value. This would give us:

sort($persons, (), function($person) { $person/@age })
sort($persons, (), function { @age })

I've tried a few examples using that syntax and I'm comfortable with it.

In an arrow pipeline, this gives us:

$x => substring-after('[') => substring-before(']') => function{if (XXX) then upper-case(.) else lower-case(.)}()
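For comparison, a similar pipeline can already be expressed in XQuery 3.1, where the arrow operator accepts a parenthesized expression that returns a function. In this sketch the input string and the condition (which stands in for the XXX placeholder above) are hypothetical, and the argument must be named explicitly as $s instead of being available as the context item:

```xquery
xquery version "3.1";

(: Hypothetical input; the string-length test is an illustrative stand-in for XXX. :)
let $x := "item[Hello]"
return $x
  => substring-after('[')
  => substring-before(']')
  => (function($s) {
       if (string-length($s) gt 3) then upper-case($s) else lower-case($s)
     })()
(: returns "HELLO" :)
```

The proposed function { ... } form removes the need for both the explicit parameter and the surrounding parentheses.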

michaelhkay commented 1 year ago

PR #524 has resolved the issue.