with expression; chaining and concatenation

ChristianGruen commented 1 year ago

Outdated: See remaining discussion.

We have no expression yet to bind a value to the context value. Such an expression would be useful, among other things, to extend the focus function to sequences (fn { . }, see #129).

Here are 3 possible constructs for that, ordered by my personal preference:

1. Value Map Expression

ValueExpr      ::=  ValidateExpr | ExtensionExpr | ValueMapExpr
ValueMapExpr   ::=  SimpleMapExpr ("~" SimpleMapExpr)*
SimpleMapExpr  ::=  PathExpr ("!" PathExpr)*

(: Example :)
//flower ~ (count(.) || ' flowers: ' || string-join(name, ', '))

The expression would be similar to the simple map expression (which we could rename to item map expression). The following equivalents would then exist for simple FLWOR expressions:

for $i in (1 to 5) return string($i)  ≍  (1 to 5) ! string(.)
let $i := (1 to 5) return count($i)   ≍  (1 to 5) ~ count(.)

fn { E } could be rewritten to fn($c) { $c ~ E }.

2. Context Value Declaration

ContextExpr  ::=  "context" "{" Expr "}" EnclosedExpr

(: Example :)
context { //flower } {
  count(.) || ' flowers: ' || string-join(name, ', ')
}

The result of the first expression defines the context value, the second expression can reference the context.

fn { E } could be rewritten to fn($c) { context { $c } { E } }.

3. Enhanced FLWOR expression (for the sake of completeness)

Similar to variables, the dot could be used to bind and reference the context:

LetBinding  ::=  ("." | ("$" VarName)) TypeDeclaration? ":=" ExprSingle
ForBinding  ::=  ("." | ("$" VarName)) TypeDeclaration? AllowingEmpty? PositionalVar? "in" ExprSingle

(: Example :)
let . := //flower
return count(.) || ' flowers: ' || string-join(name, ', ')

fn { E } could be rewritten to fn($c) { let . := $c return E }.

Assessment

The first solution looks most appealing to me. I like the analogy with the existing syntax for single items.
We could choose the second solution if we believe that the expression will be rarely used.
I’ve backed away from the third solution; I think it would be too pervasive.

Arithmeticus commented 1 year ago

I don't fully understand this proposal. Can we see an example where the proposed ~ does something that ! wouldn't (or where ! is unavailable)?

ChristianGruen commented 1 year ago

I don't fully understand this proposal. Can we see an example where the proposed ~ does something that ! wouldn't (or where ! is unavailable)?

Sure. The following expression…

(0 to 4) ! count(.)

…will give you 1 1 1 1 1 as each item will be bound to the context one by one. The proposed ~ syntax…

(0 to 4) ~ count(.)

…will give you 5, as the full sequence is bound to the context once. With for and !, you can bind single items, with let and ~, you can bind sequences.

Hope this helps?
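
For anyone who wants to try the contrast today, here is a minimal sketch in plain XQuery 3.1. The proposed ~ is not used; a let binding stands in for it, on the assumption that binding the whole sequence once is the intended effect.

```xquery
(: XQuery 3.1 only; the proposed ~ operator is not used here :)
(
  (: item by item: count(.) sees one item at a time :)
  (0 to 4) ! count(.),
  (: whole sequence at once: let stands in for the proposed ~ :)
  (let $s := (0 to 4) return count($s))
)
```

This yields 1 1 1 1 1 from the first expression, followed by 5 from the second.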

michaelhkay commented 1 year ago

I prefer (2) over (1), mainly because the scope of the focus change is very clearly marked. (The operator priority of "~" is not intuitive). With syntax like (2), I would consider other keywords, for example "with" or "within" or "using" rather than "context".

I would also be comfortable with (3). It seems more likely to be understood by someone who encounters the code and hasn't seen the construct before.

ChristianGruen commented 1 year ago

I think (1) could be a handy replacement for FLWOR expressions with single let clauses; but since we already have X => F(), and since the proposal was mostly born out of the need to have a syntax for focus functions, it may be overkill.

For (2), I’ll gladly choose another keyword. with may conflict with the XPath keyword: https://qt4cg.org/specifications/xquery-40/xpath-40.html#with-expressions (I don’t remember if we have formally accepted it).

ChristianGruen commented 1 year ago

The more I play around with the alternatives, the more I am convinced that the ~ operator could be a powerful complement to the simple map operator.

We’ve recently introduced the mapping arrow operator, =!>. Arrow operators work well for simple functions where the input is bound to the first argument. It gets tricky and cryptic in other cases where the expression needs to be wrapped into a function or represented as a partial function:

(: writes the tokens ('the', '?', 'sat', 'on', 'the', 'mat') to a file on disk :)
'the cat sat on the mat'
=!> tokenize()
=!> fn { if (. = 'cat') then '?' else . }()
=> (file:write-text-lines('tokens.txt', ?))()

If we had two map operators – the existing one for items and a new one for values – the overall syntax would become simpler and more flexible:

'the cat sat on the mat'
! tokenize(.)
! (if (. = 'cat') then '?' else .)
~ file:write-text-lines('tokens.txt', .)

Of course, you can always argue that the classic FLWOR expression provides everything you need. On the other hand, the simple map operator has become quite popular, while people could have written iterations with for.
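
For reference, a sketch of the same pipeline as a classic FLWOR expression in XQuery 3.1. It assumes a processor that implements the EXPath File Module (file:write-text-lines is taken from the examples above); the variable names $tokens and $masked are only illustrative.

```xquery
(: assumes the EXPath File Module is available in the processor :)
declare namespace file = "http://expath.org/ns/file";

(: one-argument tokenize splits on whitespace :)
let $tokens := tokenize('the cat sat on the mat')
(: replace 'cat' item by item, keep everything else :)
let $masked := $tokens ! (if (. = 'cat') then '?' else .)
(: the whole sequence is passed on in one go :)
return file:write-text-lines('tokens.txt', $masked)
```

Each let plays the role of one step of the chain; the intermediate names are what the operator variants would save.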

ChristianGruen commented 1 year ago

Because I have campaigned a lot for =!>, I should be careful with my suggestion, but it could be an option to drop the mapping arrow operator and introduce and advertise ~ as a solution that’s much more generic.

One could argue about whether the appearance of ~ is self-explanatory enough. – It’s somewhat surprising to me that (as far as I can judge) ! was accepted without much dissent among users. I wonder if it has been its similarity to a slash (it could also be interpreted as a dot with a slash over it).

michaelhkay commented 1 year ago

The main reason we didn't have a simple mapping operator in XPath 2.0 was that we couldn't get agreement on a symbol that should be used. That's also why we ended up with the horrible compromise of allowing atomic values on the RHS of "/". In 3.0 I suspect the process was (a) get everyone to accept that the operator is needed, (b) draw up a shortlist and see what symbol is least objectionable. My own preference was \.

I'm certainly not comfortable with your proposed "~", but one can get used to anything over time. The UNIX pipe operator "|", which has similar semantics, is hardly intuitive either. One could argue for semicolon.

ChristianGruen commented 12 months ago

One could argue for semicolon.

I would assume it’s mentally reserved by most as the delimiter of a statement, or (linguistically) a thought (and it would conflict with the XQSE).

Another syntactic suggestion I received was !!, inspired by / and // (and possibly ? and ??):

(1 to 5) !! count(.)

'the cat sat on the mat'
!! tokenize(.)
! (if (. = 'cat') then '?' else .)
!! file:write-text-lines('tokens.txt', .)

Another suggestion was to use a syntax similar to the chainable update keyword in BaseX:

(1 to 5) bind { count(.) }

'the cat sat on the mat' bind {
  tokenize(.) ! (if (. = 'cat') then '?' else .)
} bind {
  file:write-text-lines('tokens.txt', .)
}

With the last approach, fn { E } could be rewritten to fn($c) { $c bind { E } }.

michaelhkay commented 12 months ago

I would strongly prefer something that puts the expression whose focus is being defined inside curly braces, so it's completely clear where the effect ends. Something like

with (parse-json('data.json')) { my:process(??sales-orders) }

michaelhkay commented 12 months ago

I have proposed elsewhere (issue #700) using !! as an array mapping operator:

["All mimsy were the borogoves",
 "And the mome raths outgrabe"] !! tokenize(.)

returns

[("All",  "mimsy", "were", "the", "borogoves"),
 ("And", "the", "mome", "raths", "outgrabe")]

and $a !! string-join(.) does the reverse.

There's a relationship, in that with ([X]) {Y} has a very similar effect to [X] !! Y.

ChristianGruen commented 12 months ago

I would strongly prefer something that puts the expression whose focus is being defined inside curly braces, so it's completely clear where the effect ends. Something like

with (parse-json('data.json')) { my:process(??sales-orders) }

That could be fine for single bindings. I have concerns that it gets bulky and difficult to decipher if context bindings are chained…

let $a := A
let $b := B (: referencing $a :)
let $c := C (: referencing $b :)
return D (: referencing $c :)

with (with (with (A) { B }) { C }) { D }
(: vs :) A bind { B } bind { C } bind { D }
(: vs :) A map { B } map { C } map { D }
(: vs :) A !! B !! C !! D
(: vs :) A ~ B ~ C ~ D

…and it seems very natural to create chains once you have such an operator. I just have to think of those notorious code snippets where people use variable shadowing for chaining operations:

let $a := ...$a...
let $a := ...$a...
let $a := ...$a...
return $a

michaelhkay commented 12 months ago

it seems very natural to create chains once you have such an operator.

For XSLT I've been thinking along the lines

<xsl:pipeline>
  <xsl:step>...</xsl:step>
  <xsl:step>...</xsl:step>
  <xsl:step>...</xsl:step>
</xsl:pipeline>

where each step takes its context from the result of the previous step, and the result of the pipeline is the result of the last step. (Don't like the mixed metaphor of a pipeline having steps, but that's another matter...)

michaelhkay commented 12 months ago

As a pipe operator, perhaps ~>. But the different flavours of arrow become very confusing.

ChristianGruen commented 12 months ago

As a pipe operator, perhaps ~>. But the different flavours of arrow become very confusing.

Yes, I believe we shouldn’t mix this up with the existing arrow operator, the semantics are too different. – If we were to create a new language, my favorite syntax would be:

for $e in EXPR return $e  ≍  EXPR -> .
let $e := EXPR return $e  ≍  EXPR => .

ChristianGruen commented 10 months ago

If we choose variant 2 and the keyword with (or steps, pipeline, context), we could allow chains of enclosed expressions (pipeline { CONTEXT } { EXPR1 } { EXPR2 }). An example:

(: writes the tokens ('the', '?', 'sat', 'on', 'the', 'mat') to a file on disk :)
pipeline {
  'the cat sat on the mat'
} {
  tokenize(.) ! (if (. = 'cat') then '?' else .)
} {
  file:write-text-lines('tokens.txt', .)
}

fn { EXPR } would be equivalent to fn($context) { pipeline { $context } { EXPR } }.

ChristianGruen commented 10 months ago

Another candidate could be focus, in alignment with “focus functions”.

ChristianGruen commented 9 months ago

@michaelhkay Would it be feasible to rename “focus functions” to “context functions”? We could then use the keyword context for binding the context value (which I believe would be more familiar than “focus”):

fn { fn:process(.) }
(: would then be represented as :)
context { $input } { fn:process(.) }

(: and we should allow chains :)
context { $data } { subsequence(., 1, 5) } { count(.) }

I’ve seen that “Context Functions” is already used in the XQFO spec, but I assume we could name them “Functions Accessing the Context”.

michaelhkay commented 7 months ago

I'm inclined towards allowing

let . := EXPR return EXPR
for . in EXPR return EXPR

primarily because the meaning is likely to be fairly obvious to the reader, and it doesn't introduce any new special symbols.

However, it's easier to define in XPath than in XQuery, because in XQuery it messes up all the language about "tuples of variable bindings".

Whatever syntax we choose, there are complexities about the binding of position() and last(). Consider:

for member . in [(1,2), (3,4)]
return `{position()} of {last()} : {.}`
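
As a point of comparison, the existing simple map operator already sets all three parts of the focus. A small XQuery 3.1 sketch (the literal values are only illustrative):

```xquery
(: each item becomes the context item; position() and last() follow the input sequence :)
(10, 20) ! (position() || ' of ' || last() || ': ' || .)
```

This returns the two strings "1 of 2: 10" and "2 of 2: 20". The open question for a dot binding is what position() and last() should report when the bound value is a whole sequence or an array member.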

ChristianGruen commented 7 months ago

Whatever syntax we choose, there are complexities about the binding of position() and last(). Consider:
for member . in [(1,2), (3,4)]
return `{position()} of {last()} : {.}` 

I agree with the concerns. In our internal feedback loops, we’ve already discarded this variant: FLWOR expressions stretch over multiple lines, and it’s often difficult to decipher in a complex FLWOR expression where the context item was actually defined. Next, it complicates debugging, as queries such as the following one (note the missing $ in the return clause, which turns the variable reference into a child step) would suddenly be parsed and evaluated without errors:

for . in $doc/addressbook/person
let $emails := count(email)
return emails

michaelhkay commented 7 months ago

No one seems to have mentioned that the proposed X ~ Y can currently be written X => fn{Y}().

I wonder if this is good enough to meet the requirement?

In the past we've talked about allowing other things on the RHS of =>. One possibility that's been raised is X => {Y}, but now that we use bare curlies for a map constructor, that's probably no longer a sensible choice.

A variant might be X => apply{Y}.

This seems to fit in well with the idea of using this construct to build pipelines where we think of each step as being a transformation applied to the result of the previous step.

X => apply{Y} is actually one character longer than X => fn{Y}() but it feels conceptually much simpler, and unlike X ~ Y I think its meaning is fairly easily guessable by someone who hasn't encountered it before.
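
In XQuery 3.1, without focus functions, the same pattern can be sketched with an inline function on the right-hand side of the arrow. The parameter name $v is only illustrative.

```xquery
(: the whole sequence 1 to 5 is passed as the single argument $v :)
(1 to 5) => (function($v) { count($v) || ' items, sum ' || sum($v) })()
```

This returns the string "5 items, sum 15". The trailing () is the part that X => apply{Y} would hide.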

ChristianGruen commented 7 months ago

No one seems to have mentioned that the proposed X ~ Y can currently be written X => fn{Y}().

One motivation for proposing $x ~ Y was precisely the other way round, i.e., to have a construct that allows us to formulate fn { Y } as fn($x) { $x ~ Y } (#131). With the new wording in the spec, it’s left to the implementor how the argument is bound to the context value – which is fine, as the underlying solution could certainly be implementation-specific.

I wonder if this is good enough to meet the requirement?

Instead of (1, 2, 3) ! (. + 1), one could write (1, 2, 3) =!> fn { . + 1 }(), but I think the first one is much simpler and easier to read…

A variant might be X => apply{Y}.

Maybe this?

(: fn { Y }(X) → apply { X } to { Y } :)
apply { EXPR } to { EXPR2 } to { EXPR3 } to { file:write('result.txt', .) }

…vs…

EXPR ~ EXPR2 ~ EXPR3 ~ file:write('result.txt', .)

EXPR => apply { EXPR2 } => apply { EXPR3 } => apply { file:write('result.txt', .) }

let $data := EXPR
let $data := EXPR2
let $data := EXPR3
return file:write('result.txt', $data)

michaelhkay commented 3 months ago

How about going for

using { expr1 } { expr2 }

?

ChristianGruen commented 3 months ago

I can’t help it: my preference would still be context from the first comment, since that’s what the expression is about: mapping the result of an expression to the context. What would speak against it?

context { EXPR1 } { EXPR2 } { EXPR3 } { ... }

And as mentioned earlier, we could rename “focus functions” to “context functions”. This would reduce the amount of different terms that we use for similar things. Personally, I haven’t used the term “focus” for context bindings yet, but if it’s common enough, we could also replace context by focus.

michaelhkay commented 3 months ago

I think if we had a construct called context then I would expect it to be able to control many aspects of the static and dynamic context (for example, namespace bindings), not just one aspect.

ChristianGruen commented 2 months ago

In our internal tests and feedback rounds, a keyword syntax does not win many supporters. It is pretty verbose and adds no real advantage to let/return constructs:

let $input := 1 to 5
let $string := `Results ({ count($input) }): { $input }`
return upper-case($string)

focus { 1 to 5 } {
  `Results ({ count(.) }): { . }`
} {
  upper-case(.)
}

Next, the additional braces are often confusing.

The operator variant gains more traction:

(1 to 5)
~ `Results ({ count(.) }): { . }`
~ upper-case(.)

=.> was proposed as an alternative, as we already have =!> and =?>:

(1 to 5)
=.> `Results({ count(.) }): { . }`
=.> upper-case(.)

$data
=.> (if(exists(.)) then prepare(.) else error((), 'Input is empty'))
=.> enrich(.)
=.> file:write('output.txt', .)

As for naming, my suggestion of “value map expression” could be confusing to those who associate mappings with one-to-one transformations. Maybe “focus expression” would be a better choice (in alignment with focus functions):

ValueExpr      ::=  ValidateExpr | ExtensionExpr | FocusExpr
FocusExpr      ::=  SimpleMapExpr ("=.>" SimpleMapExpr)*
SimpleMapExpr  ::=  PathExpr ("!" PathExpr)*

michaelhkay commented 2 months ago

I think the current position is pretty much as Christian states it in the original issue. I think that the functionality is needed, and there are three possible solutions offered. My analysis of the pros and cons for each of the solutions is as follows. In each case the effect is to evaluate E2 with the context value set to the result of E1 (and probably with position = last = 1):

1. Value Map Expression: E1 ~ E2
2. Keyword syntax, for example context{E1}{E2} (with other keywords suggested)
3. Generalised let expression: let . := E1 return E2

Option 1: this is simple and concise. The disadvantages are (a) it uses one of the few remaining ASCII punctuation symbols available; (b) the semantics are unrelated to any of the traditional semantics of tilde in programming or mathematics (usually indicating matching, approximate equality, or logical negation); (c) it's far from obvious what the right operator precedence should be, and this is likely to be a common source of errors especially as E2 might often be a rather lengthy expression.

Option 2: given the right keyword, this construct can be made readable and self-explanatory, and the scope of the two expressions is unambiguous without any reliance on operator precedence. However, it seems to be tricky to find a keyword (or even a pair of keywords K1 {E1} K2 {E2}) that intuitively conveys the meaning while remaining concise.

Option 3: like the existing FLWOR expression, this is a bit verbose, and it's not always obvious where the return expression E2 ends. But it has the benefit that a user coming across the construct for the first time is likely to guess correctly what it means, and the precedence rules are the same as the existing construct. It also extends naturally to variant forms, such as for member . in $array return ..., which would otherwise require their own custom syntax (we rejected a proposal to provide a simple array mapping operator, because we felt we were introducing too many new operators).

My preference is option 3.

ChristianGruen commented 2 months ago

The clear disadvantage for 3) that we currently see is that it cannot serve as a shortcut for existing let expressions. Next, we observed that it’s difficult to understand in FLWOR expressions what a dot refers to (we had already implemented this experimentally).

In the last comment, I have proposed =.> as a new alternative. We increasingly like it, and it seems a good response to:

(a) it uses one of the few remaining ASCII punctuation symbols available; (b) the semantics are unrelated to any of the traditional semantics of tilde in programming or mathematics (usually indicating matching, approximate equality, or logical negation)

The tilde remains untouched.

(c) it's far from obvious what the right operator precedence should be

The same observation applies to existing operators such as => vs. =!>, or / vs. !, and I would claim that the only solution that seems both reasonable and intuitive is to prioritize the mapping of single items. The 3-character length may be helpful, too, to derive the precedence (and of course one can always use parens).

This feature request is certainly one of the most important ones for us. We are convinced that many users will benefit from its introduction, as such an operator has been repeatedly discussed by us in the past. I would like to create a PR for it next week, and I will be happy to present a proof-of-concept live demo.

dnovatchev commented 2 months ago

In the last comment, I have proposed =.> as new alternative. We increasingly like it, and it seems a good response to:

One more unreadability!!!!

Definitely not good.

Why?

Totally unreadable. The dot is almost invisible - at least for the 50%+ readers who wear glasses.
3 special symbols in a row...
Extremely error-prone - missing to hit the dot results in => which is another operator of the language. Missing to hit the = results in .> which I think is a valid LHS of an XPath expression. Not pressing the Shift key can result in =.. which I think is a valid RHS of an XPath expression. Accidentally inserting a space within may also be a problem.
Completely suppresses understandability. If we measure the difficulty of understanding a single special symbol as Q, then a 2-symbol operator has difficulty of understanding Q^2 and a 3-symbol operator has difficulty of understanding Q^3. Some similar metric applies to the probability of a typing error.

Combine all this with the fact that we already have quite a lot of 3-symbol-operators and all these together could be part of an expression... Yes, I definitely would pity the reader of such expressions - even the person who themselves wrote it "just a few months ago".

ChristianGruen commented 2 months ago

One more unreadability!!!!

What about !!!!? ;)

missing to hit the dot results in => which is another operator of the language.

There is a deliberate parallel to =>. With the existing arrow operator, you cannot decide to which parameter your LHS is bound. =.> would allow you to do exactly that:

EXPR =>  $action('second-arg-only')
EXPR =.> $action(., 'second-arg')
EXPR =.> $action('first-arg', .)
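
For comparison, a sketch of how the second case has to be written in XQuery 3.1 today, using a partially applied function on the right-hand side of the arrow. The substring call and its arguments are only illustrative.

```xquery
(
  (: LHS lands in the first argument: substring('abcdef', 2) :)
  'abcdef' => substring(2),
  (: LHS lands in the second argument via a placeholder: substring('abcdef', 2) :)
  2 => (substring('abcdef', ?))()
)
```

Both expressions return 'bcdef'; the second form needs the extra parentheses and the trailing () that the proposed operator would avoid.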

a 3-symbol operator has difficulty of understanding Q^3

I am very convinced (much more than a year ago) that the introduction of the proposed operator will be very powerful, no matter which characters we eventually use for it. I would not have ventured this if we had not recently introduced =?> and =!>, and I would not mind having a convincing shorter operator. Suggestions are welcome.

dnovatchev commented 2 months ago

I don't fully understand this proposal. Can we see an example where the proposed ~ does something that ! wouldn't (or where ! is unavailable)?

I don't either... 😢

At least feeling relieved I am not the only one

dnovatchev commented 2 months ago

The following expression…

(0 to 4) ! count(.)
…will give you 1 1 1 1 1 as each item will be bound to the context one by one. The proposed ~ syntax…
(0 to 4) ~ count(.)
…will give you 5, as the full sequence is bound to the context once. With for and !, you can bind single items, with let and ~, you can bind sequences.

Why is this needed at all?

This can be expressed in a more clean manner simply as:

(0 to 4) => count()

ChristianGruen commented 2 months ago

Before reacting with rejections, I would appreciate if you could spend some more time on reading the comments in this conversation or wait for my little demo in the next meeting. I don’t know if I will succeed, but I will do my best to try to explain why we (BaseX) believe the operator should become part of 4.0.

dnovatchev commented 2 months ago

My preference is option 3.

I fully agree with @michaelhkay .

But I feel uneasy about giving one more meaning to the already too-heavily overloaded dot character (.) .

It would be better to have a special variable convention - something as: $$Context

dnovatchev commented 2 months ago

Before reacting with rejections, I would appreciate if you could spend some more time on reading the comments in this conversation or wait for my little demo in the next meeting. I don’t know if I will succeed, but I will do my best to try to explain why we (BaseX) believe the operator should become part of 4.0.

This is not rejection but a spontaneous reaction - rather spoilt the pleasure of having my morning coffee.

Is it accidental that probably no programming language has operators with 4 or more symbols?

No, because one of the primary reasons operators are used is for brevity. Long strings don't fulfil this purpose - on the contrary - they act as successful obfuscation/encryption of a very meaningful keyword that can be used instead.

we (BaseX) believe the operator should become part of 4.0.

We are working on "XPath 4.0" - not on "BaseX XPath 4.0"

I very much appreciate the fact that @michaelhkay doesn't start an argument with: "We at Saxon" (though we-at-Saxon would probably outweigh by many factors we-at-basex) because this could easily escalate into a fans-war - something that I believe we should try to avoid at any cost.

ChristianGruen commented 2 months ago

Is it accidental that probably no programming language has operators with 4 or more symbols?

Dimitre, please do read my responses. I mentioned I would prefer to have a shorter operator, and I am open for suggestions.

we-at-Saxon would probably outweigh by many factors we-at-basex

This may very well be.

I didn’t want to stop you enjoying your morning coffee. Let’s stop this dead-end discussion, it has already become personal and confrontative again.

dnovatchev commented 2 months ago

Am I alone in feeling that we have already crossed a line in adding more and more operators into the language:

operators that are cryptic
that require significant mental effort to decrypt and assimilate
operators with error-prone characters?
operators that might better be expressed as similar-length meaningful keywords?

And if several such operators are used in a single XPath expression, then what is the end result?

dnovatchev commented 2 months ago

Let’s stop this dead-end discussion, it has already become personal and confrontative again.

This is in no way a "dead-end" discussion.

In fact, I am expressing my opinion and trying to convince the people who are still not fully convinced that cryptography and obfuscation are the opposite of readability and understandability.

Why on one side, someone asks for feedback after posting 16 comments on this issue, but tells people who don't support his favored position to stop, after seeing this expressed in only 4 comments?

As for being "personal" - well, I re-read the 4 comments and don't think discussing operators vs. keywords has anything "personal" (at least for me) in it.

As for "confrontative" - well it was not in my comments where "we-at basex" (obviously vs. "we-at-elsewhere-else") was used as part of an argument...

michaelhkay commented 2 months ago

Am I alone in feeling that we have already crossed a line in adding more and more operators into the language:

It's a legitimate concern. But on my reckoning we currently have 22 operators written using punctuation symbols, compared with 42 in Javascript, so we're not obviously out of line.

liamquin commented 2 months ago

overall i'd like fewer operators, not more. Javascript started out with the 20 or so C operators and then added things like . for function application, ?. for conditional function application, === for "is", and so on.

But JS is massively more widely used than XQuery, XSLT, XPath these days.

As a result, we have to optimize the language for intermediate users, not for experts.

It's much easier for people scared of specs to find out what "with context as" does than to search for =.~!> or whatever. I know i can no longer remember the difference between the qt4 arrow variants, and can't easily find a list of them all.

(1, 2, 3, 4) ! fn { . * 3 } as . in sum(.)

(1, 2, 3, 4) ! fn { . * 3 } as $all in sum($all)

might work better.

In the course i just gave on XSLT i emphasized that XML and XSLT empower people who do not think of themselves as programmers to do advanced text processing.

dnovatchev commented 2 months ago

Am I alone in feeling that we have already crossed a line in adding more and more operators into the language:

It's a legitimate concern. But on my reckoning we currently have 22 operators written using punctuation symbols, compared with 42 in Javascript, so we're not obviously out of line.

In Javascript there seem to be 43 operators, as defined in https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Expressions_and_operators

Interestingly, I tried to count the number of operators in XPath and this was very difficult due to:

We do not have operators defined at all in the XPath specification
Operators are defined in the F&O specification, but not all of them. Missing are such operators as:

or
and
|
/
//
.
..
','
||
not()

Depending on whether or not we consider the above to be operators we can arrive at different total number of operators in the language.

Maybe it would be a good idea to have a specific "Operators" section in XPath where all operators are explicitly listed?

In Javascript, the number of 3+ character-operators is only 10.

Thus, the very-long operators in Javascript are about 23% of all operators.

I wouldn't be surprised if this percentage is already higher in XPath 4.0.

dnovatchev commented 2 months ago

It's much easier for people scared of specs to find out what "with context as" does than to search for =.~!> or whatever. I know i can no longer remember the difference between the qt4 arrow variants, and can't easily find a list of them all.

+100 💯

You are not alone, Liam.

Let us have a separate section of the "XPath 4.0" document where all operators are explicitly listed.

We might need a special tool that reads XPath expressions containing any such arrow variants and translates these to something understandable for the reader.

But expressing a need for such a tool makes all of us "users with accessibility needs" - this is the result of having so many and so long, so cryptic operators - we are all put in the "disability group".

ChristianGruen commented 2 months ago

With regard to 3-character operators, maybe we can get rid of =!> again? And do we really need =?>?

In the following, I am using ▶ as neutral placeholder (no proposal!) for the value map operator.

Instead of => and =!>, we could use ▶ and ! to build chains in which both sequences and items need to be bound…

(: currently supported syntax :)
EXPR
=> a() =!> b()
=> c()
(: equivalent syntax :)
EXPR
▶ a(.) ! b(.)
▶ c(.)

=> and =!> are both limited, as they can only be used to chain functions. With ! and ▶, arbitrary expressions could be chained:

(1 to 10) ! (. * 2)
▶ `Count: { count(.) }. Sum: { sum(.) }`

The equivalent FLWOR expression is:


let $numbers := (
  for $n in 1 to 10
  return $n * 2
)
return `Count: { count($numbers) }. Sum: { sum($numbers) }`

dnovatchev commented 2 months ago

In the following, I am using ▶ as neutral placeholder (no proposal!) for the value map operator.

Instead of => and =!>, we could use ▶ and ! to build chains in which both sequences and items need to be bound…
(1 to 10) ! (. * 2)
▶ `Count: { count(.) }. Sum: { sum(.) }`

My main confusion is still the same - caused by using the dot '.' for denoting a sequence.

. (dot) stands for "context item". An item cannot be a substitute for a sequence in general.

Regardless of what characters are used for whatever operator, using . for denoting a sequence is deeply confusing and goes against all three previous versions of XPath. In all these three previous versions an expression like count(.) evaluates to 1 and is not too-meaningful.

This is why I am more inclined to have a "special variable", something like $$CS, standing for "context sequence", if such a term makes any sense at all.

liamquin commented 2 months ago

i don't see any advantage to ▶ compared to, e.g.

let $seq := (1 to 10) ! (. * 2)
return `Count: { count($seq) }. Sum: { sum($seq) }`

and in fact prefer the second for being easier to read. I still do think there's a danger here of optimizing the language for people who use it all day every day, and not for people who use it for an hour or two a week, or a couple of days a month.

If the goal is a suffix operator,

(1 to 10) ! (. * 2) into $seq return ...

does not involve overriding . in ways that sometimes make it a sequence.

ChristianGruen commented 2 months ago

. (dot) stands for "context item". An item cannot be a substitute for a sequence in general.

Maybe a year ago, the dot was generalized to represent sequences, see https://qt4cg.org/specifications/xquery-40/xpath-40.html#dt-context-value

ChristianGruen commented 2 months ago

in fact prefer the second for being easier to read.

@liamquin I fully agree that let will be easier to read in most cases. Maybe it would even be safer to avoid ! in the example above, as not everyone knows what it does and how it differs from /.

! is an excellent example in general: Is it acceptable to use the operator, or should we avoid it because it may not be understood? My practical answer would be: It depends on the project and the target group of developers who will read the code.

If the goal is a suffix operator,

One use case I see is pipelines/chains. There is often code like…

let $tmp := EXPR
let $tmp := do-a-with-$tmp
let $tmp := do-b-with-$tmp
let $tmp := do-c-with-$tmp
return do-d-with-$tmp

…which could also be written as:

EXPR
▶ do-a-with-.
▶ do-b-with-.
▶ do-c-with-.
▶ do-d-with-.

That’s indeed why chains with => are already very common in practice…

EXPR
=> do-a-with()
=> do-b-with()
=> do-c-with()
=> do-d-with()

…but those are limited to binding the input to the first argument of a function.
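
A minimal XQuery 3.1 sketch of that limitation, assuming made-up strings and a hypothetical parameter name $needle: when the piped value is needed anywhere other than the first argument, the call has to be wrapped in an inline function.

```xquery
(: The piped value is the first argument, so the arrow works directly. :)
"the cat sat" => tokenize(" ") => count(),

(: The piped value is needed as the second argument of contains(),
   so the call is wrapped in an inline function ($needle is illustrative). :)
"cat" => (function($needle) { contains("the cat sat", $needle) })()
```

With an operator that binds the context value, the second case could presumably be written as "cat" ▶ contains("the cat sat", .), again using ▶ only as a placeholder.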

With regard to the dot, it is certainly unusual in the beginning to use it for sequences. However, if it had been defined for sequences from the very beginning, I am pretty sure it would be very common today.

michaelhkay commented 2 months ago

I like the idea of thinking of this operator as a pipeline operator, and using an arrow therefore makes sense to me. In fact, before seeing the other comments this morning I was going to suggest ->.

It's unfortunate that ! doesn't work too nicely as a pipeline operator in conjunction with => because it has higher precedence, and I think that's the main reason we needed to introduce =!>. But if -> and ! had the same precedence then it might well be possible to dispense with =!>. Need to look at use cases.

ChristianGruen commented 2 months ago

I like the idea of thinking of this operator as a pipeline operator, and using an arrow therefore makes sense to me. In fact, before seeing the other comments this morning I was going to suggest ->.

Oh right; we dismissed -> a while ago as it was temporarily used as an alternative syntax for functions.

It's unfortunate that ! doesn't work too nicely as a pipeline operator in conjunction with => because it has higher precedence, and I think that's the main reason we needed to introduce =!>. But if -> and ! had the same precedence then it might well be possible to dispense with =!>. Need to look at use cases.

Here are two current examples for =!> from the spec:

(1 to 5)
=!> xs:double()
=!> math:sqrt()
=!> fn { . + 1 }()
=> sum()

"The cat sat on the mat"
=> tokenize() =!> concat(".") =!> upper-case()
=> string-join(" ")

This would be the !/-> syntax:

(1 to 5)
! xs:double(.)
! math:sqrt(.)
! (. + 1)
-> sum(.)

"The cat sat on the mat"
-> tokenize(.) ! concat(., ".") ! upper-case(.)
-> string-join(., " ")

Regarding precedence, the following grammar rules seem most reasonable to me:

ValueExpr      ::=  ValidateExpr | ExtensionExpr | FocusExpr
FocusExpr      ::=  SimpleMapExpr ("->" SimpleMapExpr)*
SimpleMapExpr  ::=  PathExpr ("!" PathExpr)*

Arithmeticus commented 2 months ago

I'll look forward to @ChristianGruen's presentation; however, to this point I'm among those who would prefer, in this case, to avoid using punctuation, at least for the moment. In general, I think that new functionality should begin life as a keyword construct. Let it settle in. Then see if a punctuation-based shorthand is demanded.

dnovatchev commented 2 months ago

. (dot) stands for "context item". An item cannot be a substitute for a sequence in general.

Maybe a year ago, the dot was generalized to represent sequences, see https://qt4cg.org/specifications/xquery-40/xpath-40.html#dt-context-value

I have read it and tried to understand it, and it makes little, almost no, sense!

From this (linked above) document:

[Definition: The context value is the value currently being processed.] In many cases (but not always), the context value will be a single item. [Definition: When the context value is a single item, it can also be referred to as the context item; when it is a single node, it can also be referred to as the context node.] The context value is returned by an expression consisting of a single dot (.). When an expression E1/E2 or E1[E2] is evaluated, each item in the sequence obtained by evaluating E1 becomes the context value in the inner focus for an evaluation of E2.

The definition: "The context value is the value currently being processed." is a circular definition.

In the text that follows, the so-called "context value" always seems to refer to a single item.

There are no examples of a "context value" where this is a sequence...

Overall, it is not clear that such generalization is useful at all, as there are no examples and/or use-cases provided.

Not surprisingly, this contributes to the current, and possibly future, confusion.

Because this generalization is not supported by examples and use-cases, and brings confusion, we need to consider removing it from the document.

There are many different parts/members of the context and we need to be able to address each such context-member individually - not to label all of them together as one generalized thing.

It is not immediately clear, or logically deducible, exactly which member of the context the dot refers to when it is used to denote this generalized "context-value". And it is extremely confusing when the dot refers to a single item in one sub-expression and to a sequence in another.

Let us follow the very successful and usable example given to us by XSLT, where different standard XSLT functions provide different context members: current(), current-group(), current-grouping-key(), etc.
