Lookup/Indexing operator for sequences (supersedes #50)

michaelhkay commented 2 years ago

This proposal attempts to take over where issue #50 left off: that issue contains a lengthy discussion and many alternative suggestions, and seemed to end with a concrete proposal which I summarise here. I propose that issue #50 now be closed.

The proposal is for an expression which I will call a subscript-expression, taking the form

SubscriptExpression ::= ExprSingle "[#" Expr "]"

The first operand evaluates to an arbitrary sequence. The second operand evaluates to a sequence of integers (or is coerced to a sequence of integers using the coercion rules). Both operands are evaluated in the context of the containing expression (despite the similarity to filter expressions, the predicate does not have its own focus).

The result of the expression A [# B] (assuming B is indeed a sequence of integers) is for $i in B return A[$i].

Examples

$input[#1] - this is synonymous with $input[1]

$input[#1 to 5] - equivalent to $input[position() = (1 to 5)]

$input[#reverse(1 to 5)] - returns the first 5 items in reverse order
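The definition `for $i in B return A[$i]` can be modelled outside XPath to check the examples. The following is a minimal Python sketch. It assumes XPath sequences are represented as Python tuples, and `subscript` is a hypothetical name that is not part of the proposal. It keeps XPath's 1-based positions and, as `A[$i]` does, selects nothing for an index outside `1 to count(A)`.

```python
def subscript(seq, indexes):
    """Model of A[#B]: for $i in B return A[$i], with 1-based positions.

    An index outside 1..len(seq) contributes nothing, just as A[$i] does.
    """
    result = []
    for i in indexes:
        if 1 <= i <= len(seq):
            result.append(seq[i - 1])
    return tuple(result)


inp = ("a", "b", "c", "d", "e", "f", "g")

print(subscript(inp, [1]))                    # $input[#1] -> ('a',)
print(subscript(inp, range(1, 6)))            # $input[#1 to 5] -> ('a', 'b', 'c', 'd', 'e')
print(subscript(inp, reversed(range(1, 6))))  # $input[#reverse(1 to 5)] -> ('e', 'd', 'c', 'b', 'a')
```

One consequence of the `for` formulation: `subscript(inp, [2, 2])` gives `('b', 'b')`, whereas the filter `$input[position() = (2, 2)]` returns the second item only once, and always in sequence order.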

The main differences from the existing A[B] syntax are:

(a) there is no overloading: the semantics do not depend on the dynamic type of B.

(b) the value of the predicate can be a sequence of integers, not just a single integer

(c) the focus for the predicate is the same as the outer focus, so expressions such as *[#1 to count(*) idiv 2] make sense.
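Point (c) is the subtle one. The Python sketch below contrasts the two evaluation models on a small hypothetical tree. `Node`, `first_half_subscript` and `first_half_filter` are illustrative names, and `predicate_truth` models only the part of XPath's predicate rules that applies when a predicate yields integers: a single number is a positional test, an empty sequence is false, and several numbers have no effective boolean value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an element node; only its children matter here."""
    name: str
    children: list = field(default_factory=list)


def subscript(seq, indexes):
    """A[#B]: B is evaluated once, then used as 1-based positions into A."""
    return [seq[i - 1] for i in indexes if 1 <= i <= len(seq)]


def predicate_truth(value, position):
    """XPath predicate truth value, restricted to a predicate yielding integers."""
    if len(value) == 1:
        return value[0] == position  # singleton number: positional test
    if not value:
        return False  # empty sequence: effective boolean value is false
    raise TypeError("FORG0006: no effective boolean value for several numbers")


def first_half_subscript(context):
    # *[#1 to count(*) idiv 2]: count(*) counts the children of the OUTER context node.
    n = len(context.children) // 2
    return subscript(context.children, range(1, n + 1))


def first_half_filter(context):
    # *[1 to count(*) idiv 2]: the predicate is evaluated once per child, with
    # that child as context item, so count(*) counts grandchildren instead.
    kept = []
    for position, child in enumerate(context.children, start=1):
        n = len(child.children) // 2
        if predicate_truth(list(range(1, n + 1)), position):
            kept.append(child)
    return kept


root = Node("root", [Node(f"c{k}") for k in range(1, 5)])
print([c.name for c in first_half_subscript(root)])  # ['c1', 'c2']
print([c.name for c in first_half_filter(root)])     # []
```

With four leaf children, the subscript form returns the first half, while the ordinary filter returns nothing because every child has zero children of its own.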

Alternative 1: use the syntax A #[ B ].

Alternative 2: use a function items-at(A, B)

ChristianGruen commented 2 years ago

+1 for fn:items-at, and avoiding too much syntactic sugar.

ChristianGruen commented 2 years ago

If we allow items to be returned in reverse order, would the following query be legal? If yes, what would it return?

(1 to 5)[#reverse(1 to .)]

dnovatchev commented 2 years ago

> Alternative 1: use the syntax A #[ B ].

> Alternative 2: use a function items-at(A, B)

To me Alternative 2 seems too complex, lengthy and less readable.

Thus, if I am forced to live with this, I will probably always write:

A => items-at(B)

instead of items-at(A, B)

as the former is more readable and understandable.

Still, the original proposal and Alternative 1 are both much shorter and easier to understand.

So, I am equally satisfied with either:

A #[ B ]

or A [ #B ]

Maybe A #[ B ] seems a little bit better.

michaelhkay commented 2 years ago

@ChristianGruen Yes, (1 to 5)[#reverse(1 to .)] is legal, but its meaning depends on the value of the context item in the outer context; the focus within the subexpression doesn't change.
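To make that concrete, here is a Python model, assuming sequences are tuples and the outer context item is a positive integer; `subscript` and `reverse_prefix` are hypothetical names. Because `.` is read from the outer focus, the index list `reverse(1 to .)` is computed once, and any position beyond 5 simply selects nothing.

```python
def subscript(seq, indexes):
    """A[#B] modelled as: for $i in B return A[$i], positions 1-based."""
    return tuple(seq[i - 1] for i in indexes if 1 <= i <= len(seq))


def reverse_prefix(outer_context_item):
    # (1 to 5)[#reverse(1 to .)]: "." is the OUTER context item, evaluated once.
    seq = tuple(range(1, 6))                              # 1 to 5
    indexes = reversed(range(1, outer_context_item + 1))  # reverse(1 to .)
    return subscript(seq, indexes)


for ctx in (3, 7):
    print(ctx, reverse_prefix(ctx))  # 3 (3, 2, 1), then 7 (5, 4, 3, 2, 1)
```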

michaelhkay commented 2 years ago

@Dimitre, "as the former is more readable and understandable". The readability of a notation depends strongly on how much time you spend reading and using that notation; a figured bass on piano sheet music is highly readable to someone trained and accustomed to that notation, and very difficult for someone who doesn't see it very often. Evaluation of usability factors can only be made in the context of a specific section of the user community. Terse notations like #[ are more likely to be comfortable for people who spend a large proportion of their time reading and writing XPath, while more verbose notations like items-at are more likely to find favour with occasional users.

dimitre commented 2 years ago

@michaelhkay I spent some time trying to understand your point, but now I understand it is not for me :)

dnovatchev commented 2 years ago

@michaelhkay, Sorry that there exists a @dimitre, who happens not to be me. My GitHub user ID is: dnovatchev.

MHK: Sorry, I had forgotten these tags were treated as globally-unique names.

ChristianGruen commented 2 years ago

> @ChristianGruen Yes, (1 to 5)[#reverse(1 to .)] is legal, but its meaning depends on the value of the context item in the outer context; the focus within the subexpression doesn't change.

I got it, thanks. I’ve finally read your proposal more carefully, which includes all relevant information.

I needed a while, though, to fully digest what's going on. Probably it's because the syntax reminded me too much of predicates, even though it works completely differently. #(...) might look more intuitive to me.

But I'm wondering whether we really need it. Is the use case common enough to justify new syntactic sugar?

ChristianGruen commented 2 years ago

> more verbose notations like items-at are more likely to find favour with occasional users.

And functions have additional advantages: They can be chained, passed on as arguments, partially applied, etc.
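As an illustration of those advantages, here is a Python model of a function-based accessor. `items_at` and `second` are hypothetical names, and Python's `functools.partial` stands in for XPath's `?` placeholder in `items-at(?, 2)`.

```python
from functools import partial


def items_at(seq, at):
    """Model of items-at(A, B): items of seq at the 1-based positions in `at`."""
    positions = (at,) if isinstance(at, int) else tuple(at)
    return tuple(seq[i - 1] for i in positions if 1 <= i <= len(seq))


# Partial application, like items-at(?, 2): fix the position, leave the sequence open.
second = partial(items_at, at=2)

members = [(True, True), (True, False), (False, False)]

# Passed as an argument to higher-order functions.
print(list(map(second, members)))                             # [(True,), (False,), (False,)]
print(list(filter(lambda m: second(m) == (True,), members)))  # [(True, True)]

# Chained: the result of one call feeds the next, as with A => items-at(B).
print(items_at(items_at("abcdefg", range(1, 6)), [5, 1]))     # ('e', 'a')
```

The explicit comparison with `(True,)` is needed only in Python, where any non-empty tuple is truthy; in XPath, `items-at(?, 2)` returns the boolean itself.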

dnovatchev commented 2 years ago

> more verbose notations like items-at are more likely to find favour with occasional users.

> And functions have additional advantages: They can be chained, passed on as arguments, partially applied, etc.

Yes, and we can absolutely do this with the function:

fn:op('#[]')

ChristianGruen commented 2 years ago

And we should do a survey if people believe fn:op('#[]')(A, B) or items-at(A, B) is more readable ;)

ndw commented 2 years ago

I think I'd prefer items-at(A,B) over this new bit of syntax. This observation

> despite the similarity to filter expressions, the predicate does not have its own focus.

makes me uneasy. What's going to happen when folks put expressions in [# ...] expecting them to behave like expressions in [...] and they don't?

line-o commented 2 years ago

I am very much in favor of a function accessor as this can be called after arrow expressions and passed to functions accepting an accessor.

Use case 1: filter based on the n-th item of a sequence

array:filter(
  [(true(), true()), (true(), false()), (false(), false())],
  items-at(?, 2))

Use case 2: use in an arrow expression

(true(), true(), false(), true(), false()) => items-at((1,2,5))

Use case 3: access n-th item in predicate (from the index-where polyfill discussion on Slack)

(: user-declared functions cannot use the reserved fn: namespace, hence local: :)
declare function local:index-where($seq, $predicate) {
  (1 to count($seq))[$predicate(items-at($seq, .))]
};
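The polyfill relies on the ordinary predicate having its own focus: inside `[...]`, `.` is each position in turn, so `items-at($seq, .)` fetches the item under test. A Python model of the same logic follows; `items_at` and `index_where` are hypothetical names, and the predicate is assumed to return a boolean (in XQuery, a predicate function returning a single number would instead trigger a positional test).

```python
def items_at(seq, at):
    """Model of items-at(A, B) for a single 1-based position."""
    return (seq[at - 1],) if 1 <= at <= len(seq) else ()


def index_where(seq, predicate):
    """Model of (1 to count($seq))[$predicate(items-at($seq, .))].

    Inside this ordinary predicate, "." is each position in turn, because a
    filter predicate has its own focus.
    """
    return [i for i in range(1, len(seq) + 1) if predicate(items_at(seq, i))]


print(index_where((5, 1, 7, 2), lambda s: s[0] > 2))  # [1, 3]
```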

michaelhkay commented 2 years ago

PR #249 has been raised.

michaelhkay commented 1 year ago

The proposal for fn:items-at() was accepted on 22 November; this issue can now be closed.

qt4cg / qtspecs, issue #213