Closed graydon2014 closed 1 year ago
It's not at all clear to me what you are proposing.
The effect of a where clause is to filter the tuple stream, that is, to conditionally include or exclude some of the tuples coming in from the previous clauses. Your use case is describing a special case where there is only one tuple, and the where clause reduces this to zero tuples. How would your proposed feature work in the general case?
The specification presently has
WhereClause ::= "where" ExprSingle
A where clause serves as a filter for the tuples in its input tuple stream. The expression in the where clause, called the where-expression, is evaluated once for each of these tuples. If the [effective boolean value](https://qt4cg.org/specifications/xquery-40/xquery-40.html#dt-ebv) of the where-expression is true, the tuple is retained in the output tuple stream; otherwise the tuple is discarded.
I would like to change this to something like:
WhereClause ::= "where" ExprSingle ("return" ExprSingle)?
A where clause serves as a filter for the tuples in its input tuple stream. The expression in the where clause, called the where-expression, is evaluated once for each of these tuples. If the effective boolean value of the where-expression is true, the tuple is retained in the output tuple stream. If the where-expression is false and the optional return keyword is present, the return expression of the where clause is evaluated and its value returned before the tuple is discarded. Otherwise the tuple is discarded.
Much like the non-match results from fn:analyze-string() can be as useful as the match results, I think it would be helpful in general to have the option of doing something in response to the where evaluating to false, whether that's some kind of error reporting or alternative processing.
"the return expression of the where clause is evaluated and its value returned before the tuple is discarded"
What do you mean by "its value [is] returned"? In the general case the where condition will be true for some tuples and false for others, and it's not clear what you expect the tuple stream (downstream from the where clause) to contain in that case.
Perhaps the condition you're really looking for is not that the where condition is false, but that the tuple stream is empty? Something like an on-empty clause:
for $e in employee
where $e/@salary gt 1000000
on empty return "no such employee"
return $e/@name
whose semantics are:
(Personally, I'm not sure I want to put much energy into further refinements of FLWOR expressions. I think we should be concentrating on operations applied to sequences of items, not streams of variable bindings. The reason for that is that the operations we can apply to sequences of items are infinitely extensible using functions, whereas the operations we can apply to streams of variable bindings are limited to a small number of keywords baked into the language syntax: doing anything new like grouping or windowing or iteration over maps and arrays requires syntax extensions rather than simply requiring new functions.)
"its value is returned" was intended as "the same thing the return clause at the end of the FLWOR does".
The expression might be file:append-text-lines() adding to a log file, in which case nothing will show up. It might be a
I think "on empty" would be useful, but that's not the case I'm specifically concerned about here. I'm thinking more of the case where eight out of ten thousand fail and I'd like to be able to emit warnings or apply alternative processing, separate from the main processing.
You absolutely have a point about sequence operations, but the notion of a tuple stream processor is expressive and powerful and I'd hate to have to duplicate it in functions with tuple objects. This is one of only two persistently annoying things about FLWOR expressions as I've used them. (The other is the window clause.)
I'm sorry, it's still not clear to me what your proposed semantics are. Perhaps you can explain it by example. What would you expect the result of this to be:
for $i in 1 to 20
where $i mod 2 = 0 on-false-return "odd"
let $j := $i*$i
return $j + 1
I've used the keyword on-false-return here, because using "return" is clearly ambiguous.
I am indifferent to the precise keyword, and "on-false-return" seems unambiguous.
I would expect
odd
5
odd
17
odd
37
odd
65
odd
101
odd
145
odd
197
odd
257
odd
325
odd
401
as the result.
Having two distinct results, in the way you can have standard output and standard error from a shell command, at least sounds neat but I would perceive that to be a very large ask and that is not what I'm asking. (If that's the easy way to implement on-false-return I'm not against it by any means, but it seems like it would be hard.)
So you somehow want the clause to inject some kind of value into the tuple stream that gets ignored and passed through unchanged by all subsequent steps in the processing chain and emerges unscathed at the end? That doesn't work for me; it's far too much complexity.
If you're looking for the result you describe, I would do:
for $i in 1 to 20
let $even := $i mod 2 = 0
return if ($even) then $i*$i + 1 else "odd"
Something very similar to the code you give is how I generated that list. The issue doesn't arise with single where clauses.
Perhaps a more useful way for me to express the ask would be:
Especially in a complex FLWOR expression with multiple where clauses, I want to be able to make those tuples, and only those tuples, which were not processed due to a where clause evaluating to false available for further processing.
Eventually, somehow, there needs to be a way to do error reporting. It may be my lack of understanding, but getting error information out of a FLWOR expression strikes me as difficult.
Is this for diagnostics during development, or for use in production?
In issue #111 I proposed adding a trace clause to FLWOR expressions. Would that help with your requirement?
Or would any of the other ideas suggested on that thread help?
This would be for production, so the rejected tuples need to be available for further processing.
The trace clause passes the incoming tuple stream unchanged to the next clause in the pipeline, with the side effect of evaluating an expression in the context of the variables defined in that tuple stream and displaying the value of the expression in an implementation-defined way.
This strikes me as more useful for debugging than production, since it'd be difficult to process the resulting values. Either a trace clause or the function that outputs items without returning them would be welcome for diagnostics during development, but both strike me as a way to emit error messages more than a way to recognize the errors so they can be reported after further processing.
Perhaps
where ExprSingle use $VarName
which has the effect of
let $VarName as item()* := the FLWOR expression in which the where clause occurs
only, instead of the values from the return clause, it gets populated with the tuples for which the where clause evaluates to false? The variable would have to be in scope at the top level of the module.
As far as I can see this is just a special case of the general problem with a functional language that it's hard to gather multiple results from a single expression. Our general solution to the problem is to return a map. If your aim is to split a set of items into those that satisfy a condition and those that don't, then perhaps grouping is the answer. The idea of having an expression that binds values to multiple variables, which you seem to be suggesting in your last comment, has been mooted before and it's very hard to find a clean way of doing it without completely upsetting the simple model that expressions are evaluated to return results, not to produce side-effects.
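To illustrate the grouping suggestion, here is a minimal XQuery 3.1 sketch that splits the items of the earlier 1 to 20 example into those that satisfy the condition and those that don't, and returns both in a single map. The key names ("kept", "rejected", "results") are assumptions made for this sketch and were not proposed anywhere in the thread.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Partition the input by whether the condition holds.
   The key names "kept" and "rejected" are illustrative only. :)
let $parts := map:merge(
  for $i in 1 to 20
  group by $status := if ($i mod 2 = 0) then "kept" else "rejected"
  return map:entry($status, $i)
)
return map {
  (: main processing, applied only to the items that passed :)
  "results":  for $i in $parts?kept return $i * $i + 1,
  (: items that failed the condition, still available to the caller :)
  "rejected": $parts?rejected
}
```

The caller can then report on the "rejected" entry or process it further, without any value having to travel through the tuple stream.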
"Our general solution to the problem is to return a map."
Considering that every FLWOR must have one and only one return clause, perhaps
return as-map ExprSingle
where, by convention, the resulting map has an output (or return) key holding the result of the ExprSingle in the return clause, plus where1, where2, etc. keys holding the tuples for which the corresponding where clause evaluated to false. That would entirely work for my purposes.
Whether that's clean from an implementation point of view remains a distinct question. But from an error reporting perspective I think this would be pretty close to optimal.
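Something close to this convention can be approximated in XQuery 3.1 as it stands, by classifying each tuple and then grouping on the classification. The sketch below is only an illustration: the second condition ($j gt 50) is invented here to stand in for a second where clause, and the key names follow the output / where1 / where2 convention described above.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

map:merge(
  for $i in 1 to 20
  let $pass1 := $i mod 2 = 0     (: first "where" condition :)
  let $j := $i * $i
  let $pass2 := $j gt 50         (: second, hypothetical "where" condition :)
  (: name the first condition that failed, or "output" if none did :)
  let $key :=
    if (not($pass1)) then "where1"
    else if (not($pass2)) then "where2"
    else "output"
  (: rejected tuples keep the original item; accepted ones get the result :)
  let $value := if ($key eq "output") then $j + 1 else $i
  group by $key
  return map:entry($key, $value)
)
```

Here the where1 entry collects the odd values of $i, where2 collects the even values whose square is not above 50, and output collects $j + 1 for the rest. The cost is that every later step has to be guarded by hand, which is the bookkeeping the proposal hopes to avoid.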
At meeting 041 the CG decided to close this issue without action.
This is a request for an enhancement.
Fairly often, I'll have a query arranged as
let $step1 := do some processing
where exists($step1)
let $step2 := processing based on step1
where exists($step2)
and so on.
This is a convenient pattern until I want to emit some sort of message about where the process stops.
It would be convenient to have
where expression else expression
with the else as an optional extension of the where clause to allow emitting information about which where clause the FLWOR expression stopped on.
It might be more congruent to the style of the language as
where expression return expression
but then again having multiple return keywords isn't obviously a good thing.
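For comparison, the same pattern can be written in current XQuery with nested conditionals, at the cost of one extra level of nesting per step. In this sketch local:step1 and local:step2 are hypothetical stand-ins for the real processing, and the messages are invented.

```xquery
xquery version "3.1";

(: Hypothetical processing steps; stand-ins for real work. :)
declare function local:step1($input as item()*) as item()* {
  $input[. instance of xs:integer]
};

declare function local:step2($items as item()*) as item()* {
  $items[. gt 10]
};

let $input := (1, "a", 5)
let $step1 := local:step1($input)
return
  if (empty($step1)) then "stopped after step 1: no integers in input"
  else
    let $step2 := local:step2($step1)
    return
      if (empty($step2)) then "stopped after step 2: no value above 10"
      else $step2
```

With the sample input, step 1 yields (1, 5) and step 2 yields the empty sequence, so the query returns the step 2 message.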