qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

FLOWR where clause with a "do when false" option #539

Closed graydon2014 closed 1 year ago

graydon2014 commented 1 year ago

This is a request for an enhancement.

Fairly often, I'll have a query arranged as

let $step1 := do some processing
where exists($step1)
let $step2 := processing based on step1
where exists($step2)

and so on.

This is a convenient pattern until I want to emit some sort of message about where the process stops.

It would be convenient to have

where expression else expression

with the else as an optional extension of the where clause, so that the query can emit information about which where clause the FLWOR expression stopped on.

It might be more congruent to the style of the language as

where expression return expression

but then again having multiple return keywords isn't obviously a good thing.

michaelhkay commented 1 year ago

It's not at all clear to me what you are proposing.

The effect of a where clause is to filter the tuple stream, that is, to conditionally include or exclude some of the tuples coming in from the previous clauses. Your use case is describing a special case where there is only one tuple, and the where clause reduces this to zero tuples. How would your proposed feature work in the general case?

graydon2014 commented 1 year ago

The specification presently has

WhereClause ::= "where" ExprSingle

A where clause serves as a filter for the tuples in its input tuple stream. The expression in the where clause, called the where-expression, is evaluated once for each of these tuples. If the [effective boolean value](https://qt4cg.org/specifications/xquery-40/xquery-40.html#dt-ebv) of the where-expression is true, the tuple is retained in the output tuple stream; otherwise the tuple is discarded.

I would like to change this to something like:

WhereClause ::= "where" ExprSingle ("return" ExprSingle)?

A where clause serves as a filter for the tuples in its input tuple stream. The expression in the where clause, called the where-expression, is evaluated once for each of these tuples. If the effective boolean value of the where-expression is true, the tuple is retained in the output tuple stream. If the where-expression is false and the optional return keyword is present, the return expression of the where clause is evaluated and its value returned before the tuple is discarded. Otherwise the tuple is discarded.

Much like the non-match results from fn:analyze-string() can be as useful as the match results, I think it would be helpful in general to have the option of doing something in response to the where evaluating to false, whether that's some kind of error reporting or alternative processing.

michaelhkay commented 1 year ago

> the return expression of the where clause is evaluated and its value returned before the tuple is discarded

What do you mean by "its value [is] returned"? In the general case the where condition will be true for some tuples and false for others, and it's not clear what you expect the tuple stream (downstream from the where clause) to contain in that case.

Perhaps the condition you're really looking for is not that the where condition is false, but that the tuple stream is empty? Something like an on-empty clause:

for $e in employee 
where $e/@salary gt 1000000
on empty return "no such employee"
return $e/@name

whose semantics are:

(Personally, I'm not sure I want to put much energy into further refinements of FLWOR expressions. I think we should be concentrating on operations applied to sequences of items, not streams of variable bindings. The reason for that is that the operations we can apply to sequences of items are infinitely extensible using functions, whereas the operations we can apply to streams of variable bindings are limited to a small number of keywords baked into the language syntax: doing anything new like grouping or windowing or iteration over maps and arrays requires syntax extensions rather than simply requiring new functions.)

graydon2014 commented 1 year ago

"its value is returned" was intended as "the same thing the return clause at the end of the FLWOR does".

The expression might be file:append-text-lines() adding to a log file, in which case nothing will show up. It might be a element where the successful return produces a element and I can sort them out afterwards.

I think "on empty" would be useful, but that's not the case I'm specifically concerned about here. I'm more thinking of the case where eight out of ten thousand fail and I'd like to be able to emit warnings or apply alternate processing from the main processing.

You absolutely have a point about sequence operations, but the notion of a tuple stream processor is expressive and powerful and I'd hate to have to duplicate it in functions with tuple objects. This is one of only two persistently annoying things about FLWOR expressions as I've used them. (The other is the window clause.)

michaelhkay commented 1 year ago

I'm sorry, it's still not clear to me what your proposed semantics are. Perhaps you can explain it by example. What would you expect the result of this to be:

for $i in 1 to 20
where $i mod 2 = 0 on-false-return "odd"
let $j := $i*$i 
return $j + 1

I've used the keyword on-false-return here, because using "return" is clearly ambiguous.

graydon2014 commented 1 year ago

I am indifferent to the precise keyword, and "on-false-return" seems unambiguous.

I would expect

odd
5
odd
17
odd
37
odd
65
odd
101
odd
145
odd
197
odd
257
odd
325
odd
401

as the result.

Having two distinct results, in the way you can have standard output and standard error from a shell command, at least sounds neat but I would perceive that to be a very large ask and that is not what I'm asking. (If that's the easy way to implement on-false-return I'm not against it by any means, but it seems like it would be hard.)

michaelhkay commented 1 year ago

So you somehow want the clause to inject some kind of value into the tuple stream that gets ignored and passed through unchanged by all subsequent steps in the processing chain and emerges unscathed at the end? That doesn't work for me, it's far too much complexity.

If you're looking for the result you describe, I would do:

for $i in 1 to 20
let $even := $i mod 2 = 0
return if ($even) then $i*$i + 1 else "odd"

graydon2014 commented 1 year ago

Something very similar to the code you give is how I generated that list. The issue doesn't arise with single where clauses.

Perhaps a more useful way for me to express the ask would be:

Especially in a complex FLWOR expression with multiple where clauses, I want to be able to make those tuples, and only those tuples, which were not processed due to a where clause evaluating to false available for further processing.

Eventually, somehow, there needs to be a way to do error reporting. It may be my lack of understanding, but getting error information out of a FLWOR expression strikes me as difficult.

michaelhkay commented 1 year ago

Is this for diagnostics during development, or for use in production?

In issue #111 I proposed adding a trace clause to FLWOR expressions, would that help with your requirement?

Or would any of the other ideas suggested on that thread help?

graydon2014 commented 1 year ago

This would be for production, so the rejected tuples need to be available for further processing.

> The trace clause passes the incoming tuple stream unchanged to the next clause in the pipeline, with the side effect of evaluating an expression in the context of the variables defined in that tuple stream and displaying the value of the expression in an implementation-defined way.

This strikes me as more useful for debugging than production, since it'd be difficult to process the resulting values. Either a trace clause or the function that outputs items without returning them would be welcome for diagnostics during development, but both strike me as a way to emit error messages more than a way to recognize the errors so they can be reported after further processing.

Perhaps

where ExprSingle use $ VarName

which has the effect of

let $ VarName as item()* := the FLWOR expression where the where clause occurs

only instead of the values from the return clause it gets populated with the tuples where the where evaluates to false? The variable would have to be in scope at the top level of the module.

michaelhkay commented 1 year ago

As far as I can see this is just a special case of the general problem with a functional language that it's hard to gather multiple results from a single expression. Our general solution to the problem is to return a map. If your aim is to split a set of items into those that satisfy a condition and those that don't, then perhaps grouping is the answer. The idea of having an expression that binds values to multiple variables, which you seem to be suggesting in your last comment, has been mooted before and it's very hard to find a clean way of doing it without completely upsetting the simple model that expressions are evaluated to return results, not to produce side-effects.
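
To make the grouping and map suggestion above concrete, here is a minimal sketch, assuming XQuery 3.1 and reusing the "1 to 20" even/odd condition from earlier in this thread. The key names "output" and "rejected" are assumptions chosen for illustration, not part of any proposal.

```xquery
xquery version "3.1";

(: Sketch only: split the input into items that satisfy the condition
   and items that do not, and return both in a single map.
   The keys "output" and "rejected" are hypothetical names. :)
let $results := map:merge(
  for $i in 1 to 20
  let $passed := ($i mod 2 = 0)
  group by $passed
  return
    if ($passed)
    (: after grouping, $i holds every item in the group :)
    then map:entry("output", for $n in $i return $n * $n + 1)
    else map:entry("rejected", $i)
)
return ($results?output, $results?rejected)
```

The caller can then process `$results?rejected` separately, for example to build warnings, without the main results being interleaved with marker values.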

graydon2014 commented 1 year ago

> Our general solution to the problem is to return a map.

Considering that every FLWOR must have one and only one return clause, perhaps

return as-map ExprSingle

where there's by convention an output or a return key with the result of the ExprSingle in the return clause and a where1, where2, etc. with the tuples where that where clause evaluated to false would entirely work for my purposes.

Whether that's clean from an implementation point of view remains a distinct question. But from an error reporting perspective I think this would be pretty close to optimal.
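
For comparison, something close to the as-map result shape can be approximated in XQuery 3.1 as it stands, by recording per input where processing stopped instead of filtering with where. The following is a rough sketch only: the two processing steps and the "input", "stage", and "value" keys are hypothetical, and "output", "where1", and "where2" follow the naming convention suggested above.

```xquery
xquery version "3.1";

(: Sketch only: each where-style check becomes a conditional let,
   and every input yields a small map saying where it stopped. :)
let $outcomes :=
  for $i in 1 to 20
  (: hypothetical step 1: only even inputs continue :)
  let $step1 := if ($i mod 2 = 0) then $i * $i else ()
  (: hypothetical step 2: only squares below 200 continue :)
  let $step2 := if (exists($step1) and $step1 lt 200) then $step1 + 1 else ()
  return map {
    "input": $i,
    "stage": if (empty($step1)) then "where1"
             else if (empty($step2)) then "where2"
             else "output",
    "value": $step2
  }
return map {
  "output": $outcomes[?stage = "output"]?value,
  "where1": $outcomes[?stage = "where1"]?input,
  "where2": $outcomes[?stage = "where2"]?input
}
```

The cost is that every step after the first has to guard against an empty predecessor by hand, which is the boilerplate the where clauses were avoiding in the first place.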

ndw commented 1 year ago

At meeting 041 the CG decided to close this issue without action.