Open JustusAdam opened 4 years ago
I think the underlying issue here is that the query as written requires a single aggregation that computes multiple columns, which isn't currently implemented. The workaround would be to compute the two values separately and then JOIN
them on x
.
The more technical reason is that this will result in a chain of two aggregation operators, both of which need y
as an input column. The query planner lays them out in sequence, so the first operator produces output columns (x, count(y))
. Note that this does not include y
. Noria detects this and "pulls" y
through the operator as a group column, so that it produces (x, y, count(y))
. However, this (correctly) triggers the assertion, since y
now shows up both as a group column and as the aggregated column.
The proper fix is for the query planner to compute these aggregations separately and concatenate the resulting columns (an operator we don't support yet), or to generate a single operator that does both aggregations at the same time (probably how you got here ;-) ).
EDIT: my earlier proposed nested view workaround doesn't work for the same reason (it would have to group by the aggregation column), so @jonhoo's join-based workaround is currently the only way to do this.
Thanks that's pretty much what I expected. I kinda want to express a similar pattern in my project so I'll have to figure out a way to teach the internals to do this. I'll let you know if I make any progress.
Anyway feel free to close this if you like.
The workaround proposed by jon worked, thanks.
I'm going to leave this open since it is something we should fix.
Noria seems not to allow for a select query with two simultaneous aggregations.
I set up this very simple table:
And a query with two aggregations:
Which throws a
cannot group by aggregation column
innoria-server/dataflow/src/ops/grouped/aggregate.rs:27:9
.By contrast if the select is only
count(x)
orsum(x)
it works just fine. I should also note that it also errors oncount(*), sum(y)
.My question would be whether this query is correct actually should work?
My guess is that it doesn't due to how the query graph is constructed. (One aggregate becomes the successor of the other and does not get the initial set of records/has its automatic group/state key include the computed column from the ancestor.)