mit-pdos / noria

Fast web applications through dynamic, partially-stateful dataflow
Apache License 2.0
4.98k stars 242 forks source link

Having mutliple aggregates in select errors #137

Open JustusAdam opened 4 years ago

JustusAdam commented 4 years ago

Noria seems not to allow for a select query with two simultaneous aggregations.

I set up this very simple table:

CREATE TABLE tab (x int, y int, PRIMARY KEY(x));

And a query with two aggregations:

VIEW test: 
    SELECT count(y), sum(y) 
    FROM tab 
    WHERE x = ? 
    GROUP BY x;

Which throws a cannot group by aggregation column in noria-server/dataflow/src/ops/grouped/aggregate.rs:27:9.

By contrast if the select is only count(x) or sum(x) it works just fine. I should also note that it also errors on count(*), sum(y).

My question would be whether this query is correct actually should work?

My guess is that it doesn't due to how the query graph is constructed. (One aggregate becomes the successor of the other and does not get the initial set of records/has its automatic group/state key include the computed column from the ancestor.)

jonhoo commented 4 years ago

I think the underlying issue here is that the query as written requires a single aggregation that computes multiple columns, which isn't currently implemented. The workaround would be to compute the two values separately and then JOIN them on x.

ms705 commented 4 years ago

The more technical reason is that this will result in a chain of two aggregation operators, both of which need y as an input column. The query planner lays them out in sequence, so the first operator produces output columns (x, count(y)). Note that this does not include y. Noria detects this and "pulls" y through the operator as a group column, so that it produces (x, y, count(y)). However, this (correctly) triggers the assertion, since y now shows up both as a group column and as the aggregated column.

The proper fix is for the query planner to compute these aggregations separately and concatenate the resulting columns (an operator we don't support yet), or to generate a single operator that does both aggregations at the same time (probably how you got here ;-) ).

EDIT: my earlier proposed nested view workaround doesn't work for the same reason (it would have to group by the aggregation column), so @jonhoo's join-based workaround is currently the only way to do this.

JustusAdam commented 4 years ago

Thanks that's pretty much what I expected. I kinda want to express a similar pattern in my project so I'll have to figure out a way to teach the internals to do this. I'll let you know if I make any progress.

Anyway feel free to close this if you like.

JustusAdam commented 4 years ago

The workaround proposed by jon worked, thanks.

jonhoo commented 4 years ago

I'm going to leave this open since it is something we should fix.