uwescience / raco

Compilation and rule-based optimization framework for relational algebra. Raco is the language, optimization, and query translation layer for the Myria project.

order by syntax only works with positional attributes, not with named attributes #573

Closed: orzikhd closed this issue 6 years ago

orzikhd commented 6 years ago

Here $0 refers to column a of table TwitterK. These queries work:

T = [from scan(TwitterK) as k emit $0 order by $0 limit 10];
T = select $0 from scan(TwitterK) as k order by $0 asc limit 10;

The equivalent queries, using the column name a instead of $0, don't work:

T = [from scan(TwitterK) as k emit a order by a limit 10];
T = select a from scan(TwitterK) as k order by a asc limit 10;

They throw:

Error 400 (Bad Request): Error 400 (Bad Request): 
Can not construct instance of [I from String value 'InMemoryOrderBy': not a valid int value at [Source: myriadeps.org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$UnCloseableInputStream@4783f0c8; line: 1, column: 670] (through reference chain: edu.washington.escience.myria.api.encoding.QueryEncoding["plan"]->java.util.ArrayList[1]->edu.washington.escience.myria.api.encoding.PlanFragmentEncoding["operators"]->java.util.ArrayList[1]->edu.washington.escience.myria.api.encoding.InMemoryOrderByEncoding["argSortColumns"]->int[][0])
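The reference chain in this error ends at InMemoryOrderByEncoding["argSortColumns"]->int[][0]: the server deserializes the sort columns as an array of integer column positions, and the first entry it received was a string instead. A minimal Python sketch of a client-side sanity check for that constraint, assuming the encoded operator is available as a plain dict; check_sort_columns is a hypothetical helper, not part of Raco or Myria:

```python
from typing import Any, Dict, List


def check_sort_columns(operator: Dict[str, Any]) -> List[int]:
    """Return argSortColumns if every entry is an int column position.

    Hypothetical helper: it only mirrors the constraint implied by the
    Myria error above, where argSortColumns is deserialized as int[].
    """
    columns = operator.get("argSortColumns", [])
    # bool is a subclass of int in Python, so exclude it explicitly.
    bad = [c for c in columns if not isinstance(c, int) or isinstance(c, bool)]
    if bad:
        raise TypeError(f"argSortColumns must contain int positions, got {bad!r}")
    return columns


# A positional ORDER BY ($0) encodes cleanly...
print(check_sort_columns({"argSortColumns": [0]}))  # [0]

# ...while a name that was never resolved to a position is rejected.
try:
    check_sort_columns({"argSortColumns": ["a"]})
except TypeError as err:
    print(err)  # argSortColumns must contain int positions, got ['a']
```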

Possibly related: the qualified reference k.a works fine in

T = select k.a from scan(TwitterK) as k;

but an error is thrown when running

T = select k.a from scan(TwitterK) as k order by k.a asc limit 10;

MyrialParseException: Parse error at token . on line 2

senderista commented 6 years ago

I made an initial change to allow qualified column references in the ORDER BY clause that refer to relations in the FROM clause, but this clearly can't work with aggregates. It also forces a choice between two bad options. One is to add the ORDER BY columns from the FROM schema to the SELECT schema so that the OrderBy can be applied after the Apply, which means hidden-column magic. The other is to apply the OrderBy before the Apply, which means SELECT arguments can't be used in the ORDER BY clause and is again incompatible with aggregates. In the end I concluded that the only consistent solution was to allow only unqualified (positional or named) references to columns of the emit schema (i.e., the SELECT list) in the ORDER BY clause. This still lets you order by arbitrary expressions, since you can just define them in the SELECT list. It also works fine with aggregates:

T = [from scan(TwitterK) as k emit k.a, count(*) order by a limit 10];
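
The rule described above can be summarized as follows: an ORDER BY reference is either positional ($N) or a bare name, and in both cases it is resolved against the emit schema (the SELECT list), never against the FROM clause. A minimal Python sketch of that resolution step, assuming the emit schema is just a list of output column names; resolve_order_by and the output name count are illustrative assumptions, not Raco's actual code or naming:

```python
import re
from typing import List

# Matches positional references such as $0, $12.
_POSITIONAL = re.compile(r"\$(\d+)")


def resolve_order_by(ref: str, emit_names: List[str]) -> int:
    """Map one ORDER BY reference to a column index in the emit schema."""
    match = _POSITIONAL.fullmatch(ref)
    if match:
        index = int(match.group(1))
        if index >= len(emit_names):
            raise ValueError(f"{ref} is out of range for {len(emit_names)} columns")
        return index
    if "." in ref:
        # Qualified refs such as k.a name a FROM relation, which may not
        # survive projection or aggregation, so they are rejected outright.
        raise ValueError(f"qualified reference {ref!r} not allowed in ORDER BY")
    if ref not in emit_names:
        raise ValueError(f"{ref!r} is not a column of the SELECT list")
    return emit_names.index(ref)


# Emit schema for: emit k.a, count(*)   (output names assumed to be a, count)
emit = ["a", "count"]
print([resolve_order_by(r, emit) for r in ["$0", "a", "$1", "count"]])  # [0, 0, 1, 1]
```

Because every reference resolves to an integer position in the output, the sort can run after the projection or aggregation without carrying hidden columns, and the result is the kind of int array that argSortColumns expects.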
senderista commented 6 years ago

Fixed in fe87d73d.