vmware-archive / quickstep

Quickstep Project
Apache License 2.0
27 stars 13 forks source link

Adds support for scalar subqueries #185

Closed jianqiao closed 8 years ago

jianqiao commented 8 years ago

This PR adds (limited) support for scalar subquery expressions.

Mainly two scenarios are handled: 1. The subquery expression is not correlated, e.g.

SELECT x
FROM a
WHERE a.y > (SELECT SUM(b.y) FROM b);

which would be transformed into the form

SELECT x
FROM a, (SELECT SUM(b.y) FROM b) AS t(sum_y)
WHERE b.y > sum_y;

and evaluated as a nested loops join between a and t.

TODO: Note that using nested loops join for this case incurs unnecessary overhead. Meanwhile the original implementation from quickstep-retire uses a "mutable ScalarQueryLiteral" approach which is efficient but breaks the protocol that relational operators are decoupled. That is, with the "mutable ScalarQueryLiteral" approach the scalar subquery is viewed like a ScalarLiteral but the literal value is updated at query runtime by a SetScalarQueryLiteral relational operator -- in other words, one relational operator will update the state of the other relational operator. So we need a further improvement that either optimize nested loops join for single-value relations or systematically incorporate the "mutable ScalarQueryLiteral" approach.

2. The subquery expression is correlated, e.g.

SELECT x
FROM a
WHERE a.y > (SELECT SUM(b.y) FROM b WHERE a.x = b.x);

which would be transformed into the form

SELECT x
FROM a JOIN (
  SELECT SUM(b.y), b.x
  FROM b
  GROUP BY b.x) AS t(sum_y, x) ON a.x = t.x
WHERE a.y > sum_y

and evaluated as hash inner join between a and t.

Examples for query plans are provided in the Select.test files.

pateljm commented 8 years ago

LGTM. Merging.