This PR adds (limited) support for scalar subquery expressions.
Mainly two scenarios are handled:
1. The subquery expression is not correlated, e.g.
SELECT x
FROM a
WHERE a.y > (SELECT SUM(b.y) FROM b);
which would be transformed into the form
SELECT x
FROM a, (SELECT SUM(b.y) FROM b) AS t(sum_y)
WHERE b.y > sum_y;
and evaluated as a nested loops join between a and t.
TODO:
Note that using nested loops join for this case incurs unnecessary overhead. Meanwhile the original implementation from quickstep-retire uses a "mutable ScalarQueryLiteral" approach which is efficient but breaks the protocol that relational operators are decoupled. That is, with the "mutable ScalarQueryLiteral" approach the scalar subquery is viewed like a ScalarLiteral but the literal value is updated at query runtime by a SetScalarQueryLiteral relational operator -- in other words, one relational operator will update the state of the other relational operator.
So we need a further improvement that either optimize nested loops join for single-value relations or systematically incorporate the "mutable ScalarQueryLiteral" approach.
2. The subquery expression is correlated, e.g.
SELECT x
FROM a
WHERE a.y > (SELECT SUM(b.y) FROM b WHERE a.x = b.x);
which would be transformed into the form
SELECT x
FROM a JOIN (
SELECT SUM(b.y), b.x
FROM b
GROUP BY b.x) AS t(sum_y, x) ON a.x = t.x
WHERE a.y > sum_y
and evaluated as hash inner join between a and t.
Examples for query plans are provided in the Select.test files.
This PR adds (limited) support for scalar subquery expressions.
Mainly two scenarios are handled: 1. The subquery expression is not correlated, e.g.
which would be transformed into the form
and evaluated as a nested loops join between
a
andt
.TODO: Note that using nested loops join for this case incurs unnecessary overhead. Meanwhile the original implementation from
quickstep-retire
uses a "mutable ScalarQueryLiteral" approach which is efficient but breaks the protocol that relational operators are decoupled. That is, with the "mutable ScalarQueryLiteral" approach the scalar subquery is viewed like aScalarLiteral
but the literal value is updated at query runtime by aSetScalarQueryLiteral
relational operator -- in other words, one relational operator will update the state of the other relational operator. So we need a further improvement that either optimize nested loops join for single-value relations or systematically incorporate the "mutable ScalarQueryLiteral" approach.2. The subquery expression is correlated, e.g.
which would be transformed into the form
and evaluated as hash inner join between
a
andt
.Examples for query plans are provided in the
Select.test
files.