Open sopel39 opened 4 years ago
Applying limit is not even beneficial from performance point of view. We most likely can finish quicker without filtering rows than with extra filtering.
That's correct, but it may happen that applyFilter is simply invoked before LIMIT makes it into proximity of the TS. In such a case, the filter has already been applied and there is (currently) no way back. Then the only choice is whether you also apply the limit or not.
and especially #3691 (comment)
for ref, here is the comment's content:
I realized we had this discussion some time ago with JDBC connectors, but cannot find it now. @martint @electrum ? Unfortunately we didn't capture much of the thinking in the code, and it looks as if this was not addressed.
That's correct, but it may happen that applyFilter is simply invoked before LIMIT makes it into proximity of the TS.
That would be pushdown of operators in same order as they are ordered in Presto. This doesn't change query semantics.
Then the only choice is whether you also apply the limit or not.
If not applying filter would improve performance, I think we should not apply it.
What if we could maintain the order and generate the Query based on the order For a query plan like
Filter ->Limit-> TableScan
SELECT col.. FROM (SELECT * FROM tbl LIMIT n) WHERE <filter clause>
Limit ->Filter -> TableScan
SELECT col.. FROM (SELECT * FROM tbl WHERE <filter clause>) LIMIT n
Another though to this issue. Pushdown of FILTER after LIMIT might make sense if there is an aggregation e.g:
Filter -> Limit -> Aggregation -> TableScan
if filter is on aggregation keys, then less groups might need to be evaluated.
Pushing predicate through limit might change query semantics (e.g when data is ordered). We don't push predicate through limit in Presto.
CC: @findepi