Open sundy-li opened 2 weeks ago
The filter a > 1 AND a IS NOT NULL
is not being pushed down into both tables t1
and t
in the Databend query execution plan because the current implementation applies the filters after the join operation, not during the table scan phase. This behavior is evident from the provided execution plans, which show that the filters are applied post-join.
For example, in the query:
explain select * from t left join t t1 on t.a = t1.a where t1.a <= 1 or (t.a > 1 and t1.a > 1)
The execution plan shows:
Filter
├── output columns: [t.a (#0), t1.a (#1)]
├── filters: [is_true(t1.a (#1) <= 1 OR t.a (#0) > 1 AND t1.a (#1) > 1)]
├── estimated rows: 6.13
└── HashJoin
├── output columns: [t.a (#0), t1.a (#1)]
├── join type: INNER
├── build keys: [t1.a (#1)]
├── probe keys: [t.a (#0)]
├── filters: []
├── estimated rows: 7.47
├── Filter(Build)
│ ├── output columns: [t1.a (#1)]
│ ├── filters: [is_true(t1.a (#1) <= 1 OR t1.a (#1) > 1)]
│ ├── estimated rows: 8.20
│ └── TableScan
│ ├── table: default.eliminate_outer_join.t
│ ├── output columns: [a (#1)]
│ ├── read rows: 10
│ ├── read size: < 1 KiB
│ ├── partitions total: 1
│ ├── partitions scanned: 1
│ ├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1>]
│ ├── push downs: [filters: [is_true(t.a (#1) <= 1 OR t.a (#1) > 1)], limit: NONE]
│ └── estimated rows: 10.00
└── Filter(Probe)
├── output columns: [t.a (#0)]
├── filters: [is_true(t.a (#0) <= 1 OR t.a (#0) > 1)]
├── estimated rows: 8.20
└── TableScan
├── table: default.eliminate_outer_join.t
├── output columns: [a (#0)]
├── read rows: 10
├── read size: < 1 KiB
├── partitions total: 1
├── partitions scanned: 1
├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1>]
├── push downs: [filters: [is_true(t.a (#0) <= 1 OR t.a (#0) > 1)], limit: NONE]
└── estimated rows: 10.00
The filters are applied after the join operation, and the push downs during the table scan phase do not include the filter a > 1 AND a IS NOT NULL
[1].
The logic for filter pushdown is implemented in several files, including rule_push_down_filter_join.rs
, rule_push_down_filter_project_set.rs
, dphyp.rs
, and physical_table_scan.rs
[2][3][4][5].
There are also existing unit tests that cover filter pushdown functionality, such as those in push_down_filter_eval_scalar.test
and infer_filter.test
[6][7][8].
However, there are known limitations and TODOs related to filter pushdown, such as the need to fix limit pushdown during tests [9][4].
To continue talking to Dosu, mention @dosu.
Summary
https://github.com/sundy-li/databend/blob/master/tests/sqllogictests/suites/mode/standalone/explain/filter.test#L48-L67
filter
a>1 AND a IS NOT
should be pushed down into both tablest1
andt
.duckdb is correct in this query, and it's result examples: