prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.05k stars 5.38k forks source link

Pushown filter through coalesce on join keys of outer/full join #23110

Open kaikalur opened 4 months ago

kaikalur commented 4 months ago
select count(distinct part) from 
 (select count(1) c, coalesce(p.part, s.part) as part from supplier full/left/right join part) on p.part=s.part group by 2)
 where part like '%a' and c > 10

We currently do not pushdown the filter (part like '%a%' down) to the scans. But carefully observing the semantics of coalesce, this can be pushdown below join. It currently works for inner join but it should work outer/full join as well.

CC: @feilong-liu

kaikalur commented 4 months ago

This is should even if you have additional join keys

ViggoC commented 4 months ago

Hi @kaikalur, I also very interested in this feature. Have you ever see any optimizer implement this rule or any theoretical proof. There is a similar issue in Trino https://github.com/trinodb/trino/issues/3722, but it is nor active for a long time.

ViggoC commented 4 months ago

There is a related discussion in calcite, the pushdown rule can be generalize to more expression other than coalesce. https://issues.apache.org/jira/browse/CALCITE-5193

kaikalur commented 3 months ago

Hi @kaikalur, I also very interested in this feature. Have you ever see any optimizer implement this rule or any theoretical proof. There is a similar issue in Trino trinodb/trino#3722, but it is nor active for a long time.

Not sure what you mean by theoretical proof. But this is legit. Sure there are possibly other expressions but this is a common real life usecase so we can implement this.

ViggoC commented 3 months ago

@kaikalur I total agree to push down a coalesce expression filter through outer join. I can't think of a counterexample. I just want to know have you see some material that disscusses what kind of expression can be push through the full join and how to do it.

kaikalur commented 3 months ago

@kaikalur I total agree to push down a coalesce expression filter through outer join. I can't think of a counterexample. I just want to know have you see some material that disscusses what kind of expression can be push through the full join and how to do it.

There are many rules that do pushdowns so once you find out which side to push what and go from there. it needs to be done after all other pushdowns. Also looks like we do this for inner join already so this maybe just relaxing the condition in an existing rule