Open sopel39 opened 4 years ago
In case of queries like:
    InnerJoin[x1.a = x2.a]
       /            \
    TS[x1]     InnerJoin[x2.a = z.a]
                  /         \
               TS[x2]      TS[z]
it can still be simplified to:
    SelfJoin[x.a = x.a]
            |
    InnerJoin[x.a = z.a]
       /         \
    TS[x]       TS[z]
The plan can be simplified this way when either:
- `z.a` keys are unique
- `InnerJoin[x1.a = x2.a]` is cardinality insensitive (e.g., there is a group by on all columns on top of that join)
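
To make the two conditions concrete, here is a minimal Python sketch (not Presto code) that evaluates both plan shapes over tiny in-memory tables and compares the results as multisets. The helpers `inner_join`, `original_plan`, `rewritten_plan` and the sample rows are all made up for illustration. `SelfJoin` is modeled simply as joining the `x`/`z` join result with itself, which is an assumption about its semantics.

```python
from collections import Counter

def inner_join(left, right, left_key, right_key):
    """Naive inner join: every (l, r) pair whose join keys match."""
    return [(l, r) for l in left for r in right if left_key(l) == right_key(r)]

def a_of_row(row):
    return row["a"]

def a_of_pair(pair):
    # pair is (x_row, z_row); the join key is x.a
    return pair[0]["a"]

def original_plan(x, z):
    # InnerJoin[x1.a = x2.a](TS[x1], InnerJoin[x2.a = z.a](TS[x2], TS[z]))
    x2_z = inner_join(x, z, a_of_row, a_of_row)
    rows = inner_join(x, x2_z, a_of_row, a_of_pair)
    return Counter((x1["id"], x2["id"], zr["id"]) for x1, (x2, zr) in rows)

def rewritten_plan(x, z):
    # SelfJoin[x.a = x.a](InnerJoin[x.a = z.a](TS[x], TS[z])): x is scanned once
    x_z = inner_join(x, z, a_of_row, a_of_row)
    rows = inner_join(x_z, x_z, a_of_pair, a_of_pair)
    # Project to the same (x1, x2, z) shape as the original plan
    return Counter((l[0]["id"], r[0]["id"], l[1]["id"]) for l, r in rows)

x = [{"id": "x1", "a": 1}, {"id": "x2", "a": 1},
     {"id": "x3", "a": 2}, {"id": "x4", "a": 3}]
z_unique = [{"id": "z1", "a": 1}, {"id": "z2", "a": 2}]
z_dup = [{"id": "z1", "a": 1}, {"id": "z1b", "a": 1}, {"id": "z2", "a": 2}]

# Unique z.a: both plans produce the same multiset of rows
print(original_plan(x, z_unique) == rewritten_plan(x, z_unique))
# Duplicate z.a: row counts differ, but the distinct rows still agree
print(original_plan(x, z_dup) == rewritten_plan(x, z_dup))
print(set(original_plan(x, z_dup)) == set(rewritten_plan(x, z_dup)))
```

With duplicate `z.a` keys the rewritten plan emits extra copies of each row, which is why it is only safe there when whatever sits on top of the join ignores duplicates.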
Case study

`tpcds/q95`: The `web_sales` table is large and is currently read twice by Presto. However, since this is a self join, Presto could wait until the build side is read and then feed it back as the probe side. This would save one read of the `web_sales` table. Such an optimization requires that the join is executed as a repartitioned join (which would be the case for large tables).

It seems that we could try to optimize self joins both before and after the reorder joins rule: