Open kaikalur opened 2 months ago
This already works for INNER JOINs -
presto:tiny> explain select COALESCE(l.nationkey, r.regionkey) FROM nation l inner join region r on l.nationkey=r.regionkey;
Query Plan >
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
- Output[PlanNodeId 9][_col0] => [nationkey:bigint] >
Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 630.00, memory: 45.00, network: 90.00} >
_col0 := nationkey (1:16) >
- RemoteStreamingExchange[PlanNodeId 308][GATHER] => [nationkey:bigint] >
Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 630.00, memory: 45.00, network: 90.00} >
- InnerJoin[PlanNodeId 286][("nationkey" = "regionkey_0")] => [nationkey:bigint] >
Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 630.00, memory: 45.00, network: 45.00} >
Distribution: REPLICATED >
- TableScan[PlanNodeId 0][TableHandle {connectorId='tpch', connectorHandle='nation:sf0.01', layout='Optional[nation:sf0.01]'}] => [nationkey:bigint] >
Estimates: {source: CostBasedSourceInfo, rows: 25 (225B), cpu: 225.00, memory: 0.00, network: 0.00} >
nationkey := tpch:nationkey (1:56) >
- LocalExchange[PlanNodeId 334][HASH] (regionkey_0) => [regionkey_0:bigint] >
Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 90.00, memory: 0.00, network: 45.00} >
- RemoteStreamingExchange[PlanNodeId 307][REPLICATE] => [regionkey_0:bigint] >
Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 45.00, memory: 0.00, network: 45.00} >
- TableScan[PlanNodeId 1][TableHandle {connectorId='tpch', connectorHandle='region:sf0.01', layout='Optional[region:sf0.01]'}] => [regionkey_0:bigint] >
Estimates: {source: CostBasedSourceInfo, rows: 5 (45B), cpu: 45.00, memory: 0.00, network: 0.00} >
regionkey_0 := tpch:regionkey (1:76)
For outer joins, we would need to set the joins_not_null_inference_strategy
to a value other than NONE
, so that we add NOT NULL filters on the outer side of the join.
Once this is set, we could optimize out the COALESCE
We have seen queries like:
Here for both inner ad left joins we can simplify it to
l.id
since it's a join key. Similarly for the converse for right join. We should apply this rule before join reordering