zhouqingqing / qpmodel

A Relational Optimizer and Executor
MIT License
64 stars 18 forks source link

OR should be handled in scalarToSingleJoin and inToMarkJoin #271

Open 9DemonFox opened 3 years ago

9DemonFox commented 3 years ago

we only handle it in existToMarkJoin

9DemonFox commented 3 years ago
PhysicFilter  (actual rows=0)
    Output: a.a1[0]
    Filter: a.a1[0]=bo.b1[1]
    -> PhysicSingleJoin Left (actual rows=0)
        Output: a.a1[0],bo.b1[2]
        Filter: b.b2[3]=a.a1[0]*2
        -> PhysicScanTable a (actual rows=0)
            Output: a.a1[0],#a.a2[1]
        -> PhysicFilter  (actual rows=0)
            Output: bo.b1[0],b.b2[1]
**            Filter: (bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3))
            -> PhysicSingleJoin【2】 Left (actual rows=0)
                Output: bo.b1[0],b.b2[2],bo.b2[1],b.b1[3]
                -> PhysicScanTable b as bo (actual rows=0)
                    Output: bo.b1[0],bo.b2[1]
                -> PhysicScanTable b (actual rows=0)
                    Output: b.b2[1],b.b1[0]
                    Filter: b.b3[2]>1

the filter with have to stay with the SingleJoin 【2】, but the Expr** bo.b2[2]=?a.a2[1] can't be deparameter here.

zhouqingqing commented 3 years ago

what's the query?

9DemonFox commented 3 years ago
select a1 from a where a.a1 = (select b1 from b bo where b2 = a2 or b1 = (select b1 from b where b2 = 2*a1 and b3>1) and b2<3);

(bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3)) will be deparameter after meet scan tabel a, i.e., as below.

PhysicFilter  (actual rows=0)
    Output: a.a1[0]
    Filter: a.a1[0]=bo.b1[1]
    -> PhysicSingleJoin Left (actual rows=0)
**            Filter: (bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3))
        Output: a.a1[0],bo.b1[2]
        Filter: b.b2[3]=a.a1[0]*2
        -> PhysicScanTable a (actual rows=0)
            Output: a.a1[0],#a.a2[1]
        -> PhysicFilter  (actual rows=0)
            Output: bo.b1[0],b.b2[1]
            -> PhysicSingleJoin【2】 Left (actual rows=0)
                Output: bo.b1[0],b.b2[2],bo.b2[1],b.b1[3]
                -> PhysicScanTable b as bo (actual rows=0)
                    Output: bo.b1[0],bo.b2[1]
                -> PhysicScanTable b (actual rows=0)
                    Output: b.b2[1],b.b1[0]
                    Filter: b.b3[2]>1
9DemonFox commented 3 years ago
PhysicFilter  (actual rows=1)
    Output: a.a1[0]
    Filter: a.a1[0]=bo.b1[1]
    -> PhysicSingleJoin Left (actual rows=3)
        Output: a.a1[0],bo.b1[2]
        Filter: b.b2[3]=a.a1[0]*2
        -> PhysicScanTable a (actual rows=3)
            Output: a.a1[0],#a.a2[1]
        -> PhysicFilter  (actual rows=4, loops=3)
            Output: bo.b1[0],b.b2[1]
**        Filter: (bo.b2[2]=?a.a2[1] or (bo.b1[0]=b.b1[3] and bo.b2[2]<3))
            -> PhysicSingleJoin Left (actual rows=9, loops=3)
                Output: bo.b1[0],b.b2[2],bo.b2[1],b.b1[3]
                -> PhysicScanTable b as bo (actual rows=3, loops=3)
                    Output: bo.b1[0],bo.b2[1]
                -> PhysicScanTable b (actual rows=3, loops=9)
                    Output: b.b2[1],b.b1[0]
                    Filter: b.b3[2]>1

singleJoin should produce only one row, if we pull up the Filter, the singleJoin will produce more than one row, whitch is conflicting to singleJoin**.

9DemonFox commented 3 years ago

if we leave the filter inside the SingleJoin, as the OrExpr can not know value of bo.b2[2]=?a.a2[1], so it will also produce more lines like marker join, i.e. (bo.b2[2]=?a.a2[1] or #marker) . In a words, we have to know the value of bo.b2[2]=a.a2[1] before excute single join, i.e. it is a nested subquery.

9DemonFox commented 3 years ago

image hyperDB seems to has a complex plan to handle it.

select n_name from nation where nation.n_regionkey = (select region_o.r_regionkey from region region_o where region_o.r_name = nation.n_name or region_o.r_regionkey = (select region.r_regionkey from region where region.r_regionkey = 2*nation.n_regionkey and region.r_regionkey<3));