zhouqingqing / qpmodel

A Relational Optimizer and Executor
MIT License
64 stars 18 forks source link

#issue72 select subquery fixed & more than one markJoin support #236

Closed 9DemonFox closed 3 years ago

9DemonFox commented 3 years ago

I am reminded of of by this very much.

Column Ordinal resolution is done in a top down fasion: the top output list is pushed to its child node and this node can add its own requirements (say a filter) to this list and further pushed down to its children. That's why ResolveColumnOrdinal is implmented as a virtual function for each logic node.

Durting I handle the subquery in selection, I test it with more than one exist subquery. So, by the way, I tried to fix the markjoin issue. because the markjoin will always be projected out at last, so I change it to the last insted of the first.

 // always the last column, as it is added at the last time 
    void fixMarkerValue(Row r, Value value) => r[r.ColCount()-1] = value;
9DemonFox commented 3 years ago

During I test the code, another issue is found. the SQL

select *
from nation N
where N.n_name like 'A%'

shoud return

0,ALGERIA,0,haggle. carefully final deposits detect slyly agai
1,ARGENTINA,1,al foxes promise slyly according to the regular accounts. bold requests alon

but it get result

0,ALGERIA,0,haggle. carefully final deposits detect slyly agai
1,ARGENTINA,1,al foxes promise slyly according to the regular accounts. bold requests alon
2,BRAZIL,1,y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special
3,CANADA,1,eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold
5,ETHIOPIA,0,ven packages wake quickly. regu
14,KENYA,0,pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t
16,MOZAMBIQUE,0,s. ironic, unusual asymptotes wake blithely r
24,UNITED STATES,1,y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be

instead.

9DemonFox commented 3 years ago

Unit tests were passed in local. I will check later

9DemonFox commented 3 years ago

Because the online unittest get the right hand output but the local get the left hand output, I push so many times. But I am not very sure what make this different. image

zhouqingqing commented 3 years ago

where N.n_name like 'A%'

We didn't really completely handle LIKE per its semantics. We handle it more like a CONTAINS. You can try to fix it with a separate PR.

zhouqingqing commented 3 years ago

online unittest get the right hand output but the local get the left hand output

The left side looks more reasonable to me and the right side seems totally ignoring the predicate. Is this reliably repro?

9DemonFox commented 3 years ago

Is this reliably repro?

I don't understand what repro refers to.
I run both of the online unittest and the local unittest several times, they always get the different result. And the online is aways the right-hand output, while the local get the left-hand output.

9DemonFox commented 3 years ago

when the TestClass Ubenchmark run before the TestClass Subquery, the issue will show up.

image As the visual studio run UnitTest by the dictionary order. The issue can can be reproduced by change the Subquery to VSubquery

image

9DemonFox commented 3 years ago

What cause this is the static class Catalog. And the hist in the stat is not cleared.

zhouqingqing commented 3 years ago

What cause this is the static class Catalog.

We use static for two purposes: (1) if the object is singleton (like catalog); (2) if we need access the object deep in call chain and we don't want to pass the object along the way (like some optimizer options).

9DemonFox commented 3 years ago

We use static for two purposes: (1) if the object is singleton (like catalog); (2) if we need access the object deep in call chain and we don't want to pass the object along the way (like some optimizer options).

I clear the log relate to this Unittest, so it work well in any order now.

9DemonFox commented 3 years ago

And now it support nest exsitsSubquery like sql below.

select a1 from a where exists (select b.b1 from b where b.b2=a.a1 and exists (select c.c2 from c where c.c1=b.b1))

Whose physic plan likes blow before.

PhysicFilter  (actual rows=0)
    Output: a.a1[0]
    Filter: ({#marker}[0] and {#marker}[0])
    -> PhysicMarkJoin Left (actual rows=0)
        Output: a.a1[0],#marker,#marker
        -> PhysicScanTable a (actual rows=0)
            Output: a.a1[0]
        -> PhysicMarkJoin Left (actual rows=0)
            Output: #marker
            Filter: c.c1[1]=b.b1[0]
            -> PhysicScanTable b (actual rows=0)
                Output: b.b1[0]
                Filter: b.b2[1]=?a.a1[0]
            -> PhysicScanTable c (actual rows=0)
                Output: c.c1[0]

Where there is still a parameter Expr Filter: b.b2[1]=?a.a1[0]。 Now it works well.

PhysicFilter  (actual rows=2)
    Output: a.a1[0]
    Filter: {#marker}[1]
    -> PhysicMarkJoin Left (actual rows=3)
        Output: a.a1[0],#marker
        Filter: b.b2[1]=a.a1[0]
        -> PhysicScanTable a (actual rows=3)
            Output: a.a1[0]
        -> PhysicFilter  (actual rows=3, loops=3)
            Output: b.b2[0]
            Filter: {#marker}[1]
            -> PhysicMarkJoin Left (actual rows=3, loops=3)
                Output: b.b2[0],#marker
                Filter: c.c1[2]=b.b1[1]
                -> PhysicScanTable b (actual rows=3, loops=3)
                    Output: b.b2[1],b.b1[0]
                -> PhysicScanTable c (actual rows=3, loops=9)
                    Output: c.c1[0]
9DemonFox commented 3 years ago

seperate this PR to several PRs