lwhay / asterixdb

Automatically exported from code.google.com/p/asterixdb

Type propagation fails to identify not null variables coming from outer branch of left outer join #794

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
If a variable comes from the outer branch of a left outer join, and a select 
operator then filters out the null values of that variable, the types of all 
the variables coming from the outer branch are converted from AUnionType (the 
union of null and the actual type of the variable) to the non-null type in the 
union.

However, if these variables are reassigned to new variables after the left 
outer join but before the select that removes the null values, type propagation 
does not work properly, and we still get AUnionType even though the nulls are 
filtered out.

Here are the plan snippets:

In this plan snippet, listify asks for the type of the variable $$155 and gets 
INT32:

group by ([$$129 := %0->$$240]) decor ([$$170 := %0->$$238; $$168 := 
%0->$$239]) {
        aggregate [$$154] <- [function-call: asterix:listify, Args:[%0->$$155]]
        -- AGGREGATE  |LOCAL|
          select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$159]])
          -- STREAM_SELECT  |LOCAL|
            nested tuple source
            -- NESTED_TUPLE_SOURCE  |LOCAL|
     }
-- PRE_CLUSTERED_GROUP_BY[$$240]  |PARTITIONED|
exchange 
-- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
  order (ASC, %0->$$240) (ASC, %0->$$155) 
  -- STABLE_SORT [$$240(ASC), $$155(ASC)]  |PARTITIONED|
    exchange 
    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
      replicate 
      -- SPLIT  |PARTITIONED|
        exchange 
        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
          project ([$$239, $$238, $$155, $$159, $$240])
          -- STREAM_PROJECT  |PARTITIONED|
            exchange 
            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
              left outer join (function-call: algebricks:eq, Args:[%0->$$156, %0->$$159])
              -- HYBRID_HASH_JOIN [$$156][$$159]  |PARTITIONED|
                // removed inner branch to reduce the plan size
                ...
                // this is the outer branch
                exchange 
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  project ([$$159, $$155])
                  -- STREAM_PROJECT  |PARTITIONED|
                    assign [$$159, $$155] <- [%0->$$143, %0->$$139] => the source of the type for $$155
                    -- ASSIGN  |PARTITIONED|
                      ...

In the following plan snippet, the type of the variable $$42 is propagated as 
UNION(NULL, INT32):

group by ([$$59 := %0->$$218]) decor ([$$58 := %0->$$219]) {
          aggregate [$$82] <- [function-call: asterix:listify, Args:[%0->$$42]]
          -- AGGREGATE  |LOCAL|
            order (ASC, %0->$$42) 
            -- IN_MEMORY_STABLE_SORT [$$42(ASC)]  |LOCAL|
              select (function-call: algebricks:not, Args:[function-call: algebricks:is-null, Args:[%0->$$45]])
              -- STREAM_SELECT  |LOCAL|
                nested tuple source
                -- NESTED_TUPLE_SOURCE  |LOCAL|
       }
-- PRE_CLUSTERED_GROUP_BY[$$218]  |PARTITIONED|
  exchange 
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    order (ASC, %0->$$218) 
    -- STABLE_SORT [$$218(ASC)]  |PARTITIONED|
      exchange 
      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
        project ([$$218, $$219, $$42, $$45])
        -- STREAM_PROJECT  |PARTITIONED|
          assign [$$219, $$220, $$42, $$45, $$218] <- [%0->$$239, %0->$$238, %0->$$155, %0->$$159, %0->$$240] => the source of the type for $$42
          -- ASSIGN  |PARTITIONED|
            exchange 
            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
              replicate 
              -- SPLIT  |PARTITIONED|
                exchange 
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  project ([$$239, $$238, $$155, $$159, $$240])
                  -- STREAM_PROJECT  |PARTITIONED|
                    exchange 
                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                      left outer join (function-call: algebricks:eq, Args:[%0->$$156, %0->$$159])
                      -- HYBRID_HASH_JOIN [$$156][$$159]  |PARTITIONED|
                        // removed inner branch to reduce the plan size
                        ...
                        // this is the outer branch
                        exchange 
                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                          project ([$$159, $$155])
                          -- STREAM_PROJECT  |PARTITIONED|
                            assign [$$159, $$155] <- [%0->$$143, %0->$$139]
                            -- ASSIGN  |PARTITIONED|
                              ...       

The variables $$42 and $$155 come from the same source and are replicated by 
the Replicate operator, and in both plans the null values are filtered out. We 
should therefore also be able to identify the type of variable $$42 as plain 
INT32 in the second plan snippet.

Based on my understanding of the current type propagation, we traverse the plan 
from the top down until we find the source of the variable. In the first case, 
we see that the variable comes from the outer branch of a left outer join, and 
since one of the variables coming from that branch is known to be non-null, the 
variable we are looking for also becomes non-nullable. In the second case, 
however, we find the type of the variable in the assign (marked "the source of 
the type for $$42" in the snippet) before reaching the left outer join. Even 
though we know $$45 is not null, at that point we do not know that $$42 and 
$$45 come from the outer branch of a left outer join.
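
To make the behaviour described above concrete, here is a minimal toy model in 
Python. It is only a sketch of my understanding, not the actual Algebricks 
code: the Op class, the type_of function and the follow_assign flag are all 
hypothetical names, the plans are reduced to the three operators that matter, 
and the type of $$159 is assumed to be INT32 (the plans above do not show it). 
The follow_assign=True path illustrates one possible remedy (renaming the 
known non-null variables through the assign); it is not necessarily what any 
actual fix does.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class Op:
    """Hypothetical, heavily simplified plan operator."""
    kind: str                      # "select_not_null", "assign" or "left_outer_join"
    child: Optional["Op"] = None
    var: str = ""                  # select_not_null: the variable whose nulls are filtered
    mapping: Dict[str, str] = field(default_factory=dict)  # assign: new var -> source var
    types: Dict[str, str] = field(default_factory=dict)    # join: base types of outer-branch vars


def nullable(base: str) -> str:
    return f"UNION(NULL, {base})"


def type_of(var: str, op: Op, non_null: FrozenSet[str] = frozenset(),
            follow_assign: bool = False) -> str:
    """Walk the plan top down until the source of `var` is found."""
    if op.kind == "select_not_null":
        # Remember that this variable cannot be null below the select.
        return type_of(var, op.child, non_null | {op.var}, follow_assign)
    if op.kind == "assign":
        if follow_assign:
            # Possible remedy: translate both the variable and the
            # non-null knowledge into the names used below the assign.
            renamed = frozenset(op.mapping.get(v, v) for v in non_null)
            return type_of(op.mapping.get(var, var), op.child, renamed, True)
        if var in op.mapping:
            # Behaviour described in this issue: the type is resolved at the
            # assign, where the non-null knowledge ($$45) has no meaning.
            return type_of(op.mapping[var], op.child, frozenset(), False)
        return type_of(var, op.child, non_null, follow_assign)
    if op.kind == "left_outer_join":
        base = op.types[var]
        # If any outer-branch variable is known non-null, all of them are.
        return base if non_null & set(op.types) else nullable(base)
    raise ValueError(f"unknown operator kind: {op.kind}")


# Assumption: $$159 is typed INT32 here purely for illustration.
join = Op("left_outer_join", types={"$$159": "INT32", "$$155": "INT32"})

# First snippet: select directly above the join.
plan1 = Op("select_not_null", var="$$159", child=join)

# Second snippet: an assign renames the variables before the select.
plan2 = Op("select_not_null", var="$$45",
           child=Op("assign", mapping={"$$42": "$$155", "$$45": "$$159"},
                    child=join))

print(type_of("$$155", plan1))                     # INT32
print(type_of("$$42", plan2))                      # UNION(NULL, INT32)
print(type_of("$$42", plan2, follow_assign=True))  # INT32
```

In this toy model the first plan yields INT32 because the select's knowledge 
about $$159 reaches the join unchanged, while in the second plan the lookup for 
$$42 stops at the assign and the knowledge about $$45 is never related back to 
$$159.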

Original issue reported on code.google.com by icetin...@gmail.com on 7 Aug 2014 at 10:54

GoogleCodeExporter commented 9 years ago
This issue was closed by revision f14a378184bb.

Original comment by icetin...@gmail.com on 7 Aug 2014 at 11:14

GoogleCodeExporter commented 9 years ago
The status was changed automatically to "Fixed" when I pushed a change to my 
branch with the commit message "fixed issue 794". Changing it to "UnderReview".

Original comment by icetin...@gmail.com on 7 Aug 2014 at 11:31