namhnguyen / asterixdb

Automatically exported from code.google.com/p/asterixdb
0 stars 0 forks source link

Problems on running aggregation in nested query #782

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
I run into an issue with running a query with nested aggregation on TPCH data:

 I started with a sum aggregation:
use dataverse TPCH;

for $nation in dataset nation
for $sn in dataset selectedNations
where $nation.n_nationkey = $sn.sn_nationkey  /*+ indexnl */
return {
  "nation_key": $nation.n_nationkey,
  "name": $nation.n_name,
  "aggregates": for $order in dataset orders
                for $customer in dataset customers
                where $order.o_custkey = $customer.c_custkey
                and  $customer.c_nationkey = $nation.n_nationkey
                group by $orderdate := $order.o_orderdate with $order
                let $totalprice := $order.o_totalprice
                let $sum := sum($totalprice)
                order by $sum
                limit 3
                return {
                  "order_date": $orderdate,
                  "sum_price": $sum
                }
}

It can compile into logical plan:
distribute result [%0->$$13] -- |UNPARTITIONED|
  project ([$$13]) -- |UNPARTITIONED|
    assign [$$13] <- [function-call: asterix:open-record-constructor, Args:[AString: {nation_key}, function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_nationkey}], AString: {name}, function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_name}], AString: {aggregates}, %0->$$30]] -- |UNPARTITIONED|
      subplan {
                aggregate [$$30] <- [function-call: asterix:listify, Args:[%0->$$29]] -- |UNPARTITIONED|
                  assign [$$29] <- [function-call: asterix:open-record-constructor, Args:[AString: {order_date}, %0->$$4, AString: {sum_price}, %0->$$7]] -- |UNPARTITIONED|
                    limit AInt32: {3} -- |UNPARTITIONED|
                      order (ASC, %0->$$7)  -- |UNPARTITIONED|
                        assign [$$7] <- [function-call: asterix:sum, Args:[%0->$$6]] -- |UNPARTITIONED|
                          assign [$$6] <- [function-call: asterix:field-access-by-name, Args:[%0->$$26, AString: {o_totalprice}]] -- |UNPARTITIONED|
                            group by ([$$4 := function-call: asterix:field-access-by-name, Args:[%0->$$5, AString: {o_orderdate}]]) decor ([]) {
                                      aggregate [$$26] <- [function-call: asterix:listify, Args:[%0->$$5]] -- |UNPARTITIONED|
                                        nested tuple source -- |UNPARTITIONED|
                                   } -- |UNPARTITIONED|
                              select (function-call: algebricks:and, Args:[function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-name, Args:[%0->$$5, AString: {o_custkey}], function-call: asterix:field-access-by-name, Args:[%0->$$3, AString: {c_custkey}]], function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-name, Args:[%0->$$3, AString: {c_nationkey}], function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_nationkey}]]]) -- |UNPARTITIONED|
                                unnest $$3 <- function-call: asterix:dataset, Args:[AString: {customers}] -- |UNPARTITIONED|
                                  unnest $$5 <- function-call: asterix:dataset, Args:[AString: {orders}] -- |UNPARTITIONED|
                                    nested tuple source -- |UNPARTITIONED|
             } -- |UNPARTITIONED|
        select (function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_nationkey}], function-call: asterix:field-access-by-name, Args:[%0->$$1, AString: {sn_nationkey}]]) -- |UNPARTITIONED|
          unnest $$1 <- function-call: asterix:dataset, Args:[AString: {selectedNations}] -- |UNPARTITIONED|
            unnest $$0 <- function-call: asterix:dataset, Args:[AString: {nation}] -- |UNPARTITIONED|
              empty-tuple-source -- |UNPARTITIONED|

but fails at execution: 
Unsupported type [ closed {
  o_orderkey: INT32,
  o_custkey: INT32,
  o_orderstatus: STRING,
  o_totalprice: DOUBLE,
  o_orderdate: STRING,
  o_orderpriority: STRING,
  o_clerk: STRING,
  o_shippriority: INT32,
  o_comment: STRING
}
 ] for field access expression: function-call: asterix:field-access-by-name, Args:[%0->$$26, AString: {o_totalprice}] [AlgebricksException]

Then I changed the sum into count, which is similar to the one used in doc 
example:
  use dataverse TPCH;

for $nation in dataset nation
for $sn in dataset selectedNations
where $nation.n_nationkey = $sn.sn_nationkey  /*+ indexnl */
return {
  "nation_key": $nation.n_nationkey,
  "name": $nation.n_name,
  "aggregates": for $order in dataset orders
                for $customer in dataset customers
                where $order.o_custkey = $customer.c_custkey
                and  $customer.c_nationkey := $nation.n_nationkey
                group by $orderdate := $order.o_orderdate with $order
                let $count := count($order)
                order by $count
                limit 3
                return {
                  "order_date": $orderdate,
                  "count": $count
                }
}

again, this time if compiles, but it fails with execution exception:
Could not infer type for variable '$$35'. [AlgebricksException]

What is the expected output? What do you see instead?
The query runs, since it can compile into plans

What version of the product are you using? On what operating system?
0.8.3, OS X

Please provide any additional information below.

Original issue reported on code.google.com by richb...@gmail.com on 1 Jun 2014 at 6:58

GoogleCodeExporter commented 9 years ago
It seems that the datatypes, which you are using, are declared as closed. At 
the same time you are trying to obtain value from the fields, which are not 
part of the datatype schema definition.

Original comment by ildar.absalyamov on 1 Jun 2014 at 5:15

GoogleCodeExporter commented 9 years ago
I changed the datatypes to open for all the base datasets, but again I got the 
error: Could not infer type for variable '$$38'. [AlgebricksException].

In addition, I couldn't find the variable $$38 in the printed logical plan:
distribute result [%0->$$13] -- |UNPARTITIONED|
  project ([$$13]) -- |UNPARTITIONED|
    assign [$$13] <- [function-call: asterix:open-record-constructor, Args:[AString: {nation_key}, function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_nationkey}], AString: {name}, function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_name}], AString: {aggregates}, %0->$$31]] -- |UNPARTITIONED|
      subplan {
                aggregate [$$31] <- [function-call: asterix:listify, Args:[%0->$$30]] -- |UNPARTITIONED|
                  assign [$$30] <- [function-call: asterix:open-record-constructor, Args:[AString: {order_date}, %0->$$4, AString: {sum_price}, %0->$$7]] -- |UNPARTITIONED|
                    limit AInt32: {3} -- |UNPARTITIONED|
                      order (ASC, %0->$$7)  -- |UNPARTITIONED|
                        assign [$$7] <- [function-call: asterix:sum, Args:[%0->$$29]] -- |UNPARTITIONED|
                          subplan {
                                    aggregate [$$29] <- [function-call: asterix:listify, Args:[%0->$$28]] -- |UNPARTITIONED|
                                      assign [$$28] <- [function-call: asterix:field-access-by-name, Args:[%0->$$6, AString: {o_totalprice}]] -- |UNPARTITIONED|
                                        unnest $$6 <- function-call: asterix:scan-collection, Args:[%0->$$26] -- |UNPARTITIONED|
                                          nested tuple source -- |UNPARTITIONED|
                                 } -- |UNPARTITIONED|
                            group by ([$$4 := function-call: asterix:field-access-by-name, Args:[%0->$$5, AString: {o_orderdate}]]) decor ([]) {
                                      aggregate [$$26] <- [function-call: asterix:listify, Args:[%0->$$5]] -- |UNPARTITIONED|
                                        nested tuple source -- |UNPARTITIONED|
                                   } -- |UNPARTITIONED|
                              select (function-call: algebricks:and, Args:[function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-name, Args:[%0->$$5, AString: {o_custkey}], function-call: asterix:field-access-by-name, Args:[%0->$$3, AString: {c_custkey}]], function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-name, Args:[%0->$$3, AString: {c_nationkey}], function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_nationkey}]]]) -- |UNPARTITIONED|
                                unnest $$3 <- function-call: asterix:dataset, Args:[AString: {customers}] -- |UNPARTITIONED|
                                  unnest $$5 <- function-call: asterix:dataset, Args:[AString: {orders}] -- |UNPARTITIONED|
                                    nested tuple source -- |UNPARTITIONED|
             } -- |UNPARTITIONED|
        select (function-call: algebricks:eq, Args:[function-call: asterix:field-access-by-name, Args:[%0->$$0, AString: {n_nationkey}], function-call: asterix:field-access-by-name, Args:[%0->$$1, AString: {sn_nationkey}]]) -- |UNPARTITIONED|
          unnest $$1 <- function-call: asterix:dataset, Args:[AString: {selectedNations}] -- |UNPARTITIONED|
            unnest $$0 <- function-call: asterix:dataset, Args:[AString: {nation}] -- |UNPARTITIONED|
              empty-tuple-source -- |UNPARTITIONED|

Original comment by richb...@gmail.com on 1 Jun 2014 at 5:34

GoogleCodeExporter commented 9 years ago
Are you posting logical plan or optimized logical plan? The latter is the one 
which will be executed.
Can you also attach your datatypes definition?

Original comment by ildar.absalyamov on 1 Jun 2014 at 5:55

GoogleCodeExporter commented 9 years ago
The logical plan. Optimized plan and Hyracks job are not printed even if I 
toggle them.
Datatype defs:

create type nationType as open {
  n_nationkey: int32,
  n_name: string,
  n_regionkey: int32,
  n_comment: string
}

create dataset nation(nationType)
primary key n_nationkey;

create type customersType as open {
  c_custkey: int32,
  c_name: string,
  c_address: string,
  c_nationkey: int32,
  c_phone: string,
  c_acctbal: double,
  c_mktsegment: string,
  c_comment: string
}

create dataset customers(customersType)
primary key c_custkey;

create type ordersType as open {
  o_orderkey: int32,
  o_custkey: int32,
  o_orderstatus: string,
  o_totalprice: double,
  o_orderdate: string,
  o_orderpriority: string,
  o_clerk: string,
  o_shippriority: int32,
  o_comment: string
}

create dataset orders(ordersType)
primary key o_orderkey;

Original comment by richb...@gmail.com on 1 Jun 2014 at 6:00

GoogleCodeExporter commented 9 years ago

Original comment by dtab...@gmail.com on 2 Jun 2014 at 5:09

GoogleCodeExporter commented 9 years ago
Issue 783 has been merged into this issue.

Original comment by buyingyi@gmail.com on 2 Jun 2014 at 9:37

GoogleCodeExporter commented 9 years ago
hyracks: yingyi/fullstack_fix
asterix: yingyi/asterix_test

Original comment by buyingyi@gmail.com on 3 Jun 2014 at 12:36

GoogleCodeExporter commented 9 years ago

Original comment by buyingyi@gmail.com on 9 Jun 2014 at 10:15