verdict-project / verdict

Interactive-Speed Analytics: 200x Faster, 200x Fewer Cluster Resources, Approximate Query Processing
http://verdictdb.org
Apache License 2.0
249 stars 66 forks source link

Aggregations on scrambled table are not performed correctly #234

Closed dongyoungy closed 6 years ago

dongyoungy commented 6 years ago

If you have orders_scramble scrambled table for orders table and run the following simple aggregation query:

SELECT AVG(o_totalprice) as avg_price
FROM orders

VerdictDB executes the following query:

select sum(verdictdb_internal_tier_consolidated."agg0") as "agg0", 
sum(verdictdb_internal_tier_consolidated."agg1") as "agg1" from 
(select 1.0 * verdictdb_internal_before_scaling."agg0" as "agg0", 1.0 * 
verdictdb_internal_before_scaling."agg1" as "agg1", 
verdictdb_internal_before_scaling."verdictdb_tier_alias_724271_0" as "verdictdb_tier_alias_724271_0" 
from 
(select sum(vt1."o_totalprice") as "agg0", count(*) as "agg1", vt1."verdictdbtier" as "verdictdb_tier_alias_724271_0" 
from "verdictdb_tpch_query_test_lmyo1acm"."orders_scramble" as vt1 
where vt1."verdictdbblock" = 0 group by vt1."verdictdbtier") as verdictdb_internal_before_scaling) as verdictdb_internal_tier_consolidated

I think it is supposed to be something like agg0 / agg1 to calculate the average after scaling in the select list of the outer-most query, but the transformation is not done properly.

More specifically, I am suspecting that the cause is AsyncAggExecutionNode::replaceWithOriginalSelectList() method, which seems to be responsible for such transformation, not being called in the current control flow when a user runs query by:

  1. instantiate VerdictConnection just like other JDBC connection
  2. create statement with createStatement() and run the query

A simple solution (not tested, just by inspection) seems to be adding something like the following in the static AsyncAggExecutionNode:create() method:

node.selectQuery = replaceWithOriginalSelectList(node.selectQuery, sourceAggMeta);
pyongjoo commented 6 years ago

I'm not sure if this is actually a bug. We already have avg queries in our test cases and their answers are compared against the regular ones. If that transformation does not happen, the test cases should have failed.

On Sun, Aug 26, 2018 at 8:25 PM Dong Young Yoon notifications@github.com wrote:

If you have orders_scramble scrambled table for orders table and run the following simple aggregation query:

SELECT AVG(o_totalprice) as avg_priceFROM orders

VerdictDB executes the following query:

select sum(verdictdb_internal_tier_consolidated."agg0") as "agg0", sum(verdictdb_internal_tier_consolidated."agg1") as "agg1" from (select 1.0 verdictdb_internal_before_scaling."agg0" as "agg0", 1.0 verdictdb_internal_before_scaling."agg1" as "agg1", verdictdb_internal_before_scaling."verdictdb_tier_alias_724271_0" as "verdictdb_tier_alias_724271_0" from (select sum(vt1."o_totalprice") as "agg0", count(*) as "agg1", vt1."verdictdbtier" as "verdictdb_tier_alias_724271_0" from "verdictdb_tpch_query_test_lmyo1acm"."orders_scramble" as vt1 where vt1."verdictdbblock" = 0 group by vt1."verdictdbtier") as verdictdb_internal_before_scaling) as verdictdb_internal_tier_consolidated

I think it is supposed to be something like agg0 / agg1 to calculate the average after scaling in the select list of the outer-most query, but the transformation is not done properly.

More specifically, I am suspecting that the cause is AsyncAggExecutionNode::replaceWithOriginalSelectList() method, which seems to be responsible for such transformation, not being called in the current control flow when a user runs query by:

  1. instantiate VerdictConnection just like other JDBC connection
  2. create statement with createStatement() and run the query

A simple solution (not tested, just by inspection) seems to be adding something like the following in the static AsyncAggExecutionNode:create() method:

node.selectQuery = replaceWithOriginalSelectList(node.selectQuery, sourceAggMeta);

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mozafari/verdictdb/issues/234, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQYCtbNcQ0-zbMkRfR4O3XJcJP4g5iLks5uUzxXgaJpZM4WNA2h .

-- Yongjoo Park, Ph.D. Research Fellow Computer Science and Engineering University of Michigan 2260 Hayward St. Ann Arbor, MI 48109-2121 Office: 4957 Beyster Phone: (734) 707-9206 Website: yongjoopark.com

dongyoungy commented 6 years ago

I created a local branch from the current master to verify this issue and I think it is caused by the simplification logic.

Please try adding this unit test and run the case runSimpleAggQueryTest()

You will see the java.lang.ArrayIndexOutOfBoundsException: 0 with simplify2() method in place under SelectQueryCoordinator (I think I mentioned about this before) and it runs fine without it.

I guess what I did for fixing the simplify2() method locally was not a proper fix and caused the transformation problem. It would be great if you can look into the simplify2() method regarding this problem.

pyongjoo commented 6 years ago

What was the main part you changed in the simplify2() method?

dongyoungy commented 6 years ago

To fix the java.lang.ArrayIndexOutOfBoundsException, the change I have made was to change the following method in DirectRetrievalExecutionNode:

 @Override
  public SqlConvertible createQuery(List<ExecutionInfoToken> tokens) throws VerdictDBException {
    // this will replace the placeholders contained the subquery.
    childNode.createQuery(tokens);
    parentNode.createQuery(tokens);
    return selectQuery;
  }

to something like:

 @Override
  public SqlConvertible createQuery(List<ExecutionInfoToken> tokens) throws VerdictDBException {
    // this will replace the placeholders contained the subquery.
    if (!tokens.isEmpty()) {
      childNode.createQuery(tokens);
    }
    parentNode.createQuery(tokens);
    return selectQuery;
  }

s.t. it only calls createQuery on the childNode when there is non-empty tokens. However, please note again that this seems to cause the transformation error in this issue.

pyongjoo commented 6 years ago

I didn't ask what you are trying to do to fix the issue. My question was what caused the issue originally. You mentioned, "what I did for fixing the simplify2() method locally was not a proper fix and caused the transformation problem".

dongyoungy commented 6 years ago

That's the question that I do not have an answer to. I just tried to fix the symptom as it appeared and that was it.

pyongjoo commented 6 years ago

Let me run the unit tests after https://github.com/mozafari/verdictdb/issues/236 is fixed. The basic scrambling is broken right at the moment.

pyongjoo commented 6 years ago

I encountered the same error when I ran RedshiftTest.runSimpleAggQueryTest(). Let me try to fix the bug.

pyongjoo commented 6 years ago

The error was thrown when the scrambled table includes only one block due to the bug in simplification logic. I have pushed a fix to my branch, which I will merge after tests pass in CI.

dongyoungy commented 6 years ago

Sure. Once it gets merged, I will continue with #228

pyongjoo commented 6 years ago

The related pull request (https://github.com/mozafari/verdictdb/pull/237) has been merged.