zhouqingqing / qpmodel

A Relational Optimizer and Executor
MIT License
66 stars 18 forks source link

Remove from 01 #235

Closed pkommoju closed 3 years ago

pkommoju commented 4 years ago

Removing remove_from work This exposed several problems: 1) union in FROM doesn't set outputs of second and subsequent selects which results in accessing null input rows and exception. 2) conflicting output names in the final query. 3) more renamed columns than the output columns. Probably related to (2). 4) Can't bind column e1 to table. 5) table scan having a filter requiring columns of two tables. This works if unnesting is enabled. 6) tpcds/q23.sql. PhysicNode x) Line 88 7) genuine group by expression raising error as non grouping expression. The failing query is: select d1, sum(d2) from (select c1/2, sum(c1) from (select b1, count(*) as a1 from b group by b1)c(c1, c2) group by c1/2) d(d1, d2) group by d1;

This checkin contains fix for (7). Other problems will be resolved one after another as separate fixes.

This pull request is only for code review not an actual merge request.

All tests affected by change of query plan are updated (except) Cardinality test.

The fix is to remember the last position of output required by the parent, it is not an error if the offending occurs after this position.

zhouqingqing commented 4 years ago

6 tests failed - is that expected? -- never mind, I saw that you enabled remove_from by default, so failed tests are expected.

pkommoju commented 4 years ago

Yes they are expected failures. The exact list I know is as follows:

Failing queries: select count(c1), sum(c2) from (select from a union all select from b) c(c1,c2); select from (select from a union all select from b) c(c1,c2) order by 1; select max(c1), min(c2) from(select from(select from a union all select from b) c(c1, c2))d(c1, c2) order by 1;

Reason: union in FROM. Only first select has output set, others produce no output which results in accessing null at execution time.

select b1+c100 from (select count() as b1 from b) a, (select c1 c100 from c) c where c100>1; select a.b1+c.b1 from (select count() as b1 from b) a, (select c1 b1 from c) c where c.b1>1;

Reason: count(*) on scan node. I will fix this today.

select a1,a2,b2 from b, a where a1=b1 and a1 < (select a2 from a where a2=b2);

Reason: conflicting output name c2 is not allowed. It fails only from test framework. It works from command line. I just noticed this difference. It might be because in command line enable_subqueryunnest, enable_cteplan, usememo are set to true, along with remove_from. I think enable_subqueryunnest=true "corrects" the plan, as you said, this "correction" may actually be hiding a bug elsewhere.

select b.a1 + b.a2 from (select a1 from a) b

This is a genuine error. I don't know why it ever worked but setting remove_from=true correctly produces the error "column not exists b.a2"

SELECT e1 FROM (SELECT d1 FROM (SELECT Sum(ab12) FROM (SELECT e1 b2 ab12 FROM (SELECT e1 FROM (SELECT d1 FROM (SELECT Sum(ab12) FROM (SELECT a1 b2 ab12 FROM a JOIN b ON a1 = b1) b) c(d1)) d(e1)) a JOIN b ON e1 = 8*b1) b) c(d1)) d(e1);

Reason: Can't bind column e1 to table. This should work if all derived tables name their columns, as follows: SELECT e1 FROM ( SELECT d1 FROM ( SELECT sum(ab12) FROM ( SELECT e1 b2 ab12 FROM ( SELECT e1 FROM ( SELECT d1 FROM ( SELECT sum(ab12) FROM ( SELECT a1 b2 ab12 FROM a JOIN b ON a1 = b1) b(ab12)) c(d1)) d(e1)) a(e1) JOIN b ON e1 = 8*b1) b(ab12)) c(d1)) d(e1);

But this also fails because it triggers the assertion "Debug.Assert(!list.Contains(null));" in reqlistGetAggrRefs method.

select a1 from c,a, b where a1=b1 and b2=c2 and a.a1 = (select b1 from(select b_2.b1, b_1.b2, b_1.b3 from b b_1, b b_2) bo where b2 = a2 and b1 = (select b1 from b where b3 = a3 and bo.b3 = c3 and b3> 1) and b2<5) and a.a2 = (select b2 from b bo where b1 = a1 and b2 = (select b2 from b where b4 = a3 + 1 and bo.b3 = a3 and b3> 0) and c3<5);

Minimal repro which has no result set: select b1 from(select b_2.b1, b_1.b2, b_1.b3 from b b_1, b b_2) bo(b1, b2, b3) where b2 = b3 and b1 = (select b1 from b where b3 = bo.b3 and bo.b3 = b3 and b3> 1);

Reason: Bad plan. A join filter is needed on b_1 and b2 but without unnesting enabled the plan has non-join filter on b2, requiring columns from b_1 and b__2.

tpcds/q23.sql. I haven't looked into exact reason this is failing. It triggers the assert "Debug.Assert(!mustHaveOutput.Contains(phyType) || VisitEachExists(x => x is PhysicNLJoin));" in PhysicNode, line #88

zhouqingqing commented 3 years ago

Should I comment out the tests known to fail in remove_from_01 branch

By default, we shall pass all tests at merge time.