luooofan / miniob-2023

2023 OceanBase 数据库大赛初赛
https://open.oceanbase.com/train?questionId=600004
Mulan Permissive Software License, Version 2
78 stars 34 forks source link

[2023 Case Failed]: aggregation-func & group-by & complex-sub-query #38

Closed luooofan closed 1 year ago

luooofan commented 1 year ago

base on:

description: 先做的 aggr func,没有 group by,这个时候对空数据行的聚集函数运算结果应该是 0 或者 null

select count(id) from exp_table where 6/1*6 < 7+col3*col3/2;
- 0
-- below are some requests executed before(partial) --
-- init data
create table exp_table(id int, col1 int, col2 int, col3 float, col4 float);
create table exp_table2(id int, col1 int);
insert into exp_table VALUES (6, 9, 1, 4.93, 8.98);
insert into exp_table VALUES (3, 2, 6, 2.05, 1.24);
insert into exp_table VALUES (7, 4, 9, 3.32, 8.98);
...

后来有一次提测 group by 挂了:

select id, sum(score) from t_group_by where id>9 group by id;
+ | NULL
-- below are some requests executed before(partial) --
-- init data
create table t_group_by (id int not null, score float not null, name char(1) null);
create table t_group_by_2 (id int not null, age int not null);
insert into t_group_by VALUES (4, 3.40, 'U');
insert into t_group_by VALUES (3, 4.06, 'V');
insert into t_group_by VALUES (4, 3.11, 'F');
...

整理后

create table t_group_by (id int not null, score float not null, name char(1) null);
create table t_group_by_2 (id int not null, age int not null);
insert into t_group_by VALUES (4, 3.40, 'U');
insert into t_group_by VALUES (3, 4.06, 'V');
select id, sum(score) from t_group_by where id>9 group by id;
create table t1(c1 int primary key, c2 int);
select count(c2) from t1 where c1 > 2 group by c1;
select max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c2) from t1 where c1 > 2 group by c1;
insert into t1 values(1,2),(2,3),(3,4),(0,1),(4,NULL),(5,NULL),(6,NULL);
select count(c2) from t1 where c1 > 2 group by c1;
select max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c2) from t1 where c1 > 2 group by c1;
select count(c2) from t1 where c1 > 20 group by c1;
select max(c1) from t1 where c1 > 20 group by c1;
select c1, max(c1) from t1 where c1 > 20 group by c1;
select c1, max(c2) from t1 where c1 > 20 group by c1;

与 mysql 对比测试发现,带 group by 的情况下,空数据行应该不返回结果

但我们直接把它改成了空数据行的情况下返回 RECORD_EOF

然后出现了新的问题:

select * from csq_1 where feat1 <> (select min(csq_2.feat2) from csq_2 where csq_2.feat2 > csq_1.feat1);
- 61 | 35 | 8.64
- 68 | 37 | 63.64
- 85 | 79 | 61.71
- 98 | 25 | 58.94
-- below are some requests executed before(partial) --
-- init data
CREATE TABLE csq_1(id int, col1 int, feat1 float);
CREATE TABLE csq_2(id int, col2 int, feat2 float);
CREATE TABLE csq_3(id int, col3 int, feat3 float);
CREATE TABLE csq_4(id int, col4 int, feat4 float);
INSERT INTO csq_1 VALUES (96, 94, 86.30);
...

由于子查询中聚集函数对空数据行进行计算直接返回 RECORD_EOF,父查询处理谓词的时候直接向上返回 RECORD_EOF,所以少处理了一部分数据

image

luooofan commented 1 year ago

总结:

luooofan commented 1 year ago

https://github.com/luooofan/miniob-2023/blob/19ab2fb04a9089c016bd489883ebd5385206f276/src/observer/sql/operator/groupby_physical_operator.cpp#L48-L66

https://github.com/luooofan/miniob-2023/blob/19ab2fb04a9089c016bd489883ebd5385206f276/src/observer/sql/expr/expression.cpp#L245-L260