pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

get wrong query result when use self-define rank #6834

Closed anjieych closed 5 years ago

anjieych commented 6 years ago

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    
    create table a (f1 int ,f2 varchar(32) ,primary key (f1));

insert into a(f1,f2) values(1,'a'), (2,'b'), (3,'d'), (4,'d'), (5,'e'), (6,'f'), (7,'g'), (8,'h'), (9,'i'), (10,'j');

CREATE TABLE b ( k INT NOT NULL AUTO_INCREMENT, k1 INT NOT NULL, k2 VARCHAR(45) NULL, PRIMARY KEY (k));

insert into b(k1,k2) values(1,'1-1'), (1,'1-2'), (2,'2'), (3,'3'), (4,'4'), (5,'5'), (100,'100');

select t.f1,t.f2,t.k2,t.rank from ( select a.f1,a.f2,b.k1,b.k2 ,case when @cur=a.f1 then @rank:=@rank else @rank:=@rank+1 end as rank ,@cur:=a.f1 as lst from a inner join b on a.f1=b.k1 ,(select @cur:=0,@rank:=0) as c order by f1 asc ) as t where t.rank between 1 and 3;


2. What did you expect to see?

get correct return:

f1 f2 k2 rank 1 a 1-1 1 1 a 1-2 1 2 b 2 2 3 d 3 3


3. What did you see instead?

get wrong return:

f1 f2 k2 rank 1 a 1-1 21 1 a 1-2 21


4. What version of TiDB are you using (`tidb-server -V` or run `select tidb_version();` on TiDB)?

Release Version: v2.0.0 Git Commit Hash: 637e130e6a9ba2e54e158131c0466233db39a60e Git Branch: release-2.0 UTC Build Time: 2018-04-27 11:43:00 GoVersion: go version go1.10 linux/amd64 TiKV Min Version: 2.0.0-rc.4.1

zimulala commented 6 years ago

@anjieych Thanks for your report! We'll fix it soon.

zz-jason commented 6 years ago

@anjieych It's hard for tidb to be compatible with mysql on the behavior of user variables, for this case, use window function dense_rank is a better chose:

SELECT
    t.f1,
    t.f2,
    t.k2,
    t.r
FROM (
    SELECT
        a.f1 AS f1,
        a.f2 AS f2,
        b.k2 AS k2,
        DENSE_RANK() OVER(ORDER BY a.f1) AS r
    FROM
        a
    INNER JOIN
        b
    ON
        a.f1=b.k1
) AS t
WHERE
    t.r between 1 and 3;

You can try it in mysql 8.0, and we are planned to support window function as well, see our roadmap for more detail.

anjieych commented 6 years ago

@zz-jason Thanks for your quick response. I can try another way to implement the finall purpose,but my trouble is that why the tidb return with two rows with rank of 21 ?

zz-jason commented 6 years ago

The execution plan for this sql in tidb is:

TiDB(localhost:4000) > desc select t.f1,t.f2,t.k2,t.rank from ( select a.f1,a.f2,b.k1,b.k2 ,case when @cur=a.f1 then @rank:=@rank else @rank:=@rank+1 end as rank ,@cur:=a.f1 as lst from a inner join b on a.f1=b.k1 ,(select @cur:=0,@rank:=0) as c order by f1 asc ) as t where t.rank between 1 and 3;
+------------------+------------------+--------------------------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| id               | parents          | children                       | task | operator info                                                                                                                                                                                                                                                                  | count    |
+------------------+------------------+--------------------------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| TableScan_21     |                  |                                | cop  | table:a, range:[-inf,+inf], keep order:false                                                                                                                                                                                                                                   | 10000.00 |
| TableReader_22   | Selection_20     |                                | root | data:TableScan_21                                                                                                                                                                                                                                                              | 10000.00 |
| Selection_20     | HashRightJoin_19 | TableReader_22                 | root | ge(cast(case(eq(cast(getvar(cur)), cast(test.a.f1)), setvar(rank, getvar(rank)), setvar(rank, cast(plus(cast(getvar(rank)), 1))))), 1), le(cast(case(eq(cast(getvar(cur)), cast(test.a.f1)), setvar(rank, getvar(rank)), setvar(rank, cast(plus(cast(getvar(rank)), 1))))), 3) | 8000.00  |
| TableScan_23     |                  |                                | cop  | table:b, range:[-inf,+inf], keep order:false                                                                                                                                                                                                                                   | 10000.00 |
| TableReader_24   | HashRightJoin_19 |                                | root | data:TableScan_23                                                                                                                                                                                                                                                              | 10000.00 |
| HashRightJoin_19 | HashLeftJoin_16  | Selection_20,TableReader_24    | root | inner join, inner:Selection_20, equal:[eq(test.a.f1, test.b.k1)]                                                                                                                                                                                                               | 10000.00 |
| TableDual_26     | Projection_25    |                                | root | rows:1                                                                                                                                                                                                                                                                         | 1.00     |
| Projection_25    | HashLeftJoin_16  | TableDual_26                   | root | setvar(cur, 0), setvar(rank, 0)                                                                                                                                                                                                                                                | 1.00     |
| HashLeftJoin_16  | Projection_15    | HashRightJoin_19,Projection_25 | root | inner join, inner:Projection_25                                                                                                                                                                                                                                                | 10000.00 |
| Projection_15    | Sort_13          | HashLeftJoin_16                | root | test.a.f1, test.a.f2, test.b.k2, case(eq(cast(getvar(cur)), cast(test.a.f1)), setvar(rank, getvar(rank)), setvar(rank, cast(plus(cast(getvar(rank)), 1)))), setvar(cur, cast(test.a.f1))                                                                                       | 10000.00 |
| Sort_13          | Projection_12    | Projection_15                  | root | t.f1:asc                                                                                                                                                                                                                                                                       | 10000.00 |
| Projection_12    |                  | Sort_13                        | root | t.f1, t.f2, t.k2, t.rank                                                                                                                                                                                                                                                       | 10000.00 |
+------------------+------------------+--------------------------------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
12 rows in set (0.00 sec)

plan

The filter t.rank between 1 and 3 is pushed down in tidb and is evaluated in Selection_20, that is to say, we calculate the filter case when @cur=a.f1 then @rank:=@rank else @rank:=@rank+1 end between 1 and 3 directly on the table a before the join operation.

If we disable the predicate push down optimization rule, we can get the right result:

TiDB(localhost:4000) > desc select t.f1,t.f2,t.k2,t.rank from ( select a.f1,a.f2,b.k1,b.k2 ,case when @cur=a.f1 then @rank:=@rank else @rank:=@rank+1 end as rank ,@cur:=a.f1 as lst from a inner join b on a.f1=b.k1 ,(select @cur:=0,@rank:=0) as c order by f1 asc ) as t where t.rank between 1 and 3;
+-----------------+-----------------+-------------------------------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| id              | parents         | children                      | task | operator info                                                                                                                                                                            | count    |
+-----------------+-----------------+-------------------------------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| TableScan_20    |                 |                               | cop  | table:a, range:[-inf,+inf], keep order:false                                                                                                                                             | 10000.00 |
| TableReader_21  | HashLeftJoin_18 |                               | root | data:TableScan_20                                                                                                                                                                        | 10000.00 |
| TableScan_22    |                 |                               | cop  | table:b, range:[-inf,+inf], keep order:false                                                                                                                                             | 10000.00 |
| TableReader_23  | HashLeftJoin_18 |                               | root | data:TableScan_22                                                                                                                                                                        | 10000.00 |
| HashLeftJoin_18 | HashLeftJoin_16 | TableReader_21,TableReader_23 | root | inner join, inner:TableReader_23, equal:[eq(test.a.f1, test.b.k1)]                                                                                                                       | 12500.00 |
| TableDual_25    | Projection_24   |                               | root | rows:1                                                                                                                                                                                   | 1.00     |
| Projection_24   | HashLeftJoin_16 | TableDual_25                  | root | setvar(cur, 0), setvar(rank, 0)                                                                                                                                                          | 1.00     |
| HashLeftJoin_16 | Projection_15   | HashLeftJoin_18,Projection_24 | root | inner join, inner:Projection_24                                                                                                                                                          | 12500.00 |
| Projection_15   | Sort_13         | HashLeftJoin_16               | root | test.a.f1, test.a.f2, test.b.k2, case(eq(cast(getvar(cur)), cast(test.a.f1)), setvar(rank, getvar(rank)), setvar(rank, cast(plus(cast(getvar(rank)), 1)))), setvar(cur, cast(test.a.f1)) | 12500.00 |
| Sort_13         | Selection_12    | Projection_15                 | root | t.f1:asc                                                                                                                                                                                 | 12500.00 |
| Selection_12    | Projection_11   | Sort_13                       | root | ge(cast(t.rank), 1), le(cast(t.rank), 3)                                                                                                                                                 | 10000.00 |
| Projection_11   |                 | Selection_12                  | root | t.f1, t.f2, t.k2, t.rank                                                                                                                                                                 | 10000.00 |
+-----------------+-----------------+-------------------------------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
12 rows in set (0.00 sec)

plan2

In the second plan, we will calculate the filter after the join operation and get the right result.

I think for predicate push down, we can just leave the getvar/setvar, just push the filters which do not contain these two special builtin functions.

winoros commented 6 years ago

The user defined variable occurred in selection is undefined behavior referring to ANSI SQL. So label this issue by compatibility instead of bug.

anjieych commented 6 years ago

@zz-jason Thanks again for your quick response.It's the same to what i guess.So what will you do,keep it or make it right ?

zz-jason commented 6 years ago

@anjieych From mysql documentation 5.7:

However, the order of evaluation for expressions involving user variables is undefined.

For now, I think we'd better just "keep it".

anjieych commented 6 years ago

@zz-jason Ok

alivxxx commented 5 years ago

@anjieych You could try the latest master for window function. Feel free to reopen this issue.