pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.91k stars 5.81k forks source link

TiKV and TiFlash behave differently when using `right` function with negative number and type conversion #53894

Open r33s3n6 opened 3 months ago

r33s3n6 commented 3 months ago

1. Minimal reproduce step (Required)

create table t1 (c1 text, c2 int);
alter table t1 set tiflash replica 1;
insert into t1 (c1,c2) values ('k5tis', -65), ('x', 1), ('jp2hmweyyb', 1);

select /*+ read_from_storage(tiflash[t1]) */ right(c1 , cast(c2 as unsigned)) from t1;

2. What did you expect to see? (Required)

mysql> select /*+ read_from_storage(tikv[t1]) */ right(c1 , cast(c2 as unsigned)) from t1;
+----------------------------------+
| right(c1 , cast(c2 as unsigned)) |
+----------------------------------+
|                                  |
| x                                |
| b                                |
+----------------------------------+
3 rows in set (0.00 sec)

mysql> select right('abcde', cast(-65 as unsigned));
+---------------------------------------+
| right('abcde', cast(-65 as unsigned)) |
+---------------------------------------+
|                                       |
+---------------------------------------+
1 row in set (0.00 sec)

3. What did you see instead (Required)

mysql> select /*+ read_from_storage(tiflash[t1]) */ right(c1 , cast(c2 as unsigned)) from t1;
+----------------------------------+
| right(c1 , cast(c2 as unsigned)) |
+----------------------------------+
| k5tis                            |
| x                                |
| b                                |
+----------------------------------+
3 rows in set (0.01 sec)

4. What is your TiDB version? (Required)

Release Version: v8.2.0-alpha-292-g7629a0d
Edition: Community
Git Commit Hash: 7629a0d68595c4d11c25fc059208f3efd838c2c1
Git Branch: HEAD
UTC Build Time: 2024-06-04 03:27:25
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv

topology:

distributed.yaml:

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 10.0.2.31

tidb_servers:
  - host: 10.0.2.21

tikv_servers:
  - host: 10.0.2.11
  - host: 10.0.2.12
  - host: 10.0.2.13

monitoring_servers:
  - host: 10.0.2.8

grafana_servers:
  - host: 10.0.2.8

alertmanager_servers:
  - host: 10.0.2.8

tiflash_servers:
  - host: 10.0.2.32

about us

We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the above-mentioned vulnerabilities in TiDB that may lead to database logic error.

zanmato1984 commented 3 months ago

Reproduction simplified (note that -1 and 18446744073709551551 have the same binary representation):

create table t2(c text);
alter table t2 set tiflash replica 1;
insert into t2 values('abcd');

select /*+ read_from_storage(tikv[t2]) */ right(c, -1) from t2;
+--------------+
| right(c, -1) |
+--------------+
|              |
+--------------+
-- Correct

select /*+ read_from_storage(tiflash[t2]) */ right(c, -1) from t2;
+--------------+
| right(c, -1) |
+--------------+
|              |
+--------------+
-- Correct

select /*+ read_from_storage(tikv[t2]) */ right(c, 18446744073709551551) from t2;
+--------------------------------+
| right(c, 18446744073709551551) |
+--------------------------------+
|                                |
+--------------------------------+
-- Wrong

select /*+ read_from_storage(tiflash[t2]) */ right(c, 18446744073709551551) from t2;
+--------------------------------+
| right(c, 18446744073709551551) |
+--------------------------------+
| abcd                           |
+--------------------------------+
-- Correct

select /*+ read_from_storage(tikv[t2]) */ right(c, cast(-1 as unsigned)) from t2;
+--------------------------------+
| right(c, cast(-1 as unsigned)) |
+--------------------------------+
|                                |
+--------------------------------+
-- Wrong

select /*+ read_from_storage(tiflash[t2]) */ right(c, cast(-1 as unsigned)) from t2;
+--------------------------------+
| right(c, cast(-1 as unsigned)) |
+--------------------------------+
| abcd                           |
+--------------------------------+
-- Correct

tiflash result is correct for both signed and unsigned second argument. tikv result (it is not pushed down so it is actually tidb's own implementation) is wrong when the second argument's msb is 1 (negative when being interpreted as a signed integer) no matter the parameter itself is signed or unsigned.

The reason is tidb always evaluates the second argument as a signed integer: https://github.com/pingcap/tidb/blob/7a18952eb1478ab459f4c9d6c35c741c85c3108c/pkg/expression/builtin_string.go#L597

We should interpret the second argument based on the unsigned flag in its type info as MySQL does: https://github.com/mysql/mysql-server/blob/824e2b4064053f7daf17d7f3f84b7a3ed92e5fb4/sql/item_strfunc.cc#L1514

This is a relative edge case so lowering the severity to major.