pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://tidbcloud.com/free-trial
https://pingcap.com
Apache License 2.0
36.27k stars 5.72k forks source link

`nullif` on two identical subselects produces non null value #52986

Closed r33s3n6 closed 4 days ago

r33s3n6 commented 2 weeks ago

1. Minimal reproduce step (Required)

Firstly, execute init.sql to create the table. Then executing error.sql yields unexpected results. Note that reproducing these results might not be entirely stable. Typically, it can be completed within three attempts. You can try executing error.sql multiple times or execute init.sql again to rebuild the table.

init.sql.txt error.sql.txt

2. What did you expect to see? (Required)

The SQL statement applies nullif to two identical expressions select variance(c_k6tgklu10) from t_rl02, then counts the results. Since the result is NULL, the count should be 0.

3. What did you see instead (Required)

In the multi-node version, the count is 1. output_re_main2.log output_re_single2.log

4. What is your TiDB version? (Required)

Release Version: v8.0.0
Edition: Community
Git Commit Hash: 8ba1fa452b1ccdbfb85879ea94b9254aabba2916
Git Branch: HEAD
UTC Build Time: 2024-03-28 14:22:15
GoVersion: go1.21.4
Race Enabled: false
Check Table Before Drop: false
Store: tikv

topology:

distributed.yaml:

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 10.0.2.31

tidb_servers:
  - host: 10.0.2.21

tikv_servers:
  - host: 10.0.2.11
  - host: 10.0.2.12
  - host: 10.0.2.13

monitoring_servers:
  - host: 10.0.2.8

grafana_servers:
  - host: 10.0.2.8

alertmanager_servers:
  - host: 10.0.2.8

tiflash_servers:
  - host: 10.0.2.32

single.yaml

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 10.0.2.73

tidb_servers:
  - host: 10.0.2.72

tikv_servers:
  - host: 10.0.2.71

tiflash_servers:
  - host: 10.0.2.74

about us

We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the above-mentioned vulnerabilities in TiDB that may lead to database logic error.

yibin87 commented 1 week ago

The error.sql.txt query's plan looks incorrect:

+--------------------------+----------+-----------+-----------------------------------------+--------------------------------+
| id                       | estRows  | task      | access object                           | operator info                  |
+--------------------------+----------+-----------+-----------------------------------------+--------------------------------+
| Projection_47            | 10000.00 | root      |                                         | 1->Column#35                   |
| └─IndexReader_51         | 10000.00 | root      |                                         | index:IndexFullScan_50         |
|   └─IndexFullScan_50     | 10000.00 | cop[tikv] | table:ref_2, index:c_uajclsr(c_uajclsr) | keep order:false, stats:pseudo |
+--------------------------+----------+-----------+-----------------------------------------+--------------------------------+
yibin87 commented 1 week ago

/sig planner

yibin87 commented 1 week ago

/remove-sig execution

time-and-fate commented 5 days ago

The core issue here is the (select variance(c_k6tgklu10) from t_rl02) subquery. Since its output column type is DOUBLE, and tidb executes SQLs with intra-query concurrency, the order of calculations that happen in the query would be unstable, therefore the execution result will be unstable due to the precision loss of the floating number.

> select variance(c_k6tgklu10) from t_rl02;
+-----------------------+
| variance(c_k6tgklu10) |
+-----------------------+
|   0.24595041322314046 |
+-----------------------+
1 row in set (0.00 sec)

> select variance(c_k6tgklu10) from t_rl02;
+-----------------------+
| variance(c_k6tgklu10) |
+-----------------------+
|   0.24595041322314048 |
+-----------------------+
1 row in set (0.00 sec)

> select variance(c_k6tgklu10) from t_rl02;
+-----------------------+
| variance(c_k6tgklu10) |
+-----------------------+
|   0.24595041322314043 |
+-----------------------+
1 row in set (0.00 sec)

That makes the two same subqueries in the SQL may have different results. Then the nullif expression will be 1 sometimes and 0 otherwise. Finally, the query result would be unstable. Most of the time it will be all 1s, but sometimes it would also be 0s.

MySQL document (B.3.4.8 Problems with Floating-Point Values) also has descriptions on this issue and suggestions that resolve this issue by rewriting the SQL.

time-and-fate commented 5 days ago

Currently, we'd like to consider this as the expected behavior and close this issue later. Feel free to comment or reopen if you have further questions.