Open r33s3n6 opened 6 months ago
set tidb_max_chunk_size = 20000;
Or wait for 1 minute for the modified row count to be dumped into the table.
Not sure why, but it's only reproducible when the Concurrency
is OFF
for Projection_37
, which corresponds to numWorkers
in the code.
+--------------------------------------+---------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+---------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+--------------------------------------+---------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+---------+
| Union_31 | 104.00 | 104 | root | | time:4.35s, loops:7, RU:2.888208 | | N/A | N/A |
| ├─Projection_34 | 20.00 | 20 | root | | time:4.34s, loops:2, Concurrency:OFF | 0->Column#35, test2.t_a97s.c_k2l->Column#36, test2.t_a97s.c_csldc51zsu->Column#37, test2.t_a97s.c_csldc51zsu->Column#38 | 1016 Bytes | N/A |
| │ └─TableReader_36 | 20.00 | 20 | root | | time:379.5µs, loops:2, cop_task: {num: 1, max: 235.4µs, proc_keys: 0, copr_cache_hit_ratio: 0.00, build_task_duration: 8.95µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:1, total_time:209.9µs}} | data:TableFullScan_35 | 639 Bytes | N/A |
| │ └─TableFullScan_35 | 20.00 | 20 | cop[tikv] | table:ref_0 | tikv_task:{time:178µs, loops:0} | keep order:false, stats:pseudo | N/A | N/A |
| └─Projection_37 | 84.00 | 84 | root | | time:4.34s, loops:6, Concurrency:OFF | Column#19->Column#35, test2.t_xqspe7yna.c_ud2f5a6b->Column#36, Column#26->Column#37, cast(Column#30, text BINARY CHARACTER SET utf8mb4 COLLATE utf8mb4_bin)->Column#38 | 2.73 KB | N/A |
| └─Shuffle_44 | 84.00 | 84 | root | | time:4.34s, loops:6, ShuffleConcurrency:5 | execution info: concurrency:5, data sources:[Projection_40] | N/A | N/A |
| └─Window_38 | 84.00 | 84 | root | | time:21.7s, loops:10 | count(Column#29)->Column#30 over(partition by test2.t_xqspe7yna.c_j20ga) | N/A | N/A |
| └─Sort_43 | 84.00 | 84 | root | | time:21.7s, loops:10 | test2.t_xqspe7yna.c_j20ga | 2.51 KB | 0 Bytes |
| └─ShuffleReceiver_45 | 84.00 | 84 | root | | time:21.7s, loops:10 | | N/A | N/A |
| └─Projection_40 | 84.00 | 84 | root | | time:4.34s, loops:2, Concurrency:OFF | 1->Column#19, test2.t_xqspe7yna.c_ud2f5a6b, <nil>->Column#26, test2.t_xqspe7yna.c_j20ga, cast(test2.t_xqspe7yna.c_ud2f5a6b, bigint(22) BINARY)->Column#29 | 2.76 KB | N/A |
| └─TableReader_42 | 84.00 | 84 | root | | time:4.34s, loops:2, cop_task: {num: 4, max: 185.8µs, min: 142.8µs, avg: 166.4µs, p95: 185.8µs, copr_cache_hit_ratio: 0.00, build_task_duration: 11.6µs, max_distsql_concurrency: 4}, rpc_info:{Cop:{num_rpc:4, total_time:544µs}} | data:TableFullScan_41 | 2.38 KB | N/A |
| └─TableFullScan_41 | 84.00 | 84 | cop[tikv] | table:ref_1 | tikv_task:{proc max:134.9µs, min:99.9µs, avg: 115.3µs, p80:134.9µs, p95:134.9µs, iters:0, tasks:4} | keep order:false, stats:pseudo | N/A | N/A |
+--------------------------------------+---------+---------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+---------+
And the numWorkers
is decided by the logic as follows:
https://github.com/pingcap/tidb/blob/b1818cd05f2be7a5410449de64c8f422cedbc311/pkg/executor/builder.go#L1779-L1797
In brief, it's mainly related to the Chunk
. Two columns in one Chunk
incorrectly share the same *Column
, then the two columns always have the same data. The 3rd and 4th (start from 1) columns in the result set are where the incorrect sharing happen.
The detailed process is as follows:
Projection_34
produces 4 columns. You can notice that the 3rd and 4th columns are from the same column of its children.Projection_34
outputs a Chunk
with 4 columns with the 3rd and 4th *Column
pointing to the same Column
.Union_31
receives the Chunk
and outputs the same Chunk
.(*clientConn).writeChunks
receives the Chunk
, returns the content to the customer, and calls Next()
again with this Chunk
.Union_31
receives the same Chunk
as req
input parameter in (*UnionExec).Next()
UnionExec
has a Chunk
reusing mechanism (see UnionExec.resourcePools
). It then reuses this Chunk
to call the child executor's Next()
.Projection_37
receives this Chunk
and starts expression evaluation. But it actually writes the evaluation result of the 3rd and 4th expressions to the same Column
. And the result of the 4th expression overwrites the 3rd expression.I believe we need to check and disable the wrong Column
reusing at one of the places above.
Similar to issue #52985 , and lower severity to major.
/remove-severity critical
/severity major
1. Minimal reproduce step (Required)
Firstly, execute
init.sql
to create the table. Then executingerror.sql
yields unexpected results. Note that reproducing these results might not be entirely stable. Typically, it can be completed within three attempts. You can try executingerror.sql
multiple times or executeinit.sql
again to rebuild the table. init.sql.txt error.sql.txtThis problem is not easy to reproduce, but I have managed to reproduceit successfully several times. Sometimes, it requires waiting for a while after creating the table before repeatedly executing
error.sql
.2. What did you expect to see? (Required)
When the result in the first column is 1, it means it comes from the second select query, where the third column is a subquery. When executed separately, the result of the subquery is NULL.
Therefore, when the first column is 1, the third column should always be NULL.
3. What did you see instead (Required)
However, there are some cases where the column is not NULL, and there is inconsistency between the TiDB single-node version and the multi-node version.
output_re_main2.log output_re_single2.log
4. What is your TiDB version? (Required)
topology:
distributed.yaml:
single.yaml
about us
We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the above-mentioned vulnerabilities in TiDB that may lead to database logic error.