Open r33s3n6 opened 2 weeks ago
It seems to be a TiFlash bug.
With MPP disabled (so the query does not run on TiFlash), the output is correct:
TiDB root@127.0.0.1:test> set @@tidb_allow_mpp=0;
Query OK, 0 rows affected
Time: 0.000s
TiDB root@127.0.0.1:test> explain SELECT DISTINCT
->
-> substring(
-> cast(repeat(
-> cast(ref_4.c_bek45hvu8g as char),
-> 9) as char),
-> cast(-10000 as signed)) as c2
-> FROM
-> t_m1i as ref_4;
+-----------------------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-----------------------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashAgg_4 | 163.00 | root | | group by:Column#6, funcs:firstrow(Column#5)->Column#3 |
| └─Projection_13 | 504.00 | root | | substring(cast(repeat(cast(test.t_m1i.c_bek45hvu8g, var_string(5)), 9), var_string(5)), -10000)->Column#5, substring(cast(repeat(cast(test.t_m1i.c_bek45hvu8g, var_string(5)), 9), var_string(5)), -10000)->Column#6 |
| └─TableReader_9 | 504.00 | root | | data:TableFullScan_7 |
| └─TableFullScan_7 | 504.00 | cop[tikv] | table:ref_4 | keep order:false |
+-----------------------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set
Time: 0.003s
TiDB root@127.0.0.1:test> SELECT DISTINCT
->
-> substring(
-> cast(repeat(
-> cast(ref_4.c_bek45hvu8g as char),
-> 9) as char),
-> cast(-10000 as signed)) as c2
-> FROM
-> t_m1i as ref_4
-> ;
+--------+
| c2 |
+--------+
| |
| <null> |
+--------+
2 rows in set
Then I set @@tidb_allow_mpp back to ON:
TiDB root@127.0.0.1:test> set @@tidb_allow_mpp=1;
Query OK, 0 rows affected
Time: 0.001s
TiDB root@127.0.0.1:test> SELECT DISTINCT
->
-> substring(
-> cast(repeat(
-> cast(ref_4.c_bek45hvu8g as char),
-> 9) as char),
-> cast(-10000 as signed)) as c2
-> FROM
-> t_m1i as ref_4
-> ;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| c2 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| <null> |
| cfffonqcfffonqcfffugugugugugugugugugne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxudududududududududmmmmmmmmmjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqzzzzzzzzzgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qojbi0xojbi0xojbi0xojbi0xojbi0xojbi0xojbi0xojbi0xojbi0x... |
| |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set
Time: 0.044s
@zanmato1984 PTAL
A simpler SQL statement to reproduce it:
TiDB root@127.0.0.1:test> DESC SELECT c_bek45hvu8g,substring(repeat(c_bek45hvu8g, 9), -10000) as c FROM t_m1i;
+------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------+
| TableReader_12 | 504.00 | root | | MppVersion: 2, data:ExchangeSender_11 |
| └─ExchangeSender_11 | 504.00 | mpp[tiflash] | | ExchangeType: PassThrough |
| └─Projection_4 | 504.00 | mpp[tiflash] | | test.t_m1i.c_bek45hvu8g, substring(repeat(test.t_m1i.c_bek45hvu8g, 9), -10000)->Column#3 |
| └─TableFullScan_10 | 504.00 | mpp[tiflash] | table:t_m1i | keep order:false |
+------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------+
4 rows in set
Time: 0.003s
It seems the substring implementation in TiFlash doesn't respect a negative pos.
However, I think this is a fairly uncommon usage, so I'm adjusting the severity to major.
1. Minimal reproduce step (Required)
First, execute init.sql to create the table. Then executing error.sql yields unexpected results. Note that the reproduction is not entirely stable; it typically succeeds within three attempts. You can try executing error.sql multiple times, or execute init.sql again to rebuild the table.
init.sql.txt error.sql.txt
2. What did you expect to see? (Required)
The first column is substring(repeat(c_bek45hvu8g,9),-10000).
SUBSTRING(str,pos) follows the MySQL documentation: when abs(pos) > length(str), TiDB returns an empty string. The maximum length of the string is 90, which is less than 10000.
Therefore, the result set should only contain NULL and empty strings.
3. What did you see instead (Required)
However, it seems that in TiDB, when evaluating substring, it may be reading beyond the boundaries of the string, resulting in incorrect output.
output_re_main2.log
4. What is your TiDB version? (Required)
topology:
distributed.yaml:
single.yaml
about us
We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the above-mentioned vulnerabilities in TiDB that may lead to database logic errors.