pingcap / tidb


Bit shift for BLOB type behaves inconsistently with MySQL8 #53943

Open wengsy150943 opened 5 months ago

wengsy150943 commented 5 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

init db

create table t2(a blob);
create table t3(a blob);
insert into t2 values(0xC2A0);
insert into t3 values(0xC2);
select * from t2,t3 where t2.a like concat("%",t3.a,"%");

Then run:

select * from t2,t3 where (t2.a >> 4) = t3.a;
select * from t2,t3 where (t2.a >> 8) = t3.a;

2. What did you expect to see? (Required)

MySQL 8.0.33 returns an empty set for each query.

Empty set (0.01 sec)
Empty set (0.01 sec)

3. What did you see instead (Required)

TiDB 8.1 returns one row with warnings for each query.

+------+------+
| a    | a    |
+------+------+
|      | �     |
+------+------+
1 row in set, 2 warnings (0.00 sec)
+------+------+
| a    | a    |
+------+------+
|      | �     |
+------+------+
1 row in set, 2 warnings (0.01 sec)
> show warnings;
+---------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1292 | Truncated incorrect INTEGER value: '' |
| Warning | 1292 | Truncated incorrect DOUBLE value: '�'  |
+---------+------+---------------------------------------+

4. What is your TiDB version? (Required)

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()
                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v8.1.0
Edition: Community
Git Commit Hash: 945d07c5d5c7a1ae212f6013adfb187f2de24b23
Git Branch: HEAD
UTC Build Time: 2024-05-21 03:51:57
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

wengsy150943 commented 5 months ago

We ran EXPLAIN on the query. It seems that TiDB casts the BLOB to bigint before the right shift, and that cast truncates the value incorrectly.

explain select * from t2,t3 where (t2.a >> 8) = t3.a;
+------------------------------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------+
| id                           | estRows | task      | access object | operator info                                                                                           |
+------------------------------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------+
| HashJoin_12                  | 1.00    | root      |               | inner join, equal:[eq(Column#5, Column#6)]                                                              |
| ├─Projection_17(Build)       | 1.00    | root      |               | test.t3.a, cast(test.t3.a, double BINARY)->Column#6                                                     |
| │ └─TableReader_19           | 1.00    | root      |               | data:TableFullScan_18                                                                                   |
| │   └─TableFullScan_18       | 1.00    | cop[tikv] | table:t3      | keep order:false, stats:pseudo                                                                          |
| └─Projection_14(Probe)       | 1.00    | root      |               | test.t2.a, cast(rightshift(cast(test.t2.a, bigint(65535) BINARY), 8), double UNSIGNED BINARY)->Column#5 |
|   └─TableReader_16           | 1.00    | root      |               | data:TableFullScan_15                                                                                   |
|     └─TableFullScan_15       | 1.00    | cop[tikv] | table:t2      | keep order:false, stats:pseudo                                                                          |
+------------------------------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------+

zanmato1984 commented 4 months ago

MySQL's right shift supports two argument types: bigint and binary [1]. So no cast is applied to t2.a >> 8, and the result is a binary value as well. TiDB, by contrast, currently supports only bigint for right shift [2]. This is why an implicit cast to bigint is applied to t2.a, and this cast evaluates to 0 (with a truncation warning). t2.a >> 8 therefore still evaluates to bigint 0. It is then compared with the binary value t3.a, which causes both sides to be cast to double, and the comparison is true for 0 and 0.

[1] https://dev.mysql.com/doc/refman/8.4/en/bit-functions.html#operator_right-shift
[2] https://github.com/pingcap/tidb/blob/29fc940ae4ea01482088994e14777c9765f913f7/pkg/expression/builtin_op.go#L420
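
To make the two evaluation paths easier to compare, here is a rough Python model of them. It is only a sketch under simplifying assumptions, not code from TiDB or MySQL: the helper names (lenient_int, lenient_double, bigint_path, binary_path) are made up for illustration, the lenient casts just take a leading numeric prefix and fall back to 0, and the binary shift is assumed to keep the operand's byte length, as [1] describes for binary-string arguments.

```python
import re

U64_MASK = 0xFFFFFFFFFFFFFFFF


def lenient_int(raw: bytes) -> int:
    """Hypothetical model of a lenient string-to-integer cast:
    use the leading signed digits, or 0 if there are none."""
    m = re.match(rb"\s*[+-]?\d+", raw)
    return int(m.group().decode("ascii")) if m else 0


def lenient_double(raw: bytes) -> float:
    """Hypothetical model of a lenient string-to-double cast:
    use the leading decimal prefix, or 0.0 if there is none."""
    m = re.match(rb"\s*[+-]?(\d+(\.\d*)?|\.\d+)", raw)
    return float(m.group().decode("ascii")) if m else 0.0


def bigint_path(lhs: bytes, rhs: bytes, shift: int) -> bool:
    """Path described for TiDB: cast lhs to bigint, shift it as an
    unsigned 64-bit integer, then compare both sides as doubles."""
    shifted = (lenient_int(lhs) & U64_MASK) >> shift
    return float(shifted) == lenient_double(rhs)


def binary_path(lhs: bytes, rhs: bytes, shift: int) -> bool:
    """Path described for MySQL: shift the bytes themselves, keep the
    original length (zero-filled), then compare byte by byte."""
    value = int.from_bytes(lhs, "big") >> shift
    shifted = value.to_bytes(len(lhs), "big")
    return shifted == rhs


t2_a = b"\xc2\xa0"  # value inserted into t2
t3_a = b"\xc2"      # value inserted into t3

for shift in (4, 8):
    # True means the join condition would match the row.
    print(shift, bigint_path(t2_a, t3_a, shift), binary_path(t2_a, t3_a, shift))
```

Under this model the bigint path matches for both shifts because each side collapses to 0, while the binary path does not: 0xC2A0 >> 8 becomes the two-byte value 0x00C2, which is not byte-equal to the one-byte 0xC2. That lines up with the one row reported on TiDB and the empty set on MySQL.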