pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.27k stars 5.84k forks source link

SOME operator with collation is resulting error. #32649

Open ramanich1 opened 2 years ago

ramanich1 commented 2 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

drop table if exists t1;
CREATE TABLE t1 (a CHAR(1) CHARSET latin1, b CHAR(1) CHARSET utf8);
SELECT a FROM t1 WHERE a < SOME(SELECT '1');

2. What did you expect to see? (Required)

mysql> SELECT a FROM t1 WHERE a < SOME(SELECT '1');
Empty set (0.00 sec)

3. What did you see instead (Required)

mysql> SELECT a FROM t1 WHERE a < SOME(SELECT '1');
ERROR 1267 (HY000): Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '<'

4. What is your TiDB version? (Required)

| Release Version: v5.5.0-alpha-182-gad9430039
Edition: Community
Git Commit Hash: ad9430039f54bb9af78d44831c176bc5eafcbba0
Git Branch: master
UTC Build Time: 2022-02-18 17:57:13
GoVersion: go1.17.2
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
aytrack commented 2 years ago

this sql will work well.

MySQL root@127.0.0.1:test> SELECT a FROM t1 WHERE a < SOME(SELECT _latin1'1');
+---+
| a |
+---+
tangenta commented 2 years ago

For the sake of the backward compatibility, TiDB treats latin1 as utf8mb4. So I test it with the charset ascii:

MySQL 5.7.36:

CREATE TABLE `t1` (
  `a` char(1) CHARACTER SET ascii DEFAULT NULL,
  `b` char(1) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

explain SELECT a FROM t1 WHERE a < SOME(select '1');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | PRIMARY     | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where    |
|  2 | SUBQUERY    | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+

explain SELECT a FROM t1 WHERE a < SOME(select '你好');
ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,IMPLICIT) and (latin1_swedish_ci,COERCIBLE) for operation '<'

Since it show the differences for EXPLAIN statement, we can infer that the planner(instead of the executor) takes the responsibility of the collation check:

'你好' contains the non-ascii characters, it is not safe to convert '你好' using ascii encoding. Thus, the error "Illegal mix of collations" is reported.

TiDB 1b2492f does support such check:

SELECT a FROM t1 WHERE a < '1';
Empty set (0.00 sec)

SELECT a FROM t1 WHERE a < '你好';
ERROR 1267 (HY000): Illegal mix of collations (ascii_bin,IMPLICIT) and (utf8mb4_bin,COERCIBLE) for operation '<'

But it is only limited to the constants. If we change the constant to a SOME expression:

mysql> SELECT a FROM t1 WHERE a < SOME(SELECT '1');
ERROR 1267 (HY000): Illegal mix of collations (ascii_bin,IMPLICIT) and (utf8mb4_bin,COERCIBLE) for operation '<'

mysql> SELECT a FROM t1 WHERE a < SOME(SELECT '你好');
ERROR 1267 (HY000): Illegal mix of collations (ascii_bin,IMPLICIT) and (utf8mb4_bin,COERCIBLE) for operation '<'

If TiDB supports rewriting the expressions like some(select '1') to '1', the problem should be solved. However, it is not easy to replace *ast.CompareSubqueryExpr with *ast.BinaryOperationExpr using expression rewriter..

We need to find a better way to fix this issue.