pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

Compatible with mysql support change default collation for utf8mb4 #45455

Open King-Dylan opened 1 year ago

King-Dylan commented 1 year ago

Feature Request

Is your feature request related to a problem? Please describe:

Mysql 8.0 support change default collation for utf8mb4.If we create table with DEFAULT CHARSET=utf8mb4,but did not specify collation, so the value of collation is inherited from default_collation_for_utf8mb4.But tidb doesn't support it yet.

Describe the feature you'd like:

TiDB should support default_collation_for_utf8mb4 variable let user can change the default collation for utf8mb4.

mysql> SELECT @@GLOBAL.collation_database,@@SESSION.collation_database;
+-----------------------------+------------------------------+
| @@GLOBAL.collation_database | @@SESSION.collation_database |
+-----------------------------+------------------------------+
| utf8mb4_general_ci          | utf8mb4_general_ci           |
+-----------------------------+------------------------------+
1 row in set (0.00 sec)

mysql> create database sbtest1 /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci */;
Query OK, 1 row affected (0.09 sec)

mysql> use sbtest1;
Database changed
mysql> create table t3 (id int)  DEFAULT CHARSET=utf8mb4 ;
Query OK, 0 rows affected (0.02 sec)

mysql> show create table t3;
+-------+--------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                       |
+-------+--------------------------------------------------------------------------------------------------------------------+
| t3    | CREATE TABLE `t3` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+--------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> set session default_collation_for_utf8mb4='utf8mb4_general_ci';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show variables like '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8_general_ci    |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_general_ci |
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

mysql> create table t4 (id int)  DEFAULT CHARSET=utf8mb4 ;
Query OK, 0 rows affected (0.03 sec)

mysql> show create table t4;
+-------+--------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                       |
+-------+--------------------------------------------------------------------------------------------------------------------+
| t4    | CREATE TABLE `t4` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
+-------+--------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

King-Dylan commented 1 year ago

Mysql will discard this variable in the future, so this is not a good feature request,I think we should have the table collation inherit from the default collation of the database.

mysql> set session default_collation_for_utf8mb4='utf8mb4_general_ci';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                |
+---------+------+--------------------------------------------------------------------------------------------------------+
| Warning | 1681 | Updating 'default_collation_for_utf8mb4' is deprecated. It will be made read-only in a future release. |
+---------+------+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
YangKeao commented 1 year ago

I think we should have the table collation inherit from the default collation of the database.

Isn't it just the current behavior?

create database test1 COLLATE utf8mb4_general_ci;
use test1;
create table t2 (id int);
show create table t2;

You'll see the collation of the table is utf8mb4_general_ci

dveeden commented 1 year ago

Looks like this already works:

sql> CREATE SCHEMA s1 CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.0612 sec)

sql> CREATE SCHEMA s2 CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Query OK, 0 rows affected (0.0322 sec)

sql> CREATE SCHEMA s3 COLLATE utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.0338 sec)

sql> CREATE SCHEMA s4 CHARACTER SET ascii;
Query OK, 0 rows affected (0.0307 sec)

sql> CREATE TABLE s1.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0209 sec)

sql> CREATE TABLE s2.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0345 sec)

sql> CREATE TABLE s3.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0374 sec)

sql> CREATE TABLE s4.t1 (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.0202 sec)

sql> SHOW CREATE TABLE s1.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.0009 sec)

sql> SHOW CREATE TABLE s2.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.0009 sec)

sql> SHOW CREATE TABLE s3.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.0010 sec)

sql> SHOW CREATE TABLE s4.t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
1 row in set (0.0009 sec)

sql> SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v7.3.0-alpha-309-g8b8ba8b648
Edition: Community
Git Commit Hash: 8b8ba8b648f24fabbd4c0e2fcf9d6e5bdde97a4b
Git Branch: navicat
UTC Build Time: 2023-08-02 06:44:38
GoVersion: go1.20.6
Race Enabled: false
Check Table Before Drop: false
Store: unistore
1 row in set (0.0003 sec)

So:

Once #37566 is done we could consider changing the default collation of TiDB to match MySQL or make it configurable by supporting default_collation_for_utf8mb4.

This is related to #29350