pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0

Character set name is case sensitive in TiDB #8304

Closed evancao77 closed 5 years ago

evancao77 commented 5 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

I used mydumper to dump data from version 1.0 and loader to import it into version 2.0.8. Character set names turned out to be case-sensitive, as in the example below.

  2. What did you expect to see? The dump can be loaded into 2.0.8.

  3. What did you see instead? The dumped SQL contains: create table test (id int) ENGINE=InnoDB DEFAULT CHARSET=UTF8 COLLATE=UTF8_BIN;

TiDB doesn't recognize UTF8. If I change it to utf8, the table can be created.

Is there a parameter that controls case sensitivity?

  4. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)? 1.0.8 to 2.0.8
evancao77 commented 5 years ago

export json : convert("[{\"airline_code\":\"5569\",\"airline_name\":\"首都航空\",\"airport_from\":\"HFE\",\"airport_to\":\"HET\",\"dt_flight\":\"20108\",\"passenger_idcard\":\"3412241985****\",\"passenger_idcard_type\":\"0\",\"passenger_name\":\"张**\",\"passenger_phone\":\"13*15**\"}]" using UTF8MB4)

The documentation says that UTF8MB4 is the default character set, but when I import with loader, it reports that UTF8MB4 is not supported. I can work around it by changing it to UTF8. Is this a bug?

tiancaiamao commented 5 years ago

PTAL @winkyao

morgo commented 5 years ago

Here is a simplified test case. In MySQL 5.7:

mysql57> create table test
    -> (id int)
    -> ENGINE=InnoDB DEFAULT CHARSET=UTF8 COLLATE=UTF8_BIN;
Query OK, 0 rows affected (0.09 sec)

Then in TiDB:

tidb> create table test
    -> (id int)
    -> ENGINE=InnoDB DEFAULT CHARSET=UTF8 COLLATE=UTF8_BIN;
ERROR 1115 (42000): Unknown character set: 'UTF8'

tiancaiamao commented 5 years ago

> When using loader imports , It suggests that UTF8MB4 is not supported.

What's the error message from TiDB and from loader in the log? @evancao77

winkyao commented 5 years ago

@evancao77 The root cause is that TiDB does not convert the collation name to lower case before comparing it. I will fix it soon.
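
To illustrate the kind of normalization described here, a minimal Go sketch follows. It is not the actual TiDB patch: the `collations` map and the `lookupCollation` helper are hypothetical names invented for this example, and the registry holds only two entries.

```go
package main

import (
	"fmt"
	"strings"
)

// collations is a hypothetical registry keyed by lower-case collation name,
// mapping each collation to its character set. It is not TiDB's real table.
var collations = map[string]string{
	"utf8_bin":    "utf8",
	"utf8mb4_bin": "utf8mb4",
}

// lookupCollation (hypothetical helper) lower-cases the user-supplied name
// before the map lookup, so UTF8_BIN and utf8_bin resolve to the same entry.
func lookupCollation(name string) (charset string, ok bool) {
	charset, ok = collations[strings.ToLower(name)]
	return charset, ok
}

func main() {
	for _, name := range []string{"UTF8_BIN", "utf8_bin", "Utf8mb4_Bin", "latin9_bin"} {
		cs, ok := lookupCollation(name)
		fmt.Printf("%s -> charset=%q found=%v\n", name, cs, ok)
	}
}
```

Normalizing once at the lookup boundary keeps the registry keys in a single canonical form, so callers do not need to care how the name was spelled in the SQL text.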

winkyao commented 5 years ago

@morgo The statement passes when the collation name is written in lower case, even with an upper-case character set name:

mysql> create table test(id int) ENGINE=InnoDB DEFAULT CHARSET=UTF8 COLLATE=UTF8_BIN;
ERROR 1115 (42000): Unknown character set: 'UTF8'
mysql> create table test(id int) ENGINE=InnoDB DEFAULT CHARSET=UTF8 COLLATE=utf8_bin;
Query OK, 0 rows affected (0.01 sec)
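
Until a fix is available, the workaround from the original report (writing the names in lower case) could be applied to the dump files before loading them. The Go sketch below is only an illustration under stated assumptions: `lowerCharsetNames` and `charsetClause` are hypothetical names, and the regular expression is naive, so it would also rewrite matching text inside string literals. It should only be run on schema files, not on data files.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// charsetClause matches "CHARSET=name", "CHARACTER SET name" and
// "COLLATE=name" (separator is "=" or whitespace), case-insensitively.
var charsetClause = regexp.MustCompile(
	`(?i)\b(CHARSET|CHARACTER\s+SET|COLLATE)(\s*=\s*|\s+)([A-Za-z0-9_]+)`)

// lowerCharsetNames (hypothetical helper) lower-cases only the character set
// or collation name and leaves the keyword and separator untouched.
func lowerCharsetNames(sql string) string {
	return charsetClause.ReplaceAllStringFunc(sql, func(m string) string {
		parts := charsetClause.FindStringSubmatch(m)
		return parts[1] + parts[2] + strings.ToLower(parts[3])
	})
}

func main() {
	stmt := "create table test (id int) ENGINE=InnoDB DEFAULT CHARSET=UTF8 COLLATE=UTF8_BIN;"
	fmt.Println(lowerCharsetNames(stmt))
}
```

Only the name after each keyword is changed, so the rest of the statement stays byte-for-byte identical to what mydumper produced.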