Closed testn closed 2 years ago
utf8_general50_ci and ucs2_general50_ci seem to be removed.
while you at it - can you have a look at making utf8mb3 an alias for cesu-8? should fix #1398 #1240 #1333
utf8_general50_ci and ucs2_general50_ci seem to be removed
would it hurt leaving them? not sure if removing might affect user connecting to server version 5.0
I think utf8mb3 should be aliased to utf8 for now. Aliasing it to cesu-8 will hurt the performance significantly
mysql utf8 strings are encoded as cesu-8, "real" utf8 is named utf8mb4 in mysql. utf8mb3 is same as utf8 ( that is, cesu-8 )
mysql utf8 strings are encoded as cesu-8, "real" utf8 is named utf8mb4 in mysql. utf8mb3 is same as utf8 ( that is, cesu-8 )
I think that is the future. Currently, it is still utf8mb3 (https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html)
Also this seems to clarify the differences between cesu8 and utf8mb3 https://www.wikiwand.com/en/Talk:UTF-8
lets tackle utf8 / utf8mb3 / cesu-8 separately. Thanks for the link, interesting read!
I just tried to store 😂 into utf8_unicode_ci and it said
code: 'ER_TRUNCATED_WRONG_VALUE_FOR_FIELD',
errno: 1366,
sqlState: 'HY000',
sqlMessage: "Incorrect string value: '\\xF0\\x9F\\x98\\x82' for column 'field' at row 1",
sql: "INSERT INTO `test-charset-encoding` (field) values('😂')"
I just tried to store 😂 into utf8_unicode_ci and it said
is this good or bad? :) ( e.i is this a positive test for 5117e4f5048beb18f4144bb26a9fffe1cbc2bbb3 ? )
I just tried to store 😂 into utf8_unicode_ci and it said
is this good or bad? :) ( e.i is this a positive test for 5117e4f ? )
I think it's good. It shows that utf8mb3 cannot store non-BMP characters.
I believe I was able to insert non-BMP 💩 using utf8 mysql encoding ( see https://github.com/sidorares/node-mysql2/pull/374#issuecomment-241748134 ), so not sure if that confirms "mysql utf8 = cesu-8", utf8mb3 = same as cecu-8 but only BMP is supported, utf8mb4 = "normal modern utf8"
if you do it that way, you are converting 💩 into 6 invalid bytes (2 UTF-8 characters) in MySQL. MySQL may not be able to understand that character. It may be able to pass it in/out as if it is a two character though.
However, I believe if you try to insert 💩 using .query() method, you won't be able to to that, right?
Can you try to insert 💩 into a column with UTF8_GENERAL_CI?
@sidorares re:columnname, I think we should always use utf8 to decode the column name regardless the characterSet returned in Packet.FieldDefinition. We don't need to use cesu8 to decode it as the column should only contain BMP characters only. So whether you use cesu8 or utf8, they won't be any differences.
can column name use other encodings ( win1251, koi8 for Cyrillic as an example )? If yes not sure if worth adding exception / custom logic for perf benefits
Let me check.
any update on this?
@ahmedbodi I think the discussion got sidetracked a bit into more complex issue, updated collation list LGTM to me. Merging now
I noticed that some collations are missing so I pulled up the latest collation list from MySQL by