unee-t / frontend

Meteor front end
https://case.dev.unee-t.com/
GNU Affero General Public License v3.0

Chinese characters are not displayed in the email notifications #845

Closed franck-boullier closed 4 years ago

franck-boullier commented 5 years ago

If the user types a Chinese character in a Unee-T case, the email notification does not display it correctly.

image

kaihendry commented 5 years ago

Any chance the email could be forwarded to me as an attachment, please?

kaihendry commented 5 years ago

Btw, this is in the demo environment AFAICT.

The case title is literally ????? in RequestIDs "17201dcb-6172-4946-a37b-07dfd17c5c3f" & "8e2a2047-7045-4b4c-887d-2d7f5a3cdaf5". I.e. the payload generated has the Unicode mangled.

aws --profile uneet-demo logs filter-log-events --log-group-name "/aws/lambda/alambda_simple" --start-time 1561967459000 --filter-pattern '7206'
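
The `--start-time` value in the command above is a timestamp in milliseconds since the Unix epoch. As a small sketch for anyone re-running the query with a different window, the conversion can be done like this (the helper names `to_epoch_millis` and `from_epoch_millis` are illustrative, not part of any Unee-T tooling):

```python
from datetime import datetime, timezone


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return round(moment.timestamp() * 1000)


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# The value used in the filter-log-events command above.
start = from_epoch_millis(1561967459000)
print(start.isoformat())       # 2019-07-01T07:50:59+00:00
print(to_epoch_millis(start))  # 1561967459000
```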

kaihendry commented 5 years ago

I've just raised an AWS support ticket about this https://console.aws.amazon.com/support/cases?region=ap-southeast-1#/6232014911/en

With the video https://s.natalian.org/2019-07-08/mangles.mp4

kaihendry commented 5 years ago

Sorry, there was a reply on Jul 19th:


To provide some base reference to this issue, as you might already be aware MySQL’s utf8 (the default) only implements UTF-8 encoding partially (65,536 code points in the range from U+0000 to U+FFFF, called the BMP - Basic Multilingual Plane). It supports BMP characters only, as it can store a maximum of three bytes per multi-byte character. UTF-8-encoded symbols that take up four bytes are not supported.
Thus, when an attempt is made to insert strings of the form you attempted (which contain 4 bytes per character), an 'Incorrect string value' warning is thrown (in the case of MySQL 5.6 compatible instances) and the string value gets 'mangled'.

An example of this in my test environment, using just a plain table:
mysql> insert into t1 values(2, "但你的愛是");
Query OK, 1 row affected, 1 warning (0.21 sec)

mysql> show warnings;
+---------+------+----------------------------------------------------------------------------------+
| Level   | Code | Message                                                                          |
+---------+------+----------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xE4\xBD\x86\xE4\xBD\xA0...' for column 'name' at row 1 |
+---------+------+----------------------------------------------------------------------------------+
1 row in set (0.25 sec)

mysql> select * from t1;
+------+-------+
| id   | name  |
+------+-------+
|    1 | dvfdv |
|    2 | ????? |
+------+-------+
2 rows in set (0.23 sec)

mysql> ALTER TABLE t1
    -> DEFAULT CHARACTER SET utf8mb4,
    -> MODIFY name VARCHAR(100)
    -> CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Query OK, 2 rows affected (0.30 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into t1 values(2, "但你的愛是");
Query OK, 1 row affected (0.29 sec)

mysql> select * from t1;
+------+-----------------+
| id   | name            |
+------+-----------------+
|    1 | dvfdv           |
|    2 | ?????           |
|    2 | 但你的愛是      |
+------+-----------------+
3 rows in set (0.22 sec)

Hence, as you can see above, we can get past this issue by ensuring we use utf8mb4 instead. You would need to change the 'name' column in this case to use the utf8mb4 character set and collation. Below is an example query to convert the character set and collation:
mysql> ALTER TABLE table_name DEFAULT CHARACTER SET utf8mb4, MODIFY column_name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
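
As a side check on the byte counts discussed in the reply, here is a minimal Python sketch (not MySQL's own code path; all function names are hypothetical). It shows that the sample characters encode to three bytes each in UTF-8, consistent with the `\xE4\xBD\x86` sequence in the warning, so the four-byte limit of MySQL's utf8 concerns characters outside the BMP such as emoji. It also shows that converting the title to a character set that lacks these characters yields one `?` per character, matching the `?????` seen in the payload. The use of latin-1 there is purely an assumption for illustration; the thread does not confirm which character set was actually in effect.

```python
from typing import List


def utf8_byte_lengths(text: str) -> List[int]:
    """Return the number of UTF-8 bytes needed for each character."""
    return [len(ch.encode("utf-8")) for ch in text]


def fits_three_byte_utf8(text: str) -> bool:
    """True when every code point is in the BMP (U+0000..U+FFFF),
    i.e. needs at most three bytes in UTF-8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def simulate_lossy_conversion(text: str, charset: str = "latin-1") -> str:
    """Replace every character the target charset cannot represent with '?'.
    The default charset is an assumption, chosen only for illustration."""
    return text.encode(charset, errors="replace").decode(charset)


title = "但你的愛是"
print(utf8_byte_lengths(title))           # [3, 3, 3, 3, 3]
print(fits_three_byte_utf8(title))        # True
print(fits_three_byte_utf8("\U0001F600")) # False (an emoji, four bytes in UTF-8)
print(simulate_lossy_conversion(title))   # ?????
```

Whatever the exact cause on the database side, moving the table and column to utf8mb4 as described above covers both the three-byte and the four-byte cases.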

I think it's related to https://github.com/unee-t/frontend/issues/850#issuecomment-514068692 and is pending on https://github.com/bugzilla/bugzilla/pull/79

kaihendry commented 4 years ago

This is working now, tested in demo by UIlicious https://snippet.uilicious.com/embed/test/private/33R2qBvg7HWLS8bKjWxJKM?step=1&autoplay=1