phpList / phplist3

Fully functional Open Source email marketing manager for creating, sending, integrating, and analysing email campaigns and newsletters.
https://www.phplist.org
GNU Affero General Public License v3.0
734 stars 268 forks source link

Use utf8mb4 for the connection etc #1001

Closed bramley closed 3 months ago

bramley commented 8 months ago

Description

This change sets the character set used for the client-server exchanges to utf8mb4 instead of the current utf8, which is an alias for utf8mb3.

Using utf8mb4 will allow a wider range of characters, supplementary characters, to be sent, specifically emoji characters. If once-off database table changes are made then subject lines can include emoji characters.

This change is backward compatible, it has no effect when supplementary characters are not used.

Related Issue

https://github.com/phpList/phplist3/issues/1000

Screenshots (if appropriate):

michield commented 8 months ago

Does this affect the version of Mysql/Mariadb that is a minimum requirement?

bramley commented 8 months ago

It seems to have been introduced in MySQL 5.5 from 2010 https://dev.mysql.com/blog-archive/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/ and specifically in 5.5.3 from this PDF file of release notes

The utf8mb4 character set has been added. This is similar to utf8, but its encoding allows up to four bytes per character to enable support for supplementary characters

michield commented 6 months ago

It needs to change here

as well/ In fact, "initialise" should use the Sql_Create_Table from mysqli.inc but we can fix that some other time

After that, I was able to put mb4 characters in the subject and content and it worked

michield commented 6 months ago

additionally, we may want to add something to the upgrade script to convert all tables to utf8mb4, although we can also do it as a seperate step like we did before https://github.com/michield/phplist3/blob/master/public_html/lists/admin/converttoutf8.php

bramley commented 6 months ago

These suggestions are going beyond the original scope of the change, which was simply to avoid undoing a manual change made to a phplist file when upgrading. The conversion of an existing database to utf8mb4 would have been made manually.

I don't know how "safe" it would be to try to automate the conversion to utf8mb4 and any reasons why that might fail. It seems safer to leave that to each installation to do so that it is entirely under their control.

Using utf8mb4 for new installations seems sensible though but might need more investigation. This mysql documentation page lists some possible consequences

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-conversion.html

michield commented 5 months ago

Ok, let's leave the scope for this small, and we can tackle conversion and other mb4 related work in a different PR.

bramley commented 5 months ago

I have been looking at this recently because a system I support had a problem with a campaign body including a 4-byte UTF-8 character, so I applied the changes manually.

Looking at the conversion to utf8 page, and the same code in the upgrade page, then trying to use that to upgrade to utf8mb4 showed an issue with the SQL used to change each column

            Sql_Query(sprintf('alter table %s change column %s %s %s default character set utf8',

This syntax requires to repeat the full column definition. Any columns that have been defined as not null or with a default value are losing those on this upgrade. Further, the upgrade is applied on the first upgrade of a new installation, even though that is already utf8. So it looks like some rework is needed.

What is simple and probably quite safe is to upgrade only the columns that are used for a campaign where an admin is likely to want to include a 4-byte UTF-8 character. These are the subject, message and textmessage fields on the messages table, and the data field on the message_data table.

bramley commented 5 months ago

I have added another commit that sets the character set for the message subject and content. These seem to be the fields to which people want to add 4-byte UTF-8 characters such as emojis.

I have tested this change by creating a new database, and upgrading an existing one. In both cases I could copy/paste 😀 into the campaign content, save the campaign, then send a test email successfully.

phpListDockerBot commented 3 months ago

This pull request has been mentioned on phpList Discuss. There might be relevant details there:

https://discuss.phplist.org/t/3-6-15-release-candidate-is-available-for-testing/9473/1

phpListDockerBot commented 3 months ago

This pull request has been mentioned on phpList Discuss. There might be relevant details there:

https://discuss.phplist.org/t/phplist-3-6-15-has-been-released/9495/1