matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.83k stars 2.64k forks source link

convert-to-utf8mb4 side effect: text column data types will be modfied #19115

Closed MichaelRoosz closed 8 months ago

MichaelRoosz commented 2 years ago

After upgrading to utf8mb4 via ./console core:convert-to-utf8mb4 as documented here: https://matomo.org/faq/how-to-update/how-to-convert-the-database-to-utf8mb4-charset/

All columns of all tables of type "text" (text, mediumtext, etc) have automatically been increased in size by the sql server (MariaDb 10.7).

For example, before:

CREATE TABLE matomo_session (
  id varchar(191) NOT NULL,
  modified int(11) DEFAULT NULL,
  lifetime int(11) DEFAULT NULL,
  data mediumtext DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

after:

CREATE TABLE matomo_session (
  id varchar(191) NOT NULL,
  modified int(11) DEFAULT NULL,
  lifetime int(11) DEFAULT NULL,
  data longtext DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The logic behind this is documented here: https://mariadb.com/kb/en/setting-character-sets-and-collations/#table-level

For example, an ascii TEXT column requires a single byte per character, so the column can hold up to 65,535 characters. If the column is converted to utf8, 3 bytes can be required for each character, so the column will be converted to MEDIUMTEXT to be able to hold the same number of characters.

As this might be unexpected, I think it would be great to mention this behavior here: https://matomo.org/faq/how-to-update/how-to-convert-the-database-to-utf8mb4-charset/

peterhashair commented 2 years ago

@MichaelRoosz thanks for reporting this, will update the FAQ.

danielmcclure commented 8 months ago

Hi @MichaelRoosz, Thanks again for reporting this. We've gone back and updated the FAQ you suggested with the following information:

Column size adjustment during utf8mb4 conversion One final note is that when upgrading your database to support utf8mb4, be aware that columns storing text data (such as TEXT, MEDIUMTEXT, etc.) will automatically increase in size to ensure the upgraded column can store as many characters as the source column. For example a MEDIUMTEXT column will become a LONGTEXT column. You will need to account for this if you’re manually updating your database schema or if you have specific data storage limits to ensure a smooth upgrade.

Can you please let me know if this adequately addresses your concerns?

MichaelRoosz commented 8 months ago

Hello @danielmcclure looks good to me, thank you 👍