Closed mneudert closed 1 month ago
@mneudert I'm wondering if we should also introduce a way to ensure that all tables are created with the same collation (like we do for TiDB). At the moment a table's collation is also based on the default collation of the database; not sure if that could cause problems as well at some point, if the default collation changes 🤔
Good point @sgiehl.
I just checked how MariaDB behaves, and found that the current proposed fix is not enough :(
As we create tables using DEFAULT CHARSET (without a COLLATE value), until now I would have expected the database default to be used. And that is a value that should not be changed by any database upgrade. But, as I found out, the creation in our case is still using the "unexpected" utf8mb4_uca1400_ai_ci collation for any new table.
So we indeed need to allow setting the collation in all create statements.
Should the config setting be renamed to the more generic collation, and be added to getTableCreateOptions?
Guess that makes most sense. And during installation we should detect the current default collation in order to set it to the config. Not sure if we should also do that during the update, to ensure every installation has a collation set in the end.
The changes related to table creation look good. Are there still any changes pending related to the installation and automatic config setup?
Would appreciate a patch release for this soon! My archive tasks are consistently failing for now.
Description:
The MariaDB 11.5 release introduced a change in the default Unicode collation.
Due to the way Matomo connects to the database (i.e. sending SET NAMES without a COLLATE parameter), this can lead to a change in the collation used in some queries.
If the database was created with a previous MariaDB version, like 11.3, and the charset setting was configured as utf8mb4, the effective collation after SET NAMES should be utf8mb4_general_ci. This collation will then also be used to create the archive tables like archive_blob_2024_08.
With MariaDB 11.5, the collation will change to utf8mb4_uca1400_ai_ci, although this may depend on the individual server configuration. And this collation will then be used to create a new archive table.
The problems arise when queries are using variables, for example during archiving:
If the table archive_blob_2024_08 was created using utf8mb4_general_ci, and the connection collation is set to utf8mb4_uca1400_ai_ci, the variable assignments in the WHERE clause will create a forbidden collation mix. And this breaks archiving.
While one way to work around that issue is to reconfigure the character_set_collations server configuration from utf8mb4=utf8mb4_uca1400_ai_ci to utf8mb4=utf8mb4_general_ci, this may not be possible in many environments, like shared hosting.
This PR introduces a new, optional, database setting connection_collation. If it is set alongside a database charset, this value will be passed to a SET NAMES ... COLLATE ... statement, setting the connection collation back to the value required for an uninterrupted service.
Config update
During installation and the next update, the config will be checked to see whether a collation has been set.
If that is not the case, an automatic update is attempted, based on a comparison of the collation of the user table and the one returned from SELECT @@collation_connection. If both are the same, the configuration will be updated to this value.
If not, the update will check the most recent archive table (by name). If the collation of that table matches the user table, it will be chosen for the config update.
Otherwise no update will take place, so we don't accidentally break any setups. For this case the diagnostics have been updated to show the used connection collation and suggest updating the configuration manually to a suitable value.
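
The selection rules above, and the way the chosen value would end up in the connection statement, can be sketched roughly as follows. This is an illustrative Python sketch only, not the PHP code from this PR: the function names `choose_connection_collation` and `build_set_names` are hypothetical, and the three collations are assumed to have already been read from the database by the caller.

```python
import re
from typing import Optional

# Conservative pattern for charset/collation names (an assumption for this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def choose_connection_collation(
    user_table_collation: str,
    connection_collation: str,
    latest_archive_collation: Optional[str],
) -> Optional[str]:
    """Return the collation to store in the config, or None to leave it unset."""
    # Case 1: the user table and the current connection already agree.
    if user_table_collation == connection_collation:
        return connection_collation
    # Case 2: the most recent archive table (by name) matches the user table.
    if latest_archive_collation is not None and latest_archive_collation == user_table_collation:
        return user_table_collation
    # Case 3: ambiguous setup, so no automatic update takes place.
    return None


def build_set_names(charset: str, collation: Optional[str]) -> str:
    """Build the SET NAMES statement, adding COLLATE only if a collation is configured."""
    for name in (charset, collation):
        if name is not None and not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected charset or collation name: {name!r}")
    if collation:
        return f"SET NAMES '{charset}' COLLATE '{collation}'"
    return f"SET NAMES '{charset}'"
```

For a database created under MariaDB 11.3 and then upgraded to 11.5, the first function would be called with utf8mb4_general_ci for the user table and utf8mb4_uca1400_ai_ci for the connection, so the outcome depends on the most recent archive table. Returning None in the ambiguous case mirrors the "no update" behaviour described above, where the diagnostics ask for a manual choice instead.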
Note: It can happen that, after an upgrade, a mix of database tables has been created. For example 2024_08 and 2024_09 could have been created with different collations. In this case one of the tables has to be manually altered (or deleted and recreated by invalidation and rearchiving), so all tables have the same collation.
Fixes #22536 Refs DEV-18459