hedgedoc / container

HedgeDoc container image resources
https://docs.hedgedoc.org/setup/docker/

Use utf8mb4 character set in MariaDB config #287

Closed: davidmehren closed this 2 years ago

davidmehren commented 2 years ago

Since MariaDB 10.6, the 'utf8' character set is an alias for 'utf8mb3'. This seems to trip something up, and HedgeDoc can no longer connect to the database.

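This PR changes the character set to 'utf8mb4', which is the "real" UTF-8 charset for MariaDB (utf8mb3 stores at most 3 bytes per character and cannot represent characters such as emoji). It should be backwards compatible with older MariaDB versions, since the charset has existed for a long time.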
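
For illustration, the shell sketch below writes a utf8.cnf along the lines of this change. The option names (default-character-set, character-set-server, collation-server) are standard MariaDB options, but the exact contents of the file in this PR, and the choice of utf8mb4_unicode_ci as the collation, are assumptions here; compare with the utf8.cnf in the repository before using it.

```shell
#!/bin/sh
# Hedged sketch: write a MariaDB option file that uses utf8mb4 instead of
# utf8 (which MariaDB 10.6 treats as utf8mb3). The exact option set and the
# collation are assumptions, not necessarily the contents of this PR.
set -eu

# Target path; defaults to utf8.cnf in the current directory.
cnf_path="${1:-utf8.cnf}"

cat > "$cnf_path" <<'EOF'
[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
EOF

echo "wrote $cnf_path"
```

The file is then mounted into the database container in place of the old utf8.cnf; existing tables keep their old charset until they are converted.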

This will probably break existing databases, as the existing data in the tables needs to be converted manually (see https://mathiasbynens.be/notes/mysql-utf8mb4).
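
The manual conversion described in the linked article boils down to ALTER DATABASE and ALTER TABLE ... CONVERT TO CHARACTER SET statements. The hypothetical helper below only prints those statements for a given database and a list of its tables; the function name and the collation are assumptions. The article also discusses index-length limits that can make some conversions fail, so review the output and take a backup first.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the utf8mb4 conversion
# statements from the linked article for one database and its tables.
set -eu

utf8mb4_conversion_sql() {
  db="$1"
  shift
  # Change the default charset/collation used for tables created later.
  printf 'ALTER DATABASE `%s` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;\n' "$db"
  # Convert each listed table, including the data in its text columns.
  for table in "$@"; do
    printf 'ALTER TABLE `%s`.`%s` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$db" "$table"
  done
}
```

For example, utf8mb4_conversion_sql hedgedoc Notes Users | mysql -uhedgedoc -psecret hedgedoc would apply the statements; Notes and Users are placeholder table names, so take the actual list from SHOW TABLES in your installation.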

References:

  - https://mariadb.com/kb/en/upgrading-from-mariadb-105-to-mariadb-106/#character-sets
  - https://mariadb.com/kb/en/mariadb-1061-release-notes/#character-sets
  - https://mathiasbynens.be/notes/mysql-utf8mb4

davidmehren commented 2 years ago

After we merge this, everyone who still (automatically or manually) "updates" their installation by pulling directly from this repo will magically get a new charset if they use MariaDB.

I frankly have no idea how to avoid this, besides not merging the PR. Instructing people to update this way was a bad idea precisely because of this problem. We stopped documenting it some time ago, but I don't know how many people still update by pulling. We also have no changelog for this repo that we could use to inform users about the required manual intervention.

agross commented 2 years ago

I was able to migrate my 3 instances flawlessly using this script:

https://github.com/fleio/utf8mb4-convert

The process went as follows:

  1. Stop the HedgeDoc container
  2. With the mariadb:10.5 container still running, open a shell in it: docker exec -it hedgedoc_db_1 sh
  3. Install curl inside the container: apt update && apt install -y curl
  4. Download the script from the link above
  5. Run the script against the hedgedoc database, e.g. ./convert.sh hedgedoc -uhedgedoc -psecret
  6. Exit the container
  7. Update docker-compose.yaml to use mariadb:10.6 or above
  8. Update the utf8.cnf file that is mounted into the database container with the new settings from this PR
  9. Restart the project with docker-compose up -d, so that the database container is recreated with the new image
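
Steps 1 to 6 can be sketched as a small script. This is a hedged sketch, not a tested migration tool: it assumes convert.sh from fleio/utf8mb4-convert has already been downloaded next to the script and copies it in with docker cp instead of installing curl inside the container; the service name app is a placeholder, while hedgedoc_db_1 and the credentials are the example values from the steps. By default it only prints the commands (DRY_RUN=1). Steps 7 to 9 (editing docker-compose.yaml and utf8.cnf, then bringing the project back up) remain manual.

```shell
#!/bin/sh
# Hedged sketch of steps 1-6 of the migration described above.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
set -eu

DRY_RUN="${DRY_RUN:-1}"
APP_SERVICE="${APP_SERVICE:-app}"              # placeholder: name of the HedgeDoc service
DB_CONTAINER="${DB_CONTAINER:-hedgedoc_db_1}"  # example container name from step 2
DB_NAME="${DB_NAME:-hedgedoc}"
DB_USER="${DB_USER:-hedgedoc}"                 # example credentials from step 5
DB_PASS="${DB_PASS:-secret}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

migrate() {
  # Step 1: stop HedgeDoc while the mariadb:10.5 container keeps running.
  run docker-compose stop "$APP_SERVICE"
  # Steps 2-4: copy the already downloaded convert.sh into the DB container.
  run docker cp ./convert.sh "$DB_CONTAINER":/tmp/convert.sh
  run docker exec "$DB_CONTAINER" chmod +x /tmp/convert.sh
  # Step 5: convert the HedgeDoc database to utf8mb4.
  run docker exec -it "$DB_CONTAINER" /tmp/convert.sh "$DB_NAME" "-u$DB_USER" "-p$DB_PASS"
}

migrate
```

Reviewing the dry-run output before setting DRY_RUN=0 makes it easy to check the container and service names against your own docker-compose.yaml.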

davidmehren commented 2 years ago

Thank you, @agross, for the description. The issue is not that converting the database charset is impossible, but that we have no good way of communicating that it is needed. Users unaware of the change may update their repo and then get encoding errors (I'm not sure whether that means "text renders incorrectly", "downtime" or "database lights on fire").

I think the best option would be to merge this when we release the next 1.x version and yell loudly in the changelog.

agross commented 2 years ago

Yes, I understand your concerns. I had a similar "downtime experience" when the MariaDB container was updated to latest, because HedgeDoc could no longer connect with the old utf8.cnf settings.