jacobwb / hashover-next

This branch will be HashOver 2.0
GNU Affero General Public License v3.0

Add utf8mb4 charset hint to database documentation #315

Open da2x opened 2 years ago

da2x commented 2 years ago

In MySQL and MariaDB, utf8 is an alias for utf8mb3, which stores at most 3 bytes per character. Some emoji need 4 bytes in UTF-8, so the documentation should recommend utf8mb4.
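To illustrate the byte counts involved, here is a minimal Python sketch. The helper names `utf8_len` and `fits_utf8mb3` are hypothetical and not part of HashOver; the check assumes that utf8mb3 covers only code points up to U+FFFF (the Basic Multilingual Plane).

```python
# Sketch: which characters fit in MySQL's 3-byte "utf8" (utf8mb3)?
# Helper names are illustrative only, not HashOver code.

def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

def fits_utf8mb3(text: str) -> bool:
    """True if every code point is at most U+FFFF, i.e. needs at most 3 bytes."""
    return all(ord(ch) <= 0xFFFF for ch in text)

samples = ["a", "\u00e9", "\u20ac", "\u263a", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), fits utf8mb3: {fits_utf8mb3(ch)}")
```

Only the last sample (U+1F600, a grinning face emoji) needs 4 bytes and therefore would not fit in a utf8mb3 column.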

jacobwb commented 2 years ago

Is there any reason not to also use utf8mb4 as the default in secrets.php? I would like to support all emoji by default, unless there's a good reason not to.

da2x commented 2 years ago

SQLite, PostgreSQL, and others handle the full 1–4 byte range of UTF-8 as per the Unicode standard. MySQL wanted to save RAM back in the day and settled on utf8 meaning at most 3 bytes per character instead, which is why you need to specify utf8mb4 to get full Unicode support. MariaDB inherited this legacy from MySQL. The other database defaults in the secrets file are for SQLite.
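As a quick check of the SQLite side of this claim, the following Python sketch stores a 4-byte emoji in an in-memory SQLite database and reads it back. The `comments` table and `body` column are made up for the example and are not HashOver's actual schema.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration, not HashOver's schema.
conn.execute("CREATE TABLE comments (body TEXT)")

original = "Nice post \U0001F600"  # ends with a 4-byte emoji
conn.execute("INSERT INTO comments (body) VALUES (?)", (original,))
(stored,) = conn.execute("SELECT body FROM comments").fetchone()
conn.close()

# The text should come back unchanged, with no charset option needed.
round_trip_ok = stored == original
print(round_trip_ok)
```

With MySQL or MariaDB, the equivalent round trip only works reliably when both the table and the connection use utf8mb4.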

So … yeah. Do you want the default to be the MySQL legacy workaround, or to match the guys who’ve followed the Unicode standard without introducing issues for their users? The ambiguity is why I put it in the documentation. It’s a common issue, and you might end up with broken 4-byte emoji. But that’s kind of what you get when choosing MySQL/MariaDB.