phpbb / customisation-db

A Modification and Style database for phpBB.com and the International Support Teams, codename Titania
https://phpbb.com
57 stars 54 forks source link

SQL error: Incorrect string value: '\xF0\x9F\x98\x84 \xF0...' for column 'attention_description' at row 1 #371

Open battye opened 1 year ago

battye commented 1 year ago

Getting the following error when trying to update a contribution via "Manage Contribution" (https://www.phpbb.com/customise/db/style/twemoji/manage) :

SQL ERROR Incorrect string value: '\xF0\x9F\x98\x84 \xF0...' for column 'attention_description' at row 1 [1366] An SQL error occurred while fetching this page. Please contact the Board Administrator if this problem persists.

Wondering if it's an encoding error due to either an arrow → in the title or copyright symbol © and emojis in the body text; however, the contribution description correctly saved the first time, and scrubbing all non-ASCII characters still doesn't prevent the error on edit. There are quite a lot of results for Incorrect string value in the other project trackers, but none so far in CUSTDB.

Reporter: luo-ning Created: 30/Mar/22 11:36 PM

Comments:

luo-ning added a comment - 03/Apr/22 9:45 PM - edited

@DavidIQ What exactly are you trying to add? The description already has quite a few emojis in it. I want to update the description based on updates to the package (added the last 2 remaining emoticon equivalents). The updated description doesn't even contain any emojis (removed them all and replaced with an image), which is the weird thing.

luo-ning added a comment - 04/Apr/22 5:16 PM - edited BTW, here's an example I just saw of why solving this at the database encoding level is much more robust than solving it at the PHP code level: I just received an email from the phpBB forum where I'm admin titled "Reply in “🖌️ 📐Markdown and other formatting - Cheat Sheet”", which should be "Reply in “🖌️ 📐Markdown and other formatting - Cheat Sheet”. Email subject lines have their own means of encoding non-ASCII text, which is completely different from HTML entities, so HTML entities should never be appearing in that context. Edit: Upon originally posting this reply (without manually double-encoding the emojis/entities), I got a Jira error... is even the Jira back-end using the same broken encoding? ⭕

3Di [X] (Inactive) added a comment - 05/Apr/22 4:44 AM This is a problem just for MySql and MariaDB, it doesn't affects any other DB layer. The problem lies where the text formatter is not in use (s9e) only.

luo-ning added a comment - 05/Apr/22 12:35 PM Found this Mathias Bynens post from 2012 regarding the MySQL/MariaDB bug and how to migrate to utf8mb4: https://mathiasbynens.be/notes/mysql-utf8mb4 It mentions that utf8mb3/fake utf8 can even lead to security vulnerabilities, though I don't think that's probably the case for this Titania bug, given that it seems like it simply chokes rather than storing truncated data. He also mentions (2nd comment) why HTML entity encoding is generally not a satisfactory solution to this problem Regarding the email subject line bug, I think I have a simple fix for that, along with any related bugs that may crop up in future — I'll create another ticket and have a PR up soon.

luo-ning added a comment - 13/Apr/22 5:12 PM Encountered this same problem in yet another place — subject line of private messages. Seems to be a problem with the mysqli adapter. But I'll create a new ticket for that.