jogaco closed this issue 6 years ago.
I will accept a patch that implements this.
Is this what you want, behavior-wise?
input = "hello \xF0\xA9\xB6\x98 world"
# => "hello 𩶘 world"
input.each_char.reject { |char| char.bytesize == 4 }.join
# => "hello  world"
Yes. It would be nice to also have the option of replacing such characters with "?" or something similar.
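As a rough sketch of the "replace instead of drop" variant, assuming a plain helper in application code (replace_astral and ASTRAL are hypothetical names, not part of this library), a regex over the supplementary planes (U+10000 to U+10FFFF, exactly the code points that need 4 bytes in UTF-8) does the job in one gsub call:

```ruby
# Hypothetical helper, not part of utf8-cleaner: matches every code point
# outside the Basic Multilingual Plane, i.e. every 4-byte UTF-8 character.
ASTRAL = /[\u{10000}-\u{10FFFF}]/

# Replace each 4-byte character with +replacement+ ("?" by default).
# Pass "" to drop them, which matches the reject-based version above.
# Assumes +input+ is already valid UTF-8; gsub raises on invalid bytes.
def replace_astral(input, replacement = "?")
  input.gsub(ASTRAL, replacement)
end

input = "hello \xF0\xA9\xB6\x98 world"
replace_astral(input)     # => "hello ? world"
replace_astral(input, "") # => "hello  world"
```

Running it after invalid byte sequences have been cleaned keeps gsub from raising ArgumentError.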
@whitequark, would you consider support for a custom scrubber a better feature to add? It seems to me that optionally scrubbing valid UTF-8 shouldn't be a core feature of the 'utf8 sanitizer'.
@bf4 Good idea.
I've looked into it before, but I was concerned about the order of operations, the possible performance cost, and what the interface should look like.
I don't know and I don't have time to design the interface, but I'll review it if someone implements a PoC.
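One possible shape for such a proof of concept, offered only as a sketch: SanitizerPipeline and its constructor argument are hypothetical and are not utf8-cleaner's actual interface. It settles the order-of-operations question by always repairing invalid UTF-8 first (here with Ruby's built-in String#scrub), so the optional user-supplied scrubber, any object responding to #call, only ever sees a valid string:

```ruby
# Hypothetical PoC, not the library's API: built-in repair first,
# then an optional custom scrubber on the already-valid string.
class SanitizerPipeline
  def initialize(custom_scrubber = nil)
    @custom_scrubber = custom_scrubber
  end

  def sanitize(raw)
    # Step 1: interpret the bytes as UTF-8 and delete invalid sequences.
    valid = raw.dup.force_encoding(Encoding::UTF_8).scrub("")
    # Step 2: hand the valid string to the custom scrubber, if any.
    @custom_scrubber ? @custom_scrubber.call(valid) : valid
  end
end

# Example custom scrubber: drop characters that need 4 bytes in UTF-8.
drop_astral = ->(str) { str.each_char.reject { |c| c.bytesize == 4 }.join }

pipeline = SanitizerPipeline.new(drop_astral)
pipeline.sanitize("ok\xFF \xF0\x9F\x98\x8A!") # => "ok !"
```

With no scrubber configured, the extra cost is a single nil check, so apps that don't opt in pay nothing beyond the existing repair step.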
Funny, I just had this error myself:
Incorrect string value: '\xF0\x9F\x98\x8A ' for column
This happened because we had:
utf8mb4_message = "Just because. Thank you 😊 "
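The bytes in that error, \xF0\x9F\x98\x8A, are the UTF-8 encoding of U+1F60A (😊), a 4-byte character. MySQL's legacy utf8 character set (utf8mb3) stores at most 3 bytes per character, which is why the insert failed. A quick check in Ruby, using the message above:

```ruby
message = "Just because. Thank you \u{1F60A} "

# Find the first character that needs 4 bytes in UTF-8.
emoji = message.each_char.find { |c| c.bytesize == 4 }

emoji.ord.to_s(16)                          # => "1f60a"
emoji.bytes.map { |b| format("%02X", b) }   # => ["F0", "9F", "98", "8A"]
message.each_char.map(&:bytesize).max       # => 4, one more than utf8mb3 allows
```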
I resolved this by switching the adapter to the utf8mb4 encoding and altering the table:
ALTER TABLE thing_with_message MODIFY COLUMN message VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
I wonder whether it would make sense to add a 'max byte size' configuration option that is unset by default. In this case I would set it to 3 or 4, depending on what I want to allow.
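A minimal sketch of how such an option could behave, assuming it is applied per character after the string is known to be valid UTF-8 (limit_char_bytes, max_bytes, and replacement are hypothetical names, not existing settings): nil leaves the input untouched, 3 keeps only what fits in MySQL's utf8 (utf8mb3), and 4 allows everything.

```ruby
# Hypothetical 'max byte size' option. nil (the default) disables it;
# otherwise every character whose UTF-8 encoding is longer than
# +max_bytes+ is replaced with +replacement+ (dropped by default).
def limit_char_bytes(str, max_bytes: nil, replacement: "")
  return str if max_bytes.nil?

  str.each_char.map { |c| c.bytesize > max_bytes ? replacement : c }.join
end

# One character each of 1, 2, 3 and 4 bytes: "a", "é", "€", "😊".
sample = "a\u00e9\u20ac\u{1F60A}"

limit_char_bytes(sample)                                   # unchanged
limit_char_bytes(sample, max_bytes: 3)                     # => "aé€"
limit_char_bytes(sample, max_bytes: 3, replacement: "?")   # => "aé€?"
limit_char_bytes(sample, max_bytes: 1)                     # => "a"
```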
Never use utf8 in MySQL — always use utf8mb4 instead. Updating your databases and code might take some time, but it’s definitely worth the effort. Why would you arbitrarily limit the set of symbols that can be used in your database? Why would you lose data every time a user enters an astral symbol as part of a comment or message or whatever it is you store in your database? There’s no reason not to strive for full Unicode support everywhere. Do the right thing, and use utf8mb4. 🍻
Not all apps need support for emojis, so this option would certainly be helpful.
Since #41, this can be done with a custom strategy in your application, for example when the storage backend is a MySQL database that doesn't use utf8mb4.