whitequark / rack-utf8_sanitizer

Rack::UTF8Sanitizer is a Rack middleware which cleans up invalid UTF8 characters in request URI and headers.
MIT License

Feature request: capability of stripping-out 4-byte utf8 characters #24

Closed jogaco closed 6 years ago

jogaco commented 9 years ago

This would be useful for a MySQL storage backend that does not use utf8mb4.

whitequark commented 9 years ago

I will accept a patch that implements this.

bf4 commented 9 years ago

Is this what you want, behavior-wise?

input = "hello \xF0\xA9\xB6\x98 world"
# => "hello 𩶘 world"
input.each_char.reject { |char| char.bytesize == 4 }.join
# => "hello  world"

jogaco commented 9 years ago

Yes. Would be nice to have such chars optionally replaced by ? or similar.
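A minimal sketch of the replacement variant, assuming the input is already valid UTF-8 (for example, because the sanitizer has already run). The helper name `replace_four_byte_chars` is hypothetical and not part of rack-utf8_sanitizer. Characters that UTF-8 encodes in 4 bytes are exactly the code points U+10000 through U+10FFFF, so a character-class range can match them:

```ruby
# Hypothetical helper, not part of rack-utf8_sanitizer.
# Replaces every character above U+FFFF (the ones UTF-8 encodes
# in 4 bytes) with a placeholder string.
# Assumes `input` is valid UTF-8; gsub raises on invalid byte sequences.
def replace_four_byte_chars(input, replacement = "?")
  input.gsub(/[\u{10000}-\u{10FFFF}]/, replacement)
end

input = "hello \xF0\xA9\xB6\x98 world"
puts replace_four_byte_chars(input)      # prints "hello ? world"
puts replace_four_byte_chars(input, "")  # prints "hello  world"
```

Passing an empty replacement gives the same result as the `reject` version above.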

bf4 commented 9 years ago

@whitequark do you think allowing a custom scrubber would be a better feature to add? It seems to me that optionally scrubbing valid UTF-8 shouldn't be a core feature of a 'utf8 sanitizer'.

whitequark commented 9 years ago

@bf4 Good idea.

bf4 commented 9 years ago

I've looked into it before, but I was concerned about the order of operations, the possible performance cost, and how the interface should look.

whitequark commented 9 years ago

I don't know and I don't have time to design the interface, but I'll review it if someone implements a PoC.

bf4 commented 7 years ago

Funny, I just had this error myself:

Incorrect string value: '\xF0\x9F\x98\x8A ' for column

Since we had

  utf8mb4_message = "Just because. Thank you 😊 "

I chose to resolve this by changing the adapter to use the utf8mb4 encoding and altering the table:

ALTER TABLE thing_with_message MODIFY COLUMN message VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL

I wonder if it might make sense to add a 'max byte size' configuration that defaults to unset. In this case, I would set it to '3' or '4', depending on what I want to allow.
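One way the 'max byte size' idea could look, as a rough sketch only. The method name `limit_char_bytesize` and its `max_bytes:` and `replacement:` keywords are assumptions made up for illustration; nothing like this exists in the gem as far as this thread shows.

```ruby
# Hypothetical helper illustrating a "max byte size" option.
# The name and keywords are assumptions, not an existing API.
def limit_char_bytesize(input, max_bytes: nil, replacement: "")
  # Default: option not set, so the string is returned untouched.
  return input if max_bytes.nil?

  # Swap out every character whose UTF-8 encoding exceeds the limit.
  input.each_char.map { |char| char.bytesize > max_bytes ? replacement : char }.join
end

message = "Just because. Thank you 😊 "
limit_char_bytesize(message)                # unchanged, no limit set
limit_char_bytesize(message, max_bytes: 4)  # unchanged, emoji is 4 bytes
limit_char_bytesize(message, max_bytes: 3)  # => "Just because. Thank you  "
```

With `max_bytes: 3` the result fits a MySQL `utf8` (3-byte) column; with `max_bytes: 4` or no limit, the emoji is kept for a `utf8mb4` column.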

jogaco commented 7 years ago

Not all apps need support for emojis, so this option would certainly be helpful.

whitequark commented 6 years ago

Since #41, this can be done with a custom strategy in your application.
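A sketch of what such a strategy could look like, under stated assumptions: that a strategy is any object responding to `call` which receives the offending string and returns the cleaned one, and that it is passed through a `strategy:` option. Neither detail is spelled out in this thread, so check both against the gem's README, including when the middleware actually invokes the strategy. The lambda name is hypothetical.

```ruby
# Hypothetical strategy: a callable taking a string and returning a cleaned copy.
# The keyword splat absorbs any options a caller might pass (an assumption).
strip_four_byte_chars = lambda do |input, **_options|
  input
    .dup
    .force_encoding(Encoding::UTF_8)
    .scrub("")                           # drop invalid byte sequences first
    .gsub(/[\u{10000}-\u{10FFFF}]/, "")  # then drop valid 4-byte characters
end

# Assumed wiring, to be verified against the README:
#   use Rack::UTF8Sanitizer, strategy: strip_four_byte_chars

puts strip_four_byte_chars.call("hi \xF0\x9F\x98\x8A there \xFF!")
```

Scrubbing before the `gsub` matters because `gsub` raises an `ArgumentError` on strings that contain invalid byte sequences.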