Open collimarco opened 9 months ago
The fix would be straightforward; we just need to change this line:
Even the test looks strange (it suggests a replacement, but it actually removes null bytes):
If this choice is by design (is it good? is it bad?), it should be clarified in the documentation in any case, because this is definitely not what you expect from reading the docs.
Hi @collimarco, this is achievable with a custom strategy without too much code:
# config/application.rb
replace_null_byte = lambda do |input, sanitize_null_bytes: true|
  # Re-encode from binary so invalid or undefined bytes become U+FFFD.
  input.
    force_encoding(Encoding::ASCII_8BIT).
    encode!(Encoding::UTF_8,
            invalid: :replace,
            undef:   :replace)
  if sanitize_null_bytes && input =~ Rack::UTF8Sanitizer::NULL_BYTE_REGEX
    # Replace each null byte with U+FFFD (use "" here to remove it instead).
    input = input.gsub(Rack::UTF8Sanitizer::NULL_BYTE_REGEX, "\uFFFD")
  end
  input
end

config.middleware.insert 0, Rack::UTF8Sanitizer, sanitize_null_bytes: true, strategy: replace_null_byte
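A strategy of this shape can be sanity-checked on its own before it is wired into the middleware, since it is just a lambda taking the input string and a `sanitize_null_bytes` keyword. The sketch below is only an illustration: `NULL_BYTE` is a local stand-in assumed to match the same thing as `Rack::UTF8Sanitizer::NULL_BYTE_REGEX`, and `replace_with_marker` is a hypothetical name, so the snippet runs without the gem installed.

```ruby
# Local stand-in for Rack::UTF8Sanitizer::NULL_BYTE_REGEX (assumption:
# the gem's constant matches a single "\x00").
NULL_BYTE = /\x00/

# Hypothetical strategy with the same call shape as the lambda above.
replace_with_marker = lambda do |input, sanitize_null_bytes: false|
  # Work on a copy so frozen string literals can be passed in.
  output = input.dup.force_encoding(Encoding::ASCII_8BIT)
  # Bytes that cannot be converted to UTF-8 become U+FFFD.
  output.encode!(Encoding::UTF_8, invalid: :replace, undef: :replace)
  # Null bytes are swapped for U+FFFD rather than dropped.
  output = output.gsub(NULL_BYTE, "\uFFFD") if sanitize_null_bytes
  output
end

result = replace_with_marker.call("Hello \x00 world", sanitize_null_bytes: true)
untouched = replace_with_marker.call("Hello \x00 world", sanitize_null_bytes: false)

puts result == "Hello \uFFFD world"   # => true
puts untouched == "Hello \x00 world"  # => true
```

Keeping the null-byte handling behind the keyword means the same lambda still behaves sensibly if the middleware is configured without `sanitize_null_bytes: true`.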
Thanks for this useful gem. Currently, if you use the default :replace strategy, invalid characters are replaced with �, but the null byte is replaced with nothing. This behavior seems unexpected and inconsistent.
Expected: "Hello \x00 world" => "Hello � world"
Actual: "Hello \x00 world" => "Hello  world"
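The difference between the two cases can be reproduced with plain Ruby strings, without loading the gem. In the sketch below, `String#scrub` is used only as a stand-in for "replace invalid bytes with �"; it is not a claim about how the gem is implemented. The point it illustrates is that `"\x00"` is itself valid UTF-8, so an invalid-byte replacement never touches it, and whatever happens to the null byte has to be a separate step.

```ruby
invalid   = "Hello \xFF world"   # \xFF can never appear in valid UTF-8
with_null = "Hello \x00 world"   # \x00 is a valid UTF-8 character

puts invalid.valid_encoding?     # => false
puts with_null.valid_encoding?   # => true

# Replacing invalid bytes affects only the first string.
scrubbed_invalid = invalid.scrub("\uFFFD")
scrubbed_null    = with_null.scrub("\uFFFD")

puts scrubbed_invalid == "Hello \uFFFD world"  # => true
puts scrubbed_null == with_null                # => true (null byte still there)

# The two possible treatments of the null byte discussed in this issue:
removed  = with_null.delete("\x00")            # "Actual": byte dropped
replaced = with_null.gsub("\x00", "\uFFFD")    # "Expected": byte replaced

puts removed == "Hello  world"                 # => true (two spaces remain)
puts replaced == "Hello \uFFFD world"          # => true
```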