whitequark / rack-utf8_sanitizer

Rack::UTF8Sanitizer is a Rack middleware which cleans up invalid UTF8 characters in request URI and headers.
MIT License
314 stars 53 forks

Use Regexp#match? over String#=~ when testing for null bytes #85

Closed geoffharcourt closed 8 months ago

geoffharcourt commented 8 months ago

https://github.com/fastruby/fast-ruby#regexp-vs-regexpmatch-vs-regexpmatch-vs-stringmatch-vs-string-vs-stringmatch-code-

This change updates the null byte check in the included exception strategy to use Regexp#match? instead of String#=~. Unlike String#=~, Regexp#match? returns only a boolean and does not populate $~ or allocate a MatchData object. Per the fast-ruby benchmarks linked above, it appears this will be about 2.5x faster, which might be helpful given that the check runs on every request.
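
For illustration, here is a minimal sketch of the two approaches side by side. The constant and method names are hypothetical and are not taken from the gem's source; the sketch assumes Ruby 2.4 or later, where Regexp#match? is available.

```ruby
# Hypothetical names for illustration only; not the gem's actual code.
NULL_BYTE_REGEX = /\x00/

# Before: String#=~ returns the index of the first match (or nil)
# and sets $~ as a side effect.
def contains_null_byte_old?(input)
  !(input =~ NULL_BYTE_REGEX).nil?
end

# After: Regexp#match? returns true or false and leaves $~ untouched.
def contains_null_byte?(input)
  NULL_BYTE_REGEX.match?(input)
end
```

Both helpers should give the same answer for any string; only the side effects and allocations differ, which is where the speedup is expected to come from.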