singlebrook / utf8-cleaner

MIT License

ArgumentError: string contains null byte #35

Closed: davidrouten closed this issue 6 years ago

davidrouten commented 6 years ago

Hi! We've been using utf8-cleaner for a bit and it's made a big difference in preventing our bug tracking services from being flooded, so thank you for sharing.

Unfortunately, as soon as our older UTF-8 errors stopped rolling in, we started getting a lot of "string contains null byte" errors, which utf8-cleaner doesn't treat as invalid strings. Our app is running Rails 5.2, Ruby 2.5.1, and utf8-cleaner 0.2.5.
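
For context, here is a minimal sketch of how Ruby produces this error in general. The exact call site in our app isn't shown in this report, so the file lookup below is only an assumed stand-in, and `null_byte_error_for` is a hypothetical helper name. The point is that a percent-encoded `%00` decodes to a NUL byte, and Ruby's file APIs refuse any path containing one.

```ruby
require "uri"

# Hypothetical path segment ending in a percent-encoded NUL byte.
encoded = "system.ini%00"
decoded = URI.decode_www_form_component(encoded)

# Hypothetical helper: returns the ArgumentError message raised by a
# file lookup, or nil when the lookup completes without raising.
def null_byte_error_for(path)
  File.exist?(path)
  nil
rescue ArgumentError => e
  e.message
end

puts decoded.include?("\0")        # the decoded string carries a NUL byte
puts null_byte_error_for(decoded)  # message from the rejected lookup
```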

I created a branch that adds a check for the URL-encoded null character %00 to utf8-cleaner and would love to submit a Pull Request if you'd be interested (PR available here). It is rather basic: it adds one more regex check, NULL_CHARS = /(%00)/, right after valid_uri_encoded_utf8 checks for INVALID_PERCENT_ENCODING_REGEX.
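
As a rough, standalone illustration of the idea (this is not the gem's code or the code in the PR), the check could look something like the sketch below. Only `NULL_CHARS` comes from the description above; `contains_encoded_null?` and `strip_encoded_nulls` are hypothetical names, and stripping the match is just one assumed way to react to it.

```ruby
# Regex from the description above: matches a literal percent-encoded NUL.
NULL_CHARS = /(%00)/

# Hypothetical helper: true when the still-encoded URI string contains %00.
def contains_encoded_null?(uri_string)
  uri_string.match?(NULL_CHARS)
end

# Hypothetical helper: one possible reaction, dropping every %00 so the
# remainder of the request can still be routed.
def strip_encoded_nulls(uri_string)
  uri_string.gsub(NULL_CHARS, "")
end

path = "/customers/somecustomer%00"
puts contains_encoded_null?(path)
puts strip_encoded_nulls(path)
```

Note that this only looks at the encoded form `%00`; a raw NUL byte already present in the string would need a separate check.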

Before changes:

curl -I https://localhost:5000/customers/somecustomer%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%afWindows%c0%afsystem%c0%aeini%00
HTTP/1.1 500 Internal Server Error
Content-Type: text/html; charset=UTF-8
~> 500 ArgumentError (string contains null byte):

After changes:

curl -I https://localhost:5000/customers/somecustomer%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%af%c0%ae%c0%ae%c0%afWindows%c0%afsystem%c0%aeini%00
HTTP/1.1 301 Moved Permanently
X-Frame-Options: SAMEORIGIN
~> 301 redirect

Reading the previous, still-open issue, I considered using a rescue_from as Leon suggested. To his other point, though, I believe a fix for null characters is right in line with the main purpose of the gem: we're using utf8-cleaner to clean our incoming requests so we can at least handle and route them properly, even if they aren't well-formed. That being said, I'm of course open to any feedback, suggestions, or constructive criticism.
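
To make the "clean the request before routing" idea concrete, here is a small Rack-style middleware sketch in plain Ruby. It is an assumption-laden illustration, not how utf8-cleaner is implemented: the class name `EncodedNullStripper` and the list of env keys are hypothetical, and it assumes the server hands over `PATH_INFO` and `QUERY_STRING` still percent-encoded.

```ruby
# Hypothetical middleware: removes percent-encoded NUL bytes from selected
# env values before the wrapped app (e.g. the router) sees the request.
class EncodedNullStripper
  ENCODED_NULL = /%00/
  KEYS = %w[PATH_INFO QUERY_STRING].freeze # assumed set of keys to clean

  def initialize(app)
    @app = app
  end

  def call(env)
    KEYS.each do |key|
      value = env[key]
      next unless value.is_a?(String)
      env[key] = value.gsub(ENCODED_NULL, "")
    end
    @app.call(env)
  end
end

# Usage with a stand-in app that echoes the path it received.
echo_app = ->(env) { [200, {}, [env["PATH_INFO"]]] }
status, _headers, body = EncodedNullStripper.new(echo_app).call(
  "PATH_INFO" => "/customers/somecustomer%00",
  "QUERY_STRING" => "q=%00x"
)
puts status
puts body.first
```

Compared with a rescue_from, this runs before routing, so the request never reaches code that would raise on the null byte.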