singlebrook / utf8-cleaner

MIT License
277 stars, 44 forks

Remove invalid UTF-8 characters #20

Closed: pkang closed this 9 years ago

pkang commented 9 years ago

There are cases where incoming environment values contain invalid UTF-8 bytes that have not been CGI-escaped. We have seen this on our Rails server with requests from Internet Explorer 11 on Windows whose query strings contain raw smart-quote bytes ("\x93\x94"). These invariably raised exceptions further down the Rack stack because the strings were not valid UTF-8.
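To make the failure mode concrete, here is a minimal sketch of what such a value looks like once Ruby treats it as UTF-8. The query string and variable names are made up for illustration; this is not output from the affected application.

```ruby
# Bytes 0x93 and 0x94 are curly double quotes in Windows-1252, but on their
# own they are not valid UTF-8 sequences.
query = "q=\x93term\x94".b.force_encoding(Encoding::UTF_8)

# The string is tagged as UTF-8 but its bytes are malformed.
query_valid = query.valid_encoding?

# Transcoding is one operation that refuses malformed input.
error = begin
  query.encode(Encoding::UTF_16)
  nil
rescue Encoding::InvalidByteSequenceError => e
  e
end
```

Here `query_valid` is `false` and `error` holds the exception, which is the kind of error that surfaces later in the stack when nothing has cleaned the value first.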

The fix here removes invalid UTF-8 bytes from each value before passing it to cleaned_uri_string.
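As a rough illustration of that idea, and not the code from this pull request, a helper along these lines would drop the invalid bytes. The name `strip_invalid_utf8` is hypothetical. The sketch assumes `String#scrub` where it is available (Ruby 2.1 and later) and falls back to a round trip through UTF-16 on older versions.

```ruby
# Hypothetical helper for illustration; not part of utf8-cleaner's API.
# Returns a copy of +value+ with bytes that are not valid UTF-8 removed.
def strip_invalid_utf8(value)
  # Work on a copy so the caller's string is left untouched.
  utf8 = value.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  if utf8.respond_to?(:scrub)
    # String#scrub was added in Ruby 2.1.
    utf8.scrub('')
  else
    # Older Rubies: transcode through UTF-16 so that encode actually
    # inspects the bytes and can drop the malformed ones.
    utf8.encode(Encoding::UTF_16, invalid: :replace, undef: :replace, replace: '')
        .encode(Encoding::UTF_8)
  end
end

# Made-up query string carrying raw smart-quote bytes.
raw = "q=\x93hello\x94".b
clean = strip_invalid_utf8(raw)
```

With this input, `clean` is `"q=hello"` and reports a valid encoding. Note that this discards the quotes rather than converting them; recovering them would require knowing the client sent Windows-1252, which the sketch does not assume.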

pkang commented 9 years ago

Hm, looks like this fails on Ruby 2.0 and 1.9.3.

pkang commented 9 years ago

We'll close for now and come back when we have a more comprehensive fix.