singlebrook / utf8-cleaner

MIT License

ampersands and semicolons aren't reencoded #4

Closed apeckham closed 11 years ago

apeckham commented 11 years ago

hi,

If the request contains escaped ampersands or semicolons (%26 or %3B), they aren't re-encoded correctly after cleaning. The attached failing test shows the bug. I think the solution is to use CGI.escape/unescape instead of URI.encode/decode, but I'm not sure that's correct for all headers. HTTP_REFERER, for example, might need URI.encode/decode.

In our app, with the middleware present, the bug showed up as truncated GET parameters: everything following an escaped "&" or ";" character was cut off.
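A minimal sketch of how that truncation can happen, assuming the cleaning step unescapes the whole query string and the escape for "&" is not restored afterwards. This is not the middleware's code: `URI.decode_www_form_component` and `URI.decode_www_form` are standard-library stand-ins for the decode step and for the app's parameter parsing, and the query string is made up for illustration.

```ruby
require "uri"

# Query string as sent by the client; the value of "q" contains an escaped "&".
original = "q=a%26b&page=2"

# Assumption: the cleaner unescapes everything and the "&" is never re-escaped,
# so the string that reaches the app now contains an extra separator.
decoded = URI.decode_www_form_component(original) # "q=a&b&page=2"

# Stand-in for the app's parameter parsing, before and after cleaning.
params_before = URI.decode_www_form(original).to_h
params_after  = URI.decode_www_form(decoded).to_h

puts params_before["q"] # a&b
puts params_after["q"]  # a
```

The same thing would happen with ";" in any parser that treats it as a parameter separator.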

thanks!

sbleon commented 11 years ago

Thanks for the failing test, @apeckham! In working to implement a fix, I realized that we're inadvertently decoding some other characters as well (as seen in the test `it "turns valid %-escaped ASCII chars into their ASCII equivalents"`). I now feel like the goal should be to not change any characters at all, except for removing the invalid ones. This will require a different approach, but we'll try to get to it soon.
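One way the "remove only the invalid ones" idea could look, as a rough sketch rather than the gem's actual implementation. The helper name `strip_invalid_escapes` is hypothetical, and the input is assumed to be an ASCII-only, %-escaped string. Each run of escapes is decoded only to check UTF-8 validity. The original escape text is kept verbatim for valid characters and dropped for invalid bytes.

```ruby
# Hypothetical helper, not part of utf8-cleaner.
# Assumes str is ASCII-only (raw invalid bytes are not handled here).
def strip_invalid_escapes(str)
  # Look at each maximal run of consecutive %XX escapes.
  str.gsub(/(?:%\h\h)+/) do |run|
    escapes = run.scan(/%\h\h/) # one escape per byte, original spelling
    bytes = escapes.map { |e| e[1, 2].hex }.pack("C*")

    kept = +""
    offset = 0
    # each_char yields invalid bytes as separate, invalid "characters".
    bytes.force_encoding(Encoding::UTF_8).each_char do |ch|
      size = ch.bytesize
      # Keep the untouched escapes for valid characters, drop the rest.
      kept << escapes[offset, size].join if ch.valid_encoding?
      offset += size
    end
    kept
  end
end

puts strip_invalid_escapes("foo=%FFbar%26baz") # foo=bar%26baz
puts strip_invalid_escapes("name=caf%c3%a9")   # name=caf%c3%a9
```

Because nothing is decoded and re-encoded, an escaped "&" stays as `%26` and never turns into a separator.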

sbleon commented 11 years ago

@apeckham the just-released version 0.0.5 includes a fix for this issue. It's actually almost a full rewrite, so please give it a test in your app!

sbleon commented 11 years ago

Damn, I found a problem in 0.0.5 when running with Ruby 1.9.3. I'm working on a fix. I'll let you know.

sbleon commented 11 years ago

0.0.6 looks good and is out on RubyGems.