singlebrook / utf8-cleaner

MIT License

ActionView::Template::Error: incompatible character encodings: UTF-8 and ASCII-8BIT #31

Open vemv opened 8 years ago

vemv commented 8 years ago

Hi there!

I've been using utf8-cleaner for quite a while. To be honest, I don't know whether it has had any effect on the application. I added it 'just in case', given that it seems to be a well-maintained gem and I was seeing requests with problematic encodings.

Theoretically if I use utf8-cleaner, no request URL encoding should ever cause a 500, right?

Well, I am able to consistently reproduce this in my app:

curl -I `ruby -e "puts %|https://www.myapp.com/foo/bar\?abcdt\=\x80\xC2\\@7ok_id\=130|"`
HTTP/1.1 500 Internal Server Error

(anonymized domain/route/params)

Internally the error is:

ActionView::Template::Error: incompatible character encodings: UTF-8 and ASCII-8BIT

Unfortunately I cannot reproduce this on my machine; I am able to consistently reproduce it in production though.

Setup 1 (localhost, not reproducible)

Plain Rails server:

curl -I `ruby -e "puts %|http://localhost:3000/foo/bar\?abcdt\=\x80\xC2\\@7ok_id\=130|"`
HTTP/1.1 200 OK

Setup 2 (localhost, not reproducible)

Rails server behind local instance of nginx.

curl -I `ruby -e "puts %|http://localhost:8080/foo/bar\?abcdt\=\x80\xC2\\@7ok_id\=130|"`
HTTP/1.1 200 OK

Setup 3 (production, reproducible)

Cloudflare -> AWS ELB -> nginx -> Rails server

curl -I `ruby -e "puts %|https://www.myapp.com/foo/bar\?abcdt\=\x80\xC2\\@7ok_id\=130|"`
HTTP/1.1 500 Internal Server Error

My point is that maybe Cloudflare/ELB are doing something funny.

Let me know if I can do anything to help debug the issue.

Cheers - Victor

vemv commented 8 years ago

Using version: 0.2.5.

sbleon commented 8 years ago

Thanks, Victor, for the positive feedback!

utf8-cleaner’s purpose is to remove invalid UTF-8 characters from the environment. I don’t think this error is due to invalid UTF-8, per se. It’s caused by combining strings with incompatible encodings (here UTF-8 and ASCII-8BIT) into a single string.
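The distinction can be reproduced in plain Ruby, outside Rails. The sketch below is only an illustration: `concat_result` is a hypothetical helper, and where the ASCII-8BIT string actually originates in the reporter's stack is not known from this thread.

```ruby
# Hypothetical helper: returns the encoding of left + right,
# or the error class if Ruby refuses to combine the two strings.
def concat_result(left, right)
  (left + right).encoding
rescue Encoding::CompatibilityError => e
  e.class
end

raw = "\x80\xC2".b # the two bytes from the report, tagged ASCII-8BIT (binary)

# Reinterpreted as UTF-8, the bytes are simply invalid UTF-8.
p raw.dup.force_encoding(Encoding::UTF_8).valid_encoding? # false

# An ASCII-only UTF-8 string combines with the binary string without complaint;
# the result takes the binary encoding.
p concat_result("id=", raw) == Encoding::ASCII_8BIT # true

# A UTF-8 string that contains a non-ASCII character cannot be combined with it.
p concat_result("caf\u00e9", raw) # Encoding::CompatibilityError
```

The last call is the same class of failure as the one in the report: neither string is a problem until a non-ASCII UTF-8 string and a non-ASCII binary string are joined, which is what happens when a template interpolates both.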

If you’re using Rails, I’d suggest adding a rescue_from in your ApplicationController that rescues this particular exception (which might involve inspecting the message as well as the class) and returns a 400 error instead of a 500. This is exceptionally bad input, and it’s the client’s responsibility to fix it, not the server’s.
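A minimal sketch of that suggestion follows. The module name `IncompatibleEncodingRescue` and its `match?` method are hypothetical, not part of Rails or utf8-cleaner. It also assumes the error reaches the controller as `ActionView::Template::Error`, as in the report above.

```ruby
# Sketch only: the names below are illustrative assumptions.
module IncompatibleEncodingRescue
  MESSAGE_PATTERN = /incompatible character encodings/

  # True when the exception, or anything in its cause chain, looks like the
  # encoding clash. Both the class and the message are inspected, because the
  # original error may be wrapped by a template error.
  def self.match?(exception)
    current = exception
    while current
      return true if current.is_a?(Encoding::CompatibilityError)
      return true if current.message =~ MESSAGE_PATTERN
      current = current.cause
    end
    false
  end

  # Runs when the module is included into a Rails controller.
  # rescue_from accepts the class name as a string, so this file loads
  # even where ActionView is not available.
  def self.included(controller)
    controller.rescue_from("ActionView::Template::Error") do |error|
      # Anything that is not the encoding clash is re-raised untouched.
      raise error unless IncompatibleEncodingRescue.match?(error)
      head :bad_request
    end
  end
end
```

With this in place, `include IncompatibleEncodingRescue` in ApplicationController would be enough to turn this specific failure into a 400, while other template errors still surface as 500s.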
