helloluis closed this issue 10 years ago
The utf8-cleaner middleware operates only on incoming strings, and currently only handles URI-encoded strings. The gem could be enhanced to clean up non-URI-encoded strings, and the utility classes could also be used outside of the middleware (e.g. for displaying questionable data). However, in its current state it doesn't address your particular issue.
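To make "URI-encoded strings" concrete, here is a rough, standalone sketch of how a percent-encoded value can carry bytes that are not valid UTF-8. This is not utf8-cleaner's code or API; the helper name `decodes_to_valid_utf8?` is made up for illustration, and it only assumes Ruby 2.0+ for `String#b`.

```ruby
# Hypothetical helper (not part of utf8-cleaner): percent-decode a string
# by hand and report whether the resulting bytes form valid UTF-8.
def decodes_to_valid_utf8?(encoded)
  # Work on a binary copy so arbitrary bytes can be substituted in safely.
  bytes = encoded.b.gsub(/%(\h\h)/) { [$1].pack('H2') }
  # Tag the raw bytes as UTF-8 and ask Ruby whether they are well formed.
  bytes.force_encoding(Encoding::UTF_8).valid_encoding?
end

good = decodes_to_valid_utf8?("caf%C3%A9")  # %C3%A9 is "é" in UTF-8
bad  = decodes_to_valid_utf8?("caf%E9")     # %E9 is "é" in Latin-1, not UTF-8
```

Here `good` is true and `bad` is false. A value like the second one is the kind of incoming, URI-encoded input described above, as opposed to strings that are already stored and only need cleaning at display time.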
On Fri, Oct 18, 2013 at 6:23 AM, Luis Buenaventura <notifications@github.com> wrote:
We've got a lot of user input that occasionally happens to be in the wrong encoding format (I'm not sure how it happens, and I'm not 100% sure why it doesn't get forced into UTF-8 by MongoDB). Recently I've taken to using a forced encode like below on strings that need to be displayed, but as you can imagine this is an untenable solution for anything except the most limited scenarios.
str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
Is this the kind of situation that UTF8-cleaner was built to fix? Or does it only work for incoming strings?
Reply to this email directly or view it on GitHub: https://github.com/singlebrook/utf8-cleaner/issues/5
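For anyone landing here with the same display-side problem, a small sketch of what the forced encode quoted above actually does, next to one possible alternative. The sample bytes are invented for illustration, and `String#scrub` assumes Ruby 2.1 or newer; whether it fits depends on what the stored data really contains.

```ruby
# Invented sample: "caf\xC3\xA9" is valid UTF-8 for "café";
# "\xFF" is a byte that can never appear in well-formed UTF-8.
raw = "caf\xC3\xA9 \xFF ok".b  # binary copy, as if the encoding tag were unknown

# The forced encode from the thread: converting from 'binary' treats every
# byte above 0x7F as undefined, so valid non-ASCII characters are dropped too.
stripped = raw.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')

# Possible alternative: tag the bytes as UTF-8, then remove only the
# sequences that are actually malformed.
scrubbed = raw.dup.force_encoding(Encoding::UTF_8).scrub('')

puts stripped  # "é" and "\xFF" are both gone
puts scrubbed  # "é" is kept, only "\xFF" is gone
```

The difference matters mostly for non-English input: the forced encode silently removes every accented or non-Latin character, while scrubbing keeps whatever was already valid UTF-8. Neither approach recovers text that was stored in a different encoding such as Latin-1.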