m4i / omniauth-runkeeper

OmniAuth strategy for RunKeeper
https://rubygems.org/gems/omniauth-runkeeper

Encoding seems to be either incorrect or malformed #2

Open JamesChevalier opened 10 years ago

JamesChevalier commented 10 years ago

I've been using omniauth-runkeeper for over a year now, and it's been great - thank you for creating it.

Recently, I noticed errors when a particular user was trying to log in. It turned out that they had added a special character (í) to their name, and Rails was crashing with this error when trying to save that information to the database:

Mysql2::Error - Incorrect string value: '\xEDm Che...' for column 'name' at row 1

Some really quick background info...

So Rails was choking on writing \xED to the database, even though the string should have been UTF-8 by that point.

If I call .encoding on that name field, it reports #<Encoding:UTF-8>. I'm not sure whether it should be reporting a different encoding, or whether something should be done to the string(s) to keep them as UTF-8.
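Ruby's `String#encoding` only reports the encoding a string is *labelled* with; it does not check whether the bytes are actually valid in that encoding (`String#valid_encoding?` does). The sketch below uses a hypothetical name, `"Mar\xEDa"`, to show the difference. It also assumes, as a guess rather than anything confirmed in this thread, that the offending byte came from ISO-8859-1 text, where 0xED is í; in UTF-8, í is the two bytes 0xC3 0xAD.

```ruby
# Hypothetical value: the byte 0xED is "í" in ISO-8859-1 but is not valid
# UTF-8 on its own. .b makes an unfrozen binary copy that we then relabel.
raw = "Mar\xEDa".b.force_encoding(Encoding::UTF_8)

puts raw.encoding         # the label only: UTF-8
puts raw.valid_encoding?  # the bytes checked: false

# Assumption: the source bytes were ISO-8859-1. Relabel them as such,
# then transcode to real UTF-8.
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts fixed.valid_encoding?  # true
p fixed.bytes               # [77, 97, 114, 195, 173, 97]: í is now 0xC3 0xAD
```

If the assumption holds, a string can look like UTF-8 to `.encoding` and still be rejected by MySQL, which sees the raw 0xED byte.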

I ran a similar test, retrieving the user's name from RunKeeper through the healthgraph gem, and it did not exhibit this issue: the strings returned were UTF-8 encoded and contained the actual UTF-8 í character.

While I'm a little over my head, I did poke around in the runkeeper.rb file. I removed MultiJson.decode from line 34 to see whether it was the culprit. That was a breaking change, but it let me step in with a debugger, where I confirmed that the string already contains the \xED byte at that point. I'm still out of my depth, but this seems to indicate that the string is produced this way outside the scope of this gem, either by the API itself or somewhere else.
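When a debugger shows `\xED` inside a Ruby string, that is how `String#inspect` renders a byte that is not valid in the string's encoding. A minimal sketch of how one might list such bytes in a raw response body before it reaches `MultiJson.decode`; the helper name `invalid_bytes_report` and the sample body are hypothetical, not part of omniauth-runkeeper.

```ruby
# Hypothetical helper: mark every byte sequence that is invalid in the
# string's declared encoding, to see exactly what arrived before parsing.
def invalid_bytes_report(body)
  return "all bytes valid for #{body.encoding}" if body.valid_encoding?

  # String#scrub yields each invalid byte sequence to the block and
  # substitutes the block's return value (here, the bytes as hex).
  body.scrub { |bytes| "<#{bytes.unpack1('H*')}>" }
end

# Hypothetical raw JSON body, labelled UTF-8 but carrying a lone 0xED byte.
body = "{\"name\":\"Mar\xEDa\"}".b.force_encoding(Encoding::UTF_8)

puts invalid_bytes_report(body)  # prints {"name":"Mar<ed>a"}
puts invalid_bytes_report("ok")  # prints all bytes valid for UTF-8
```

If the marker already appears here, the bad byte was present in the HTTP response rather than introduced by JSON decoding.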

Is what I've found correct? That is, is this issue outside of your control? If it isn't, is it possible for you to fix the data in raw_info so that it is entirely UTF-8?

JamesChevalier commented 10 years ago

This is looking more and more like an issue with MySQL. I've since run into further problems with emoji characters and MySQL's utf8 character set.

I learned that emoji support requires the utf8mb4 character set, but that comes with a large set of problems of its own. Three good references are "How to support full Unicode in MySQL databases", "active_record, MySQL, and emoji", and this issue.
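The reason emoji need utf8mb4: MySQL's older utf8 character set stores at most 3 bytes per character, while characters outside the Basic Multilingual Plane, which includes most emoji, take 4 bytes in UTF-8. A quick Ruby check of the byte lengths; the sample characters are arbitrary choices for illustration.

```ruby
# Byte length of each character when encoded as UTF-8.
# MySQL's "utf8" (utf8mb3) accepts at most 3 bytes per character;
# "utf8mb4" accepts up to 4.
samples = {
  "a"  => "ASCII letter (U+0061)",
  "í"  => "Latin small i with acute (U+00ED)",
  "€"  => "euro sign (U+20AC)",
  "😀" => "grinning face emoji (U+1F600)"
}

samples.each do |char, name|
  bytes = char.bytesize
  verdict = bytes <= 3 ? "fits utf8" : "needs utf8mb4"
  puts format("%-36s %d byte(s), %s", name, bytes, verdict)
end
```

Only the emoji exceeds 3 bytes, which matches the split between the í problem (storable in utf8) and the emoji problem (not storable without utf8mb4).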

It looks like the two options are:

  1. Use demoji to strip emoji characters, and continue using the utf8 character set.
  2. Change the character set of the database and of each table, convert all of your text columns (I don't know whether other text types such as MEDIUMTEXT also need updating), and change every string column you index on from VARCHAR(255) to VARCHAR(191).
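Two small sketches for the options above, under stated assumptions. For option 1, `strip_four_byte_chars` is a hypothetical, gem-free stand-in, not demoji's API: it removes every code point above U+FFFF (every 4-byte UTF-8 character), which is exactly what MySQL's utf8 cannot store. Unlike an emoji-specific filter, it also removes non-emoji supplementary-plane characters and keeps BMP symbols such as ☺ (U+263A), which fit in 3 bytes anyway. For option 2, the 191 figure follows from InnoDB's 767-byte index key prefix limit, assumed here because it is the limit for the COMPACT and REDUNDANT row formats: 767 / 4 bytes per utf8mb4 character rounds down to 191, while 255 × 3 = 765 fits under the limit with utf8.

```ruby
# Option 1 stand-in (hypothetical helper, not demoji's API):
# drop every code point above U+FFFF, since those take 4 bytes in UTF-8
# and MySQL's utf8 (utf8mb3) columns cannot store them.
def strip_four_byte_chars(str)
  str.gsub(/[\u{10000}-\u{10FFFF}]/, "")
end

# Option 2 arithmetic, assuming InnoDB's 767-byte index key prefix limit.
INDEX_PREFIX_LIMIT_BYTES = 767

def max_indexable_chars(bytes_per_char)
  INDEX_PREFIX_LIMIT_BYTES / bytes_per_char # integer division rounds down
end

puts strip_four_byte_chars("Great run! 😀 María") # prints Great run!  María
puts max_indexable_chars(3)                        # utf8:    255
puts max_indexable_chars(4)                        # utf8mb4: 191
```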

So it's looking like an enormous mess that has little or nothing to do with omniauth-runkeeper.