ktaragorn / mobile_detect

Ruby port of php library of the same name
MIT License

Using UTF-8 regex on real header strings raises errors #2

Open NealJMD opened 8 years ago

NealJMD commented 8 years ago

We're using this gem in production and have started seeing the error 'Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)', with a stack trace leading into this gem. I can reproduce the error in the console like this:

irb(main):001:0> require 'mobile_detect'
=> true
irb(main):002:0> device = MobileDetect.new({}, 'Gécko'.force_encoding('ASCII-8BIT'))
=> #<MobileDetect:0x007f943b679130 @http_headers={}, @user_agent="G\xC3\xA9cko">
irb(main):003:0> device.mobile?
Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
    from /Users/nealjmd/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/mobile-detect-0.2.0/lib/mobile_detect/core.rb:153:in `match'
    from /Users/nealjmd/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/mobile-detect-0.2.0/lib/mobile_detect/core.rb:162:in `block in match_detection_rules_against_UA'
    from /Users/nealjmd/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/mobile-detect-0.2.0/lib/mobile_detect/core.rb:161:in `each'
    from /Users/nealjmd/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/mobile-detect-0.2.0/lib/mobile_detect/core.rb:161:in `match_detection_rules_against_UA'
    from /Users/nealjmd/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/mobile-detect-0.2.0/lib/mobile_detect/core.rb:33:in `mobile?'
    from (irb):3
    from /Users/nealjmd/.rbenv/versions/2.3.0/bin/irb:11:in `<main>'

This is obviously a toy example, but we're getting the same error from real headers, mostly from people in Spain. For the time being, we're forcing all the header encodings to UTF-8, but it would be great if this was handled internally to mobile_detect.
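A minimal sketch of the workaround described above, assuming the headers arrive as a plain Hash of strings. The helper name `utf8_headers` and the sample header key are hypothetical and not part of mobile_detect:

```ruby
# Hypothetical pre-processing step (not part of mobile_detect): relabel every
# String header value as UTF-8 without changing its bytes.
def utf8_headers(headers)
  headers.each_with_object({}) do |(name, value), out|
    out[name] = value.is_a?(String) ? value.dup.force_encoding(Encoding::UTF_8) : value
  end
end

# Bytes of "Gécko" tagged as ASCII-8BIT, as they might arrive from the server.
raw = { 'HTTP_USER_AGENT' => "G\xC3\xA9cko".b }

begin
  raw['HTTP_USER_AGENT'] =~ /\u00E9/   # UTF-8 regexp against a binary string
rescue Encoding::CompatibilityError => e
  puts e.class                         # the mismatch reported in this issue
end

fixed = utf8_headers(raw)
puts fixed['HTTP_USER_AGENT'].encoding # UTF-8; the same match no longer raises
```

Relabeling only helps when the bytes really are UTF-8; bytes in some other encoding would still need transcoding or scrubbing.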

Thanks!

ktaragorn commented 8 years ago

Hi @NealJMD, I am honored that you are using this gem in production.

This error likely came from a recent fix where we updated the regex comparison to UTF-8, but I am having trouble understanding your issue. As you mentioned, this is a toy example. In particular, force_encoding doesn't actually re-encode the string; it only relabels the existing bytes as that encoding. The string you provided cannot actually be encoded as ASCII-8BIT, as seen by:

2.3.0 :018 > "Gécko".encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00E9 from UTF-8 to ASCII-8BIT
        from (irb):18:in `encode'
        from (irb):18
        from ./bin/mobile_detect:14:in `<main>'
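The difference between relabeling and transcoding can be sketched as follows. This is an illustration only, with variable names chosen here:

```ruby
s = "G\u00E9cko"                                  # "Gécko" as a UTF-8 string

# force_encoding keeps the bytes and changes only the encoding label.
relabeled = s.dup.force_encoding(Encoding::ASCII_8BIT)
puts relabeled.bytes == s.bytes                   # true
puts relabeled.encoding == Encoding::ASCII_8BIT   # true

# encode tries to convert characters, and U+00E9 has no ASCII-8BIT mapping.
begin
  s.encode(Encoding::ASCII_8BIT)
rescue Encoding::UndefinedConversionError => e
  puts e.class
end
```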

Could you provide a real example? I wonder if the headers on your end are being forced into ASCII-8BIT somewhere in your stack.

The reason I ask is that the way to fix this might be to encode the string into UTF-8 before comparing, but I can't do that with the example you provided (after force_encoding).

ktaragorn commented 8 years ago

Alternatively, you could try .encode('utf-8') on your failing examples; if that works for you, I can add it to the project. Until then, you can use the previous version of the gem.
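One possible shape for such a normalization step, as a sketch only. The method name `normalize_user_agent` is hypothetical and this is not the gem's actual code. It assumes that binary-tagged input should be reinterpreted as UTF-8, because calling `.encode('utf-8')` on an ASCII-8BIT string containing non-ASCII bytes raises Encoding::UndefinedConversionError:

```ruby
# Hypothetical helper: return a valid UTF-8 copy of a user agent string.
def normalize_user_agent(ua)
  str = ua.to_s.dup
  case str.encoding
  when Encoding::UTF_8
    str.scrub('?')                                   # replace invalid byte sequences
  when Encoding::ASCII_8BIT
    # Binary has no character mapping, so relabel the bytes instead of transcoding.
    str.force_encoding(Encoding::UTF_8).scrub('?')
  else
    # Real transcoding for other encodings such as ISO-8859-1.
    str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
  end
end

binary = "G\xC3\xA9cko".b                            # UTF-8 bytes tagged as ASCII-8BIT
latin1 = "G\xE9cko".b.force_encoding(Encoding::ISO_8859_1)

puts normalize_user_agent(binary) == "G\u00E9cko"    # true
puts normalize_user_agent(latin1) == "G\u00E9cko"    # true
```

Replacing invalid bytes with `?` is a design choice for matching purposes; it keeps the rest of the string usable by UTF-8 regexps at the cost of losing the original bytes.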