mudge / re2

Ruby bindings to RE2, a "fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python".
http://mudge.name/re2/
BSD 3-Clause "New" or "Revised" License

ignores :utf8 => true argument #18

Closed: mattes closed this issue 10 years ago

mattes commented 11 years ago
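require 're2'

# regex and content are the reporter's own pattern and subject string (not shown)
m = RE2::Regexp.new(regex, :utf8 => true).match(content)
puts m[1].encoding  # prints ASCII-8BIT even though :utf8 => true was given
m = RE2::Regexp.new(regex).match(content)
puts m[1].encoding  # also prints ASCII-8BIT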
mattes commented 11 years ago

Treating the results with encode("utf-8", "iso-8859-1") should help for now.
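A small plain-Ruby sketch of this workaround, runnable without the re2 gem. The byte string raw is a hypothetical stand-in for an ASCII-8BIT match result such as m[1]. One caveat worth knowing: encode("utf-8", "iso-8859-1") transcodes every byte as a Latin-1 character, which is only correct if the subject really is ISO-8859-1. If the matched bytes are already UTF-8, force_encoding("UTF-8") relabels them without changing any bytes.

```ruby
# Hypothetical stand-in for m[1]: "café" as UTF-8 bytes, labelled ASCII-8BIT.
raw = "caf\xC3\xA9".b

# The suggested workaround: transcode, reading each byte as a Latin-1 character.
latin1 = raw.encode("utf-8", "iso-8859-1")

# Alternative when the subject was UTF-8 all along: relabel a copy of the bytes.
utf8 = raw.dup.force_encoding("UTF-8")

puts latin1               # cafÃ© (the two UTF-8 bytes of "é" became two characters)
puts utf8                 # café
puts utf8.valid_encoding? # true
```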

mudge commented 11 years ago

The issue here is that the :utf8 argument is passed through to the underlying RE2 library, but the gem itself doesn't use any of Ruby 1.9's string encoding support, so every result comes back as ASCII-8BIT. We'll have to make sure that any encoding support is conditional so that the gem remains compatible with Ruby 1.8.
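A rough Ruby-level sketch of what "conditional" encoding support could look like; the real fix would live in the gem's C extension, so this is an illustration under assumptions, not the gem's code. It feature-detects String#force_encoding (absent on Ruby 1.8) and only then labels a result: UTF-8 when :utf8 is true, ISO-8859-1 otherwise, since RE2 treats non-UTF-8 input as Latin-1. The helper label_match and the constant ENCODING_AWARE are hypothetical names.

```ruby
# Hypothetical flag: true on Ruby 1.9+, where strings carry an encoding.
ENCODING_AWARE = "".respond_to?(:force_encoding)

# Hypothetical helper: tag a raw match result with the encoding RE2 used.
# Unmatched groups (nil) and all strings on Ruby 1.8 are returned untouched.
def label_match(str, utf8)
  return str if str.nil? || !ENCODING_AWARE

  # RE2 reads pattern and subject as UTF-8 when :utf8 is true (the default)
  # and as Latin-1 otherwise, so label a copy of the bytes to match.
  str.dup.force_encoding(utf8 ? "UTF-8" : "ISO-8859-1")
end

raw = "caf\xC3\xA9".b
puts label_match(raw, true).encoding.name   # UTF-8
puts label_match(raw, false).encoding.name  # ISO-8859-1
```

Labelling a copy rather than the original keeps the helper free of side effects on the caller's string.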

mudge commented 11 years ago

@tenderlove's "String Encoding in Ruby 1.9 Extensions" should help here.

mudge commented 10 years ago

Fixed as of v0.6.0.