mruby / mruby

unicode escape characters in regular expression literal are not parsed by mruby parser #2007

Closed (tsahara closed this issue 10 years ago)

tsahara commented 10 years ago

Before https://github.com/mruby/mruby/commit/5f2817b36c32ff71031c514b2fdf51ba6b74d83c , Unicode escape sequences in regular expression literals were parsed by the mruby parser and converted to UTF-8 byte sequences. Regexp.compile expects that:

class Regexp
  def self.compile(re)
    p re
  end
end
/\u0300/

% bin/mruby a.rb
"\314\200"

But after the commit, they are not converted:

% bin/mruby a.rb
"\\u0300"

Is this an intentional change?
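
For reference, the two outputs can be reproduced without involving the parser. This is a minimal sketch, assuming a Ruby that provides Array#pack and String#bytes (CRuby, or an mruby build with the corresponding mrbgems). It shows that the pre-commit string is the UTF-8 encoding of U+0300 written as octal escapes, while the post-commit string is just the six source characters.

```ruby
# U+0300 (COMBINING GRAVE ACCENT) encoded as UTF-8: two bytes.
converted = [0x0300].pack("U")

# Render each byte as a backslash followed by its octal value,
# which is how the pre-commit output above is displayed.
octal = converted.bytes.map { |b| "\\" + b.to_s(8) }.join
puts octal

# The post-commit form: a backslash, "u", and four hex digits, untouched.
raw = '\u0300'
puts raw.length
```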

matz commented 10 years ago

Whether \u is supported in a regexp depends on the engine. In fact, escape sequences vary from engine to engine, so I thought all escaping would be better handled by the regexp parser each engine provides. For example, as far as I checked, Oniguruma can compile \u sequences.

But this issue report may indicate a problem. How much do you think the mruby parser should convert backslash escape sequences independently of the regexp engine?
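
To make the trade-off concrete: if the parser passes the literal through untouched, a binding for an engine that lacks \u could expand the escapes itself. The following is only a sketch under stated assumptions. The helper name expand_unicode_escapes is hypothetical (it is not part of mruby or any regexp gem), only the four-digit \uXXXX form is handled, and it is written against CRuby's String#gsub and Regexp.escape.

```ruby
# Hypothetical helper: expand \uXXXX escapes in a raw regexp source string.
def expand_unicode_escapes(source)
  # Consume escapes pairwise, left to right, so that an escaped backslash
  # followed by "u0300" is left alone.
  source.gsub(/\\(u[0-9a-fA-F]{4}|.)/m) do
    body = $1
    if body.length == 5
      # "u" plus four hex digits: convert the code point to UTF-8.
      char = [body[1, 4].hex].pack("U")
      # Re-escape, so an expanded metacharacter such as "*" stays literal.
      Regexp.escape(char)
    else
      # Any other escape is passed through for the engine to interpret.
      "\\" + body
    end
  end
end

puts expand_unicode_escapes('foo\u002a')
```

Escaping the expanded character is a design choice in this sketch: without it, an escape that names a metacharacter would change the meaning of the pattern.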

matz commented 10 years ago

For example, you don't want /foo\u42/ to be /foo*/ and match with "foooo", do you?

tsahara commented 10 years ago

No, I don't :) (though it could safely be converted to \x2a for some regular expression libraries... hmm, you mean \u002a?). Since mruby does not define the syntax of regular expression literals beyond their being enclosed in /, it sounds reasonable to me that the mruby parser does not convert any backslash escape sequences.
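
The exchange above turns on the difference between * as a quantifier and * as a literal character. Here is a small sketch, run under CRuby for illustration rather than mruby, of what eager conversion would do to the pattern and how an escaped form avoids it:

```ruby
star = [0x2a].pack("U")   # U+002A is "*" (U+0042 would be "B")

# Eager conversion: the engine receives foo* and treats * as a quantifier.
eager = Regexp.new("foo" + star)
puts eager.match?("foooo")      # true

# Keeping the character escaped preserves the literal meaning.
literal = Regexp.new("foo" + Regexp.escape(star))
puts literal.match?("foooo")    # false
puts literal.match?("foo*")     # true

# The \x2a form mentioned above is likewise a literal "*" in CRuby's engine.
hex_form = Regexp.new('foo\x2a')
puts hex_form.match?("foo*")    # true
```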

matz commented 10 years ago

In that case, I'll make this part of the mruby regexp spec and close this issue.

tsahara commented 10 years ago

Thank you for the clarification.

texrg commented 10 years ago

OK, I found a solution:

"Żółta żaba żarła żur".split("")

Why doesn't my program work in mruby?

"Żółta żaba żarła żur".scan(/./m).each {|a| print a}

matz commented 10 years ago

Saying "doesn't work" doesn't help us solve your problem. You have to tell us what you expected, what you got, and information about your platform. Which regexp gem did you use?
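
A report of the kind requested could be built from a short script like the one below. It is only a sketch. Under CRuby the two calls agree, while on mruby the result of scan depends on which regexp mrbgem is built in, and the thread does not say which one was used. As a rough diagnostic (an assumption, not something confirmed here), an element count equal to the byte count would suggest the engine is matching bytes rather than characters.

```ruby
text = "Żółta żaba żarła żur"

expected = text.split("")    # the workaround from the comment above
actual   = text.scan(/./m)   # the call being reported

puts "characters expected: #{expected.length}"
puts "elements from scan:  #{actual.length}"
puts "bytes in the string: #{text.bytesize}"
puts(actual == expected ? "same result" : "different result")
```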