mattn / mruby-onig-regexp

mrbgem of 鬼雲's Regular Expression
30 stars 34 forks source link

Supporting non-word char in group name #91

Open ksss opened 5 years ago

ksss commented 5 years ago

ref https://github.com/mattn/mruby-onig-regexp/pull/90

I want to discuss about group name.

Current

mruby-onig-regexp can use non-word char in group name when linked by oniguruma. But it can't use when linked by onigmo.

link with oniguruma.so onigmo.so bundled onigmo
non-word OK NG NG

Problem

So, mgem can't use non-word char in group name because it depends on mruby-onig-regexp. (e.g. mruby-uri)

Background

CRuby can use non-word char in group name.

$ ruby -e 'p Regexp.new("bad(?<aa-bb>.*)").match("badboy")["aa-bb"]'
"boy"

I investigated the background.

Discussion point

I think, we have some solutions.

diff --git a/mrbgem.rake b/mrbgem.rake
index 0d37b12..1404353 100644
--- a/mrbgem.rake
+++ b/mrbgem.rake
@@ -66,6 +66,7 @@ MRuby::Gem::Specification.new('mruby-onig-regexp') do |spec|
           _pp 'autotools', oniguruma_dir
           run_command e, './autogen.sh' if File.exists? 'autogen.sh'
           run_command e, "./configure --disable-shared --enable-static #{host}"
+          run_command e, "patch -p1 < #{dir}/onigmo-#{version}.patch"
           run_command e, "make -j#{$rake_jobs || 1}"
         else
           run_command e, 'cmd /c "copy /Y win32 > NUL"'
diff --git a/onigmo-6.1.3.patch b/onigmo-6.1.3.patch
new file mode 100644
index 0000000..4163913
--- /dev/null
+++ b/onigmo-6.1.3.patch
@@ -0,0 +1,13 @@
+diff --git a/regparse.c b/regparse.c
+index 431aad9..54563ac 100644
+--- a/regparse.c
++++ b/regparse.c
+@@ -2509,7 +2509,7 @@ get_name_end_code_point(OnigCodePoint start)
+ # ifdef RUBY
+ #  define ONIGENC_IS_CODE_NAME(enc, c)  TRUE
+ # else
+-#  define ONIGENC_IS_CODE_NAME(enc, c)  ONIGENC_IS_CODE_WORD(enc, c)
++#  define ONIGENC_IS_CODE_NAME(enc, c)  TRUE
+ # endif
+
+ # ifdef USE_BACKREF_WITH_LEVEL

I think mruby's regexp should support non-word char in group name. Because I think it is better that the specification of mruby as close as possible to CRuby. But I have to fork mruby-onig-regexp to resolve the problem perfectly.

How do you think?

take-cheeze commented 5 years ago

Another option is to switch back to oniguruma since recently there is more releases than onigmo: https://github.com/kkos/oniguruma/releases