oniguruma allows non-word char (like '-') in group name.
onigmo does not allows non-word char in group name.
CRuby change regexp library from oniguruma to onigmo. However, It kept backward compatibility by patching onigmo.
Discussion point
I think, we have some solutions.
No change. We allow incompatibility by linked library.
Add compile option to onigmo like ONIG_OPTION_ALLOW_NON_WORD_CHAR_IN_CAPTURE_GROUP.
Drop supporting linking with shared library and use bundled onigmo only.
Fork mruby-onig-regexp and it support only bundled onigmo only.
Add patch to bundled onigmo in mruby-onig-regexp like this. But this solution can't resolve when linking with existing onigmo.so
diff --git a/mrbgem.rake b/mrbgem.rake
index 0d37b12..1404353 100644
--- a/mrbgem.rake
+++ b/mrbgem.rake
@@ -66,6 +66,7 @@ MRuby::Gem::Specification.new('mruby-onig-regexp') do |spec|
_pp 'autotools', oniguruma_dir
run_command e, './autogen.sh' if File.exists? 'autogen.sh'
run_command e, "./configure --disable-shared --enable-static #{host}"
+ run_command e, "patch -p1 < #{dir}/onigmo-#{version}.patch"
run_command e, "make -j#{$rake_jobs || 1}"
else
run_command e, 'cmd /c "copy /Y win32 > NUL"'
diff --git a/onigmo-6.1.3.patch b/onigmo-6.1.3.patch
new file mode 100644
index 0000000..4163913
--- /dev/null
+++ b/onigmo-6.1.3.patch
@@ -0,0 +1,13 @@
+diff --git a/regparse.c b/regparse.c
+index 431aad9..54563ac 100644
+--- a/regparse.c
++++ b/regparse.c
+@@ -2509,7 +2509,7 @@ get_name_end_code_point(OnigCodePoint start)
+ # ifdef RUBY
+ # define ONIGENC_IS_CODE_NAME(enc, c) TRUE
+ # else
+-# define ONIGENC_IS_CODE_NAME(enc, c) ONIGENC_IS_CODE_WORD(enc, c)
++# define ONIGENC_IS_CODE_NAME(enc, c) TRUE
+ # endif
+
+ # ifdef USE_BACKREF_WITH_LEVEL
I think mruby's regexp should support non-word char in group name.
Because I think it is better that the specification of mruby as close as possible to CRuby.
But I have to fork mruby-onig-regexp to resolve the problem perfectly.
ref https://github.com/mattn/mruby-onig-regexp/pull/90
I want to discuss about group name.
Current
mruby-onig-regexp can use non-word char in group name when linked by oniguruma. But it can't use when linked by onigmo.
Problem
So, mgem can't use non-word char in group name because it depends on mruby-onig-regexp. (e.g. mruby-uri)
Background
CRuby can use non-word char in group name.
I investigated the background.
'-'
) in group name.Discussion point
I think, we have some solutions.
ONIG_OPTION_ALLOW_NON_WORD_CHAR_IN_CAPTURE_GROUP
.I think mruby's regexp should support non-word char in group name. Because I think it is better that the specification of mruby as close as possible to CRuby. But I have to fork mruby-onig-regexp to resolve the problem perfectly.
How do you think?