mattn / mruby-onig-regexp

mrbgem for 鬼雲 regular expressions

test fails when MRB_UTF8_STRING is defined #106

Closed (masahino closed this issue 2 years ago)

masahino commented 2 years ago

A segmentation fault occurs in the following test when MRB_UTF8_STRING is defined:

assert_raise(ArgumentError) { "\xf0".gsub(/[^a]/,"X") }

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x107ffffe0)
  * frame #0: 0x000000019a034328 libsystem_platform.dylib`_platform_memmove + 536
    frame #1: 0x0000000100044ac8 mrbtest`str_init_embed(s=0x00000001200080f0, p="", len=-3) at string.c:53:10
    frame #2: 0x0000000100044d68 mrbtest`str_new(mrb=0x0000000102008200, p="", len=-3) at string.c:121:12
    frame #3: 0x0000000100044d0c mrbtest`mrb_str_new(mrb=0x0000000102008200, p="", len=-3) at string.c:160:24
    frame #4: 0x00000001000938c8 mrbtest`str_substr(mrb=0x0000000102008200, str=(w = 4831871456), beg=4, len=-3) at mruby_onig_regexp.c:105:10
    frame #5: 0x000000010008ff7c mrbtest`match_data_post_match(mrb=0x0000000102008200, self=(w = 4831871936)) at mruby_onig_regexp.c:646:10
    frame #6: 0x000000010005510c mrbtest`mrb_funcall_with_block(mrb=0x0000000102008200, self=(w = 4831871936), mid=1118, argc=0, argv=0x000000016fdfd958, blk=(w = 0)) at vm.c:561:13
    frame #7: 0x00000001000549c4 mrbtest`mrb_funcall_argv(mrb=0x0000000102008200, self=(w = 4831871936), mid=1118, argc=0, argv=0x000000016fdfd958) at vm.c:577:10
    frame #8: 0x0000000100054924 mrbtest`mrb_funcall(mrb=0x0000000102008200, self=(w = 4831871936), name="post_match", argc=0) at vm.c:374:10
    frame #9: 0x0000000100092f38 mrbtest`onig_match_common(mrb=0x0000000102008200, reg=0x000000010180a7e0, match_value=(w = 4831871936), str=(w = 4831873760), pos=0) at mruby_onig_regexp.c:199:16
    frame #10: 0x000000010009056c mrbtest`string_gsub(mrb=0x0000000102008200, self=(w = 4831873760)) at mruby_onig_regexp.c:794:8
    frame #11: 0x00000001000594f8 mrbtest`mrb_vm_exec(mrb=0x0000000102008200, proc=0x0000000120008b40, pc="8\U00000002") at vm.c:1636:18
    frame #12: 0x0000000100056cf8 mrbtest`mrb_vm_run(mrb=0x0000000102008200, proc=0x000000012000fa10, self=(w = 4831919648), stack_keep=0) at vm.c:1131:12
    frame #13: 0x0000000100055d30 mrbtest`mrb_top_run(mrb=0x0000000102008200, proc=0x000000012000fa10, self=(w = 4831919648), stack_keep=0) at vm.c:3040:12
    frame #14: 0x0000000100037310 mrbtest`load_irep(mrb=0x0000000102008200, proc=0x000000012000fa10, c=0x0000000000000000) at load.c:681:10
    frame #15: 0x0000000100037224 mrbtest`mrb_load_irep_cxt(mrb=0x0000000102008200, bin="RITE0300", c=0x0000000000000000) at load.c:689:10
    frame #16: 0x00000001000373a0 mrbtest`mrb_load_irep(mrb=0x0000000102008200, bin="RITE0300") at load.c:701:10
    frame #17: 0x00000001000058a0 mrbtest`GENERATED_TMP_mrb_mruby_enum_ext_gem_test(mrb=0x0000000100809800) at gem_test.c:588:3
    frame #18: 0x0000000100004ef8 mrbtest`mrbgemtest_init(mrb=0x0000000100809800) at mrbtest.c:54:5
    frame #19: 0x0000000100003d40 mrbtest`main(argc=1, argv=0x000000016fdff718) at driver.c:304:3
    frame #20: 0x00000001002b10f4 dyld`start + 520
masahino commented 2 years ago

I am using the latest version of mruby (6b2f08d).

mattn commented 2 years ago

Hmm, I can't reproduce it.


masahino commented 2 years ago

In my environment (tried on Mac and Ubuntu), mrb_str_new is called with len = -3 in str_substr: https://github.com/mattn/mruby-onig-regexp/blob/76087d150d12f167e95ae10d326099b352cf3d18/src/mruby_onig_regexp.c#L105

This is because reg->end[0] is 4 in match_data_post_match, which is past the end of the 1-byte string "\xf0", so len = 1 - 4 = -3. I am not sure whether end[0] = 4 is the correct behavior, though. https://github.com/mattn/mruby-onig-regexp/blob/76087d150d12f167e95ae10d326099b352cf3d18/src/mruby_onig_regexp.c#L646

How about checking len, as in this commit?

https://github.com/masahino/mruby-onig-regexp/commit/25b152264cd4dfac84aa39e5fbb428ec84b95e6a

mattn commented 2 years ago

Looks good. Could you please send me a PR?