refi64 / rejit

A work-in-progress JIT-powered regex engine
Mozilla Public License 2.0
110 stars 4 forks source link

Segfault with repeat operator nested within another repeat #10

Closed refi64 closed 8 years ago

refi64 commented 8 years ago

(?:a*)* segfaults. Seems like such an innocent regex...

Worth noting that all of the following work:

But the following also do not work:

Seems to be a threading issue. Before the introduction of threads, they looped indefinitely.

refi64 commented 8 years ago

I think this is a stack overflow. I'm guessing that, since * is greedy, it keeps spawning threads? Or something?

Also, this fails with an empty input.

refi64 commented 8 years ago

The problem:

Take (?:a?)*. The a? spawns a thread that will cut to the *. When it fails to match, it then executes that thread, which goes to the *. Because the match was technically successful, the * will jump back to the a?...and the process loops forever. But the initial thread that * pushes at the beginning stays on the stack, eventually causing an overflow.

In Python and Perl, the group seems to never, ever have any contents. I find that slightly odd, since I would think that, since ? is greedy, it would instantly eat up the first character...?

refi64 commented 8 years ago

In Russ Cox's VM, he doesn't fork again at the end of *, instead just doing a jump back to the top. That may fix the issue.