hoaproject / Regex

The Hoa\Regex library.
https://hoa-project.net/

Add support for (*VERB) #37

Closed: unkind closed this issue 4 years ago

unkind commented 5 years ago

http://pcre.org/current/doc/html/pcre2pattern.html#SEC27

It seems like there is no support yet for (*MARK:foo), (*FAIL), etc. They are rarely needed, but sometimes they are required.

Hywan commented 5 years ago

Thanks, we will work on it as soon as we have time! Do you need it in a project soon (just to help us prioritize)?

unkind commented 5 years ago

Luckily, I don't need to parse exactly this part. I was just experimenting with a lexer based on (?|...) (a "branch reset group"):

~
\G
(?|
    (?:[^",\r\n]+)(*MARK:token0)
    |
    (?:"[^"\\]*(?:\\.[^"\\]*)*")(*MARK:token1)
    |
    (?:,)(*MARK:token2)
    |
    (?:\r?\n)(*MARK:token3)
    |
    [\S\s](*MARK:error)
)
~Jx

I parsed only the tokens' sub-patterns: (?:[^",\r\n]+), (?:"[^"\\]*(?:\\.[^"\\]*)*"), etc. It took some time to reassemble the regex string from the modified AST, though.
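
For readers who want to see how the tagged alternation above behaves as a lexer, here is a minimal sketch in Python. It is not the code from this thread and does not use Hoa\Regex. Python's `re` module supports neither `(*MARK:...)` nor branch reset groups, so the sketch swaps in a different technique: each alternative is wrapped in a named group and `Match.lastgroup` reports which one matched. The names `TOKEN_SPEC`, `LEXER`, and `tokenize` are hypothetical; the sub-patterns are copied from the regex above.

```python
import re

# Sub-patterns copied from the regex above. The group names stand in for
# the (*MARK:...) labels, which Python's re module does not support.
TOKEN_SPEC = [
    ("token0", r'[^",\r\n]+'),                # unquoted field
    ("token1", r'"[^"\\]*(?:\\.[^"\\]*)*"'),  # quoted field with escapes
    ("token2", r','),                         # field separator
    ("token3", r'\r?\n'),                     # line break
    ("error", r'[\S\s]'),                     # any other single character
]

# One named group per alternative; the inner patterns only use
# non-capturing groups, so lastgroup always names the alternative.
LEXER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (token_name, lexeme) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Pattern.match(text, pos) anchors the match at pos, which plays
        # the role of \G in the PCRE version. The "error" alternative
        # matches any single character, so a match is always found.
        m = LEXER.match(text, pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling `tokenize('a,"b"\n')` yields one pair per token, with the first element naming the alternative that matched. In PCRE the mark name would serve the same purpose without adding a capture group per alternative.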