Open Aloso opened 2 years ago
@wy16W2pIilK1xgqN could you explain? What devices are they?
A lot: routers and firewalls. For example, all MikroTik devices.
The problem is that ERE doesn't support non-capturing groups, like
("hello"? | "world"+) "!!"
which compiles to
(?:(?:hello)?|(?:world)+)!!
For ERE, this would have to compile to
((hello)?|(world)+)!!
But this is not equivalent, because it changes the capturing group indexes. So we either need an option to never emit non-capturing groups when compiling to ERE, or we need to make the above code illegal, requiring capturing groups like this:
:(:("hello")? | :("world")+) "!!"
Although the outer capturing group could be avoided by "inlining" the exclamation marks:
(:("hello")? | :("world")+) "!!"
which would compile to
(hello)?!!|(world)+!!
But that could lead to exponential size increase of the generated expression, so probably not a good idea.
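To make the index shift concrete, here is a small Python sketch. Python's re module is not an ERE engine, but it accepts all three spellings, so it can stand in for comparing capture-group counts and indexes. The variable names are only illustrative.

```python
import re

# The three regexes from the discussion above.
original = re.compile(r"(?:(?:hello)?|(?:world)+)!!")  # non-capturing groups (not valid ERE)
all_capturing = re.compile(r"((hello)?|(world)+)!!")   # every group made capturing
inlined = re.compile(r"(hello)?!!|(world)+!!")         # "!!" distributed over the alternation

patterns = (original, all_capturing, inlined)

# Number of capturing groups in each variant.
group_counts = [p.groups for p in patterns]

# All three accept the same sample inputs ...
samples = ["hello!!", "worldworld!!", "!!"]
same_matches = all(p.fullmatch(s) for p in patterns for s in samples)

# ... but "world" ends up under a different group index in each capturing variant.
world_in_all_capturing = all_capturing.fullmatch("world!!").group(3)
world_in_inlined = inlined.fullmatch("world!!").group(2)

print(group_counts, same_matches, world_in_all_capturing, world_in_inlined)
```

The group counts differ (0, 3 and 2), so any code that reads a match by group number would have to know which variant was emitted.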
The other problem is that ERE does not allow escaping characters within a character class, so characters need to be rearranged:
['^' 'a'-'z' '\' '-' ']']
will have to be compiled to
[]^a-z\-]
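A minimal Python sketch of such a rearrangement, under stated assumptions: ere_bracket is a hypothetical helper (not part of Pomsky), the class is non-empty and not negated, and range endpoints are never ], ^ or -. It puts ] first, - last and ^ anywhere but first, so its output for the example is ordered differently from the one shown above but describes the same set.

```python
def ere_bracket(chars, ranges=()):
    """Arrange literal characters and ranges into a POSIX bracket expression.

    Hypothetical helper: `chars` is an iterable of single characters and
    `ranges` a sequence of (first, last) pairs whose endpoints are assumed
    not to be ']', '^' or '-'.
    """
    chars = set(chars)
    special = {"]", "^", "-"}
    head = "]" if "]" in chars else ""   # ']' is only literal in first position
    caret = "^" if "^" in chars else ""  # '^' must not come first
    tail = "-" if "-" in chars else ""   # '-' is only literal first or last
    middle = "".join(f"{a}-{b}" for a, b in ranges) + "".join(sorted(chars - special))
    body = head + middle + caret + tail
    if body == "^":
        # A lone '^' cannot be written as a bracket expression;
        # fall back to an escaped caret outside of brackets.
        return r"\^"
    if body == "^-":
        # Put '-' first so that '^' is not in first position.
        return "[-^]"
    return "[" + body + "]"


print(ere_bracket(["^", "\\", "-", "]"], [("a", "z")]))  # []a-z\^-]
```

The two special cases exist because a class that would start with ^ is read as a negation, so the sketch either moves - to the front or gives up on brackets entirely.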
Rules:
* ^ can't appear at the start
* ] can only appear at the start
* - can only appear at the start or end
Another problem: Codepoint/C doesn't work (it compiles to [\s\S], which is not supported in ERE), so what are the alternatives?
* Compile C to ., but that would change the behavior of the pomsky expression depending on the flavor; not good
* Compile C to (.|\s), but that can lead to catastrophic backtracking; also, \s is supported by GNU ERE but not POSIX ERE; not good
The dot is now supported as of Pomsky 0.8. Rewriting the code for compiling character classes is in progress, with the goal of eventually supporting ERE. The only open question right now is how to handle non-capturing groups. Any input for this would be appreciated!
Possibilities are:
1. disallow non-capturing groups when targeting ERE, requiring users to write :() instead
2. add an option to silently convert non-capturing groups to capturing groups when targeting ERE; this could be made configurable, e.g. with -Xcapture=always
Both have disadvantages (1. makes pomsky expressions less portable, but 2. makes behavior of pomsky expressions less predictable).
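For illustration, the silent conversion in option 2 could look roughly like the Python sketch below. to_capturing is a hypothetical helper operating on an already compiled regex string, not how Pomsky is implemented. It assumes the input uses backslash escapes and that ] does not appear unescaped as the first character of a character class.

```python
def to_capturing(regex: str) -> str:
    """Rewrite every non-capturing group '(?:' as a plain capturing group '('.

    Hypothetical helper; it skips backslash escapes and the contents of
    character classes so that literal text is left alone.
    """
    out = []
    i = 0
    in_class = False
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex):
            out.append(regex[i:i + 2])  # keep the escaped character untouched
            i += 2
            continue
        if in_class:
            in_class = ch != "]"        # an unescaped ']' closes the class
        elif ch == "[":
            in_class = True
        elif regex.startswith("(?:", i):
            out.append("(")             # drop the '?:' marker
            i += 3
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(to_capturing(r"(?:(?:hello)?|(?:world)+)!!"))  # ((hello)?|(world)+)!!
```

The output has three capturing groups that the author of the pomsky expression never wrote, which is the predictability cost mentioned for option 2.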
Proposed solution:
A captures mode is added, which is enabled by default. To use non-capturing groups when targeting ERE, this mode must be disabled:
disable captures;
("hello"? | "world"+) "!!"
With this mode disabled, capturing groups (:() and :name()) are not allowed, but the compiler is allowed to produce capturing regex groups (assuming that they won't be used, since their indices do not correspond to anything).
Alternatively, capturing groups can be used in ERE, but compilation will fail if this results in a non-capturing group:
:(:("hello")? | :("world")+) "!!"
A possibility to make this more ergonomic is to allow numbering them explicitly, if you want to match a particular group:
:(:2("hello")? | :3("world")+) "!!"
Here, only the 2nd and 3rd capturing groups are numbered explicitly.
There are many devices that only support ERE. We need it.