Open Thom1729 opened 6 years ago
Is it certain that Oniguruma didn't mean x(?i)y|z to become x[Yy]|[Zz]? The wording really isn't clear on that.
By observation, it's grouped like x(?i)(?:y|z) = x(?i:y|z). I've tested this in Sublime (using (?<!0) to force Oniguruma) and in the highlighter I'm working on.
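To make the difference between the two readings concrete, here is a minimal sketch. It uses Python's re module only as a neutral engine and spells out both readings with explicit groups, since recent Python versions reject an isolated flag that is not at the start of the pattern. The names ONIGURUMA_READING, ALTERNATIVE_READING, and full_matches are made up for this illustration.

```python
import re

# Reading observed in Oniguruma, per this thread: x(?i)y|z == x(?i:y|z)
ONIGURUMA_READING = re.compile(r"x(?i:y|z)")
# The alternative reading asked about above: x[Yy]|[Zz]
ALTERNATIVE_READING = re.compile(r"x[Yy]|[Zz]")

def full_matches(pattern, samples):
    """Return the samples that the compiled pattern matches in full."""
    return [s for s in samples if pattern.fullmatch(s)]

SAMPLES = ["xy", "xY", "xz", "xZ", "z", "Z"]

oniguruma_hits = full_matches(ONIGURUMA_READING, SAMPLES)
alternative_hits = full_matches(ALTERNATIVE_READING, SAMPLES)

print(oniguruma_hits)
print(alternative_hits)
```

The two readings disagree on xz and xZ (matched only by the first) and on z and Z (matched only by the second), so either pair of inputs is enough to tell which grouping an engine applies.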
What I meant was: do we know for sure that it's not a bug? It really does seem weird to parse it like that.
I've opened an issue to verify.
It would be better for Sublime to replicate the bug than to differ from Oniguruma. However, if it is a bug and it gets fixed in Oniguruma, then that might be a good reason for Sublime to update its Oniguruma version.
I always understood (?i) to express some kind of global flag that applies to everything following it. This is also what https://stackoverflow.com/questions/15145659/what-do-i-and-i-in-regex-mean#15145701 says.
So it is not a bug in Oniguruma.
Since I just went through the referenced issue, the intended solution for Oniguruma is to interpret x(?i)y|z as x(?i)(?:y|z).
See also this test case: https://github.com/kkos/oniguruma/commit/0b7a1b9d894473b396c42c6afc99c85e280f83c9#diff-f1faa5ae6ee6c139773f8424cadf6112R398
Expected behavior
When Sublime's custom regexp engine handles a regexp, it should behave identically to Oniguruma.
Actual behavior
Oniguruma has a quirk when parsing isolated options (e.g. (?i)) that Sublime does not replicate. When Oniguruma encounters isolated options, the remainder of the enclosing group (or of the expression, if there is no enclosing group) is implicitly grouped. For instance, the following expressions are equivalent:
The documentation is less than clear, and this behavior is unintuitive, but it is consistent. I suppose that option groups are parsed with the same precedence as the | operator.
Sublime's custom regexp engine, however, will interpret that expression differently, so that the following are equivalent:
As a result, the same construct may be interpreted differently depending on whether the expression triggers the Oniguruma engine or uses the native Sublime engine. This is confusing. In addition, this is an obstacle to third-party implementations and other tools.
Sample syntax
Sample input
Notes
The core HTML syntax inadvertently relies upon this bug. I will submit a PR to correct that.
A suggested best practice to avoid this issue is to avoid isolated options, except at the very beginning of an expression (and never in variables). Instead, use noncapturing groups with flags. For example, instead of a(?i)b, use a(?i:b).
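As a rough illustration of that practice, the sketch below shows that the scoped form a(?i:b) limits case-insensitivity to the group, and adds a small check that flags isolated options appearing after the start of a pattern. It uses Python's re module purely for demonstration. The helper isolated_options_after_start is hypothetical and heuristic: it does not understand escapes or character classes, and it is not part of Sublime Text or Oniguruma.

```python
import re

# Scoped flag: only "b" is matched case-insensitively, "a" stays case-sensitive.
SCOPED = re.compile(r"a(?i:b)")

# Heuristic pattern for an isolated option group such as (?i) or (?-i).
# A scoped group like (?i:...) is not matched, because ":" follows the flags.
ISOLATED_OPTION = re.compile(r"\(\?[a-zA-Z-]+\)")

def isolated_options_after_start(pattern):
    """Return isolated option groups that do not sit at the very start.

    Hypothetical helper for illustration; escapes and character classes
    are not taken into account.
    """
    return [
        m.group(0)
        for m in ISOLATED_OPTION.finditer(pattern)
        if m.start() > 0
    ]

print(isolated_options_after_start("a(?i)b"))   # isolated option mid-pattern
print(isolated_options_after_start("(?i)ab"))   # option at the very beginning
print(isolated_options_after_start("a(?i:b)"))  # scoped, noncapturing form
```

A check along these lines could be run over the match patterns of a syntax definition to find expressions worth rewriting by hand.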