Open lucacasonato opened 1 year ago
Actually I think we need to fix this for sure. Consider these two patterns:
// valid regexp, but pattern tokenizer thinks there is a nested group that isn't closed
const pattern = new URLPattern({ pathname: '/([(?])' });
// valid regexp group, valid urlpattern, except that this throws because "Invalid regular expression: /^(?:/([))\]\)$/u: Unterminated character class"
const pattern = new URLPattern({ pathname: '/([)])' });
Yea, seems like a real problem to me. I'm not sure when I will have the bandwidth to look at this, though. Do you have a proposed fix?
I think we have to keep track of [ and ] in the tokenizer while parsing regexp tokens, and ignore all ( and ) while between a [ and ] in that regexp. This can be a simple boolean as character classes can't be nested (no need to keep track of depth).
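To make the proposal concrete, here is a minimal sketch of the boolean approach. It is not the actual URLPattern tokenizer: `findRegexpGroupEnd` is a hypothetical helper that, given the index of an opening ( in a pattern string, returns the index of the matching ) while skipping anything inside a character class. It assumes backslash escapes always consume the next character and that character classes do not nest.

```typescript
// Hypothetical helper for illustration only; not part of any URLPattern implementation.
// Returns the index of the ")" that closes the group opened at `start`,
// or -1 if the group is never closed.
function findRegexpGroupEnd(pattern: string, start: number): number {
  let depth = 0;        // nesting depth of "(" ... ")" outside character classes
  let inClass = false;  // true while between "[" and "]"

  for (let i = start; i < pattern.length; i++) {
    const ch = pattern[i];

    if (ch === "\\") {
      // Skip the escaped character, whatever it is.
      i++;
      continue;
    }

    if (inClass) {
      // Inside a class only "]" is significant; "(" and ")" are plain chars.
      if (ch === "]") inClass = false;
      continue;
    }

    if (ch === "[") {
      inClass = true;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// The two pathnames from the report: both groups now end at the final ")".
const endA = findRegexpGroupEnd("/([(?])", 1);
const endB = findRegexpGroupEnd("/([)])", 1);
```

With this scan, the ( and ) inside the brackets no longer affect group matching, so both example pathnames yield a single regexp group spanning the whole parenthesized part.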
> character classes can't be nested (no need to keep track of depth).
Is this true? Given #178 we'll support syntax like [\d--[07]] (every digit except 0 and 7) as part of the UnicodeSets regexp mode.
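If nested classes do need to be supported, the boolean could become a counter. The following is only a sketch under that assumption: `findGroupEndWithNestedClasses` is a hypothetical name, and the code treats every unescaped [ inside a class as opening a nested class, which matches UnicodeSets-style syntax but not the older modes, where [ inside a class is a literal character.

```typescript
// Hypothetical helper for illustration only; assumes UnicodeSets-style nesting,
// where an unescaped "[" inside a class opens a nested class.
// Returns the index of the ")" closing the group opened at `start`, or -1.
function findGroupEndWithNestedClasses(pattern: string, start: number): number {
  let groupDepth = 0;  // depth of "(" ... ")" outside character classes
  let classDepth = 0;  // depth of "[" ... "]" nesting; 0 means outside any class

  for (let i = start; i < pattern.length; i++) {
    const ch = pattern[i];

    if (ch === "\\") {
      // Skip the escaped character.
      i++;
      continue;
    }

    if (classDepth > 0) {
      // Inside a class, only brackets change state.
      if (ch === "[") classDepth++;
      else if (ch === "]") classDepth--;
      continue;
    }

    if (ch === "[") {
      classDepth = 1;
    } else if (ch === "(") {
      groupDepth++;
    } else if (ch === ")") {
      groupDepth--;
      if (groupDepth === 0) return i;
    }
  }
  return -1;
}

// Group wrapping the nested-class example from the comment above.
const nestedEnd = findGroupEndWithNestedClasses("/([\\d--[07]])", 1);
```

The trade-off is that the tokenizer would need to know which regexp mode is in effect, since the same input brackets are interpreted differently with and without nesting.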
Fails, even though it's valid:
OK:
This is because while tokenizing the pattern, we think the second ( is a nested group rather than just a char in a regexp character class. Fixing this will make the tokenizer more complicated. Is it worth it?