Closed wanderview closed 2 years ago
Another way this can happen is if an intermediate empty group is removed. For example:
(foo){}*
Currently becomes:
(foo)*
But should be:
(foo)(.*)
This can also produce pattern strings that don't even parse. Consider:
*{}**?
Becomes:
**?
Another way generating strings can change the meaning... This:
{:k\\A}
Becomes:
{:kA}
Which changes from a named :k
group with a suffix A
to a single named :kA
group.
Another type of problem here with:
{:foo}(.*)
Assuming we fix this not to convert (.*)
to *
, we also need to avoid accidentally changing it to:
:foo(.*)
That changes the pattern from two groups to a single group with a custom name and wildcard regexp.
Another slight variation on the problem:
{:foo}bar
Becomes:
:foobar
@wanderview The WPTs seem to have landed for this, and the Chromium change also seems to have landed, but there is no spec change yet. Is this in the works?
Ah no, it is #152. Sorry.
Sorry, I went on holiday prior to finishing the spec changes. It will be the first thing I do now that I'm back.
I believe this is fixed in the spec now. I will work on updating the polyfill.
We found another issue along these lines:
/foo\\/:bar?
Should require the second slash because the escape prevents it from becoming a prefix to the :bar?
group. Pattern string generation, however, produces this:
/foo/:bar?
This was found by our fuzzer here:
https://bugs.chromium.org/p/chromium/issues/detail?id=1285620
Another one of these was found by the fuzzer:
https://bugs.chromium.org/p/chromium/issues/detail?id=1289115
In this case if an implicit prefix is before :foo
group it can incorrectly prevent us from protecting from trailing fixed characters.
Currently, when generating canonical pattern strings the algorithm always translates
.*
regex values into wildcards like*
. Unfortunately, this can result in a change of meaning if it does not consider the values before the regexp group.Consider, this:
(foo)(.*)
Gets "canonicalized" to:
(foo)*
If this is parsed again the
*
is treated as a modifier of(foo)
instead of as a wildcard.The algorithm should be fixed to look at the preceding part to determine if its safe to convert a
.*
regexp into a wildcard character.