whatwg / urlpattern

URL Pattern Standard
https://urlpattern.spec.whatwg.org/
Other
160 stars 22 forks source link

pattern string canonicalization with wildcards can result in different meanings #145

Closed wanderview closed 2 years ago

wanderview commented 3 years ago

Currently, when generating canonical pattern strings the algorithm always translates .* regex values into wildcards like *. Unfortunately, this can result in a change of meaning if it does not consider the values before the regexp group.

Consider, this:

(foo)(.*)

Gets "canonicalized" to:

(foo)*

If this is parsed again the * is treated as a modifier of (foo) instead of as a wildcard.

The algorithm should be fixed to look at the preceding part to determine if its safe to convert a .* regexp into a wildcard character.

wanderview commented 3 years ago

Another way this can happen is if an intermediate empty group is removed. For example:

(foo){}*

Currently becomes:

(foo)*

But should be:

(foo)(.*)

wanderview commented 3 years ago

This can also produce pattern strings that don't even parse. Consider:

*{}**?

Becomes:

**?

wanderview commented 3 years ago

Another way generating strings can change the meaning... This:

{:k\\A}

Becomes:

{:kA}

Which changes from a named :k group with a suffix A to a single named :kA group.

wanderview commented 2 years ago

Another type of problem here with:

{:foo}(.*)

Assuming we fix this not to convert (.*) to *, we also need to avoid accidentally changing it to:

:foo(.*)

That changes the pattern from two groups to a single group with a custom name and wildcard regexp.

wanderview commented 2 years ago

Another slight variation on the problem:

{:foo}bar

Becomes:

:foobar

lucacasonato commented 2 years ago

@wanderview The WPTs seem to have landed for this, and the Chromium change also seems to have landed, but there is no spec change yet. Is this in the works?

lucacasonato commented 2 years ago

Ah no, it is #152. Sorry.

wanderview commented 2 years ago

Sorry, I went on holiday prior to finishing the spec changes. It will be the first thing I do now that I'm back.

wanderview commented 2 years ago

I believe this is fixed in the spec now. I will work on updating the polyfill.

wanderview commented 2 years ago

We found another issue along these lines:

/foo\\/:bar?

Should require the second slash because the escape prevents it from becoming a prefix to the :bar? group. Pattern string generation, however, produces this:

/foo/:bar?

This was found by our fuzzer here:

https://bugs.chromium.org/p/chromium/issues/detail?id=1285620

wanderview commented 2 years ago

Another one of these was found by the fuzzer:

https://bugs.chromium.org/p/chromium/issues/detail?id=1289115

In this case if an implicit prefix is before :foo group it can incorrectly prevent us from protecting from trailing fixed characters.