w3c / webdriver-bidi

Bidirectional WebDriver protocol for browser automation
https://w3c.github.io/webdriver-bidi/
351 stars 38 forks source link

Fix URLPattern matching #507

Closed jgraham closed 11 months ago

jgraham commented 1 year ago

@juliandescottes left a bunch of comments on https://github.com/w3c/webdriver-bidi/pull/429 which weren't resolved before the PR landed, in the interests of making some progress.

The general theory here is that we want the behaviour to be a subset of https://wicg.github.io/urlpattern/ with only support for literals (no wildcards, regexps, etc.)

juliandescottes commented 1 year ago

Some of the follow up items I mentioned:

juliandescottes commented 1 year ago

While the second bullet point is rather straightforward, for the first one it's a bit more complicated.

Our BiDi spec uses the URL parser / basic URL parser to process string URLs. Based on that spec, when there is no ? in the URL, the Basic URL Parser should return null for the URL's query. And then, if we process this according to the URL Pattern spec, this means we should consider this as a wildcard and match everything. See https://wicg.github.io/urlpattern/#compile-a-component.

However, the URLPattern spec is not using the URL Parser to process string input constructor. It's following the steps at https://wicg.github.io/urlpattern/#constructor-string-parsing, which if I'm reading them correctly, is setting an empty string by default for the search (step 2.2.1.4). And that would match what Chrome's behavior I believe.

We should either stop using the URL Parser for processing URLPatternString patterns, or at least modify the defaults enough in order to match the logic of the URLPattern parser.

juliandescottes commented 1 year ago

It's also worth noting that the match algorithm from URLPattern will force the empty string as the default value for several components, including search, see https://wicg.github.io/urlpattern/#match

Set hostname to url’s host or the empty string if the value is null. Set port to url’s port or the empty string if the value is null. Set search to url’s query or the empty string if the value is null.

Which means that if we try to match this spec, we actually won't differentiate between empty string search and null search.

juliandescottes commented 1 year ago

Another scenario which is not correctly handled by the current spec is with URLPatternPattern which only specify a port which happens to be a default port for another scheme than http.

Eg {port: "443"}. With the current algorithm, we will internally build the following URL for this pattern: http://placeholder:443/ (http and placeholder are the fallback values used by the URLPatternPattern parser). Since 443 is not the default port for http, it won't get replaced by the empty string, which means we will expect explicitly 443.

Meaning this pattern will match "http://example.com:443", but not "https://example.com", and not even "https://example.com:443".