whatwg / urlpattern

URL Pattern Standard
https://urlpattern.spec.whatwg.org/
Other
154 stars 20 forks source link

Add a "using URLPattern in other APIs" section #182

Open domenic opened 12 months ago

domenic commented 12 months ago

We're seeing URLPattern being used in several proposals now:

I think we need clearer guidance on how to design APIs that accept URL patterns, as we've started to see some slight divergence.

Scenarios to consider:

Other considerations: is there any place for pathname-only patterns? IMO probably not because anytime you match on pathname you should also allow matching on search, i.e. discriminating between URL patterns like /products/* and /product?id=* is not good. But it's come up a few times.

My proposal is to use the above rules for such situations, and enforce "pathname + search only" by checking that the resulting URL pattern's protocol/username/password/hostname/port are set to match the base URL's origin. Providing a spec helper for such a check would be a good idea, I think.

/cc @sisidovski @yoshisatoyanagisawa @pmeenan @horo-t.

jeremyroman commented 12 months ago

We're seeing URLPattern being used in several proposals now:

I think we need clearer guidance on how to design APIs that accept URL patterns, as we've started to see some slight divergence.

Scenarios to consider:

  • JS APIs:

    • Should accept URLPattern objects

    • What "shorthands" / "raw inputs" can they accept? Keep in mind that URLPattern has multiple constructor forms: new URLPattern({ ...componentsDictionary, baseURL }), new URLPattern(patternString, baseURL), and then variants of each of those with another options argument.

    • My suggestion:

    • Such APIs should have a default base URL, and then accept either:

    • { ...componentsDictionary } (which automatically gets the base URL),

I think supporting this might give you URLPattern objects "for free" because the components are exposed as properties with the same names, much like how many Foo interfaces satisfy the corresponding FooInit dictionary.

* `{ ...componentsDictionary, baseURL }` (overriding the base URL), or
* `patternString` (which automatically gets the base URL).
  • Downsides of my suggestion:

    • The shorthand form can't set the options (i.e. ignoreCase)
    • The shorthand form can't combine a string with an overridden base URL (i.e., there is no counterpart to the new URLPattern(patternString, baseURL) constructor).
    • The shorthand form depends on fixing Base URL inheritance gives unintuitive results #179 to give intuitive results for this kind of implicit base URL inheritance

FWIW, shorthands not having all capabilities seems fine to me.

The natural base URL is almost always the response URL (or request URL, for request headers), right?

  • JSON APIs:

    • My suggestion (and what we implemented for speculation rules): the same as the "shorthands" sub-question for JS APIs.

    • Some of the downsides above become more serious here, because there's no new URLPattern() escape hatch; the only possible form is the "shorthand" form.

Other considerations: is there any place for pathname-only patterns? IMO probably not because anytime you match on pathname you should also allow matching on search, i.e. discriminating between URL patterns like /products/* and /product?id=* is not good. But it's come up a few times.

My proposal is to use the above rules for such situations, and enforce "pathname + search only" by checking that the resulting URL pattern's protocol/username/password/hostname/port are set to match the base URL's origin. Providing a spec helper for such a check would be a good idea, I think.

+1, though I'm not even sure how often we'll need spec text, as I'd expect in practice the pattern never matching will cause no behavior. So it might just amount to non-normative text advising a warning if the pattern cannot possibly match (because it doesn't match the origin's scheme/host/port, because it doesn't see fragments but the pattern requires a non-empty fragment, etc).

sisidovski commented 12 months ago

Thank you for raising this.

The shorthand form can't set the options (i.e. ignoreCase)

In https://github.com/WICG/compression-dictionary-transport/pull/51#issuecomment-1714280658, the new option to URLPatternOptions is proposed. So this limitation may be more serious in the future, but some limitations are reasonable as far as the API accepts URLPattern itself.

My proposal is to use the above rules for such situations, and enforce "pathname + search only" by checking that the resulting URL pattern's protocol/username/password/hostname/port are set to match the base URL's origin.

I understand we'd like to have pathname+search match in the most case, but is this enforcement applied only when the base URL was added automatically? Service worker static routing may accept matching other URLPatternInit items such as hostname, since it handles cross origin resources etc.

pmeenan commented 12 months ago

One of the other things we'd need when integrating with other specs is an agreed-upon way to specify which tokens are supported. Specifically if regexp is supported or not since both compression dictionaries and the service worker routing specs have disabled regex support.

yoshisatoyanagisawa commented 11 months ago

Do you have a plan to make URLPattern gets an implicit baseURL? I expect the urlPattern field in the static routing API and new URLPattern behaves the same. I mean that if the short-hand form silently gets the baseURL, the original URLPattern should also silently get the baseURL. To prospect which URL can be matched by the URLPattern, it is natural for the web developers to create URLPattern by themselves, and execute its test method. It should be confusing if it behaves different from that is specified to the static routing API. At this point, baseURL should also be exposed in the URLPattern interface so that URLPattern object can be used as the static routing API's urlPattern's input.

Moreover, I expect URLPatternInit and the URLPattern interface has the IgnoreCase field. It is important information for the matching, and should be respected in the evaluator.

A flag to ask URLPattern to prevent regexp might need to be used at the compile time, and it might not need to be passed to the evaluator. I think it nice to make URLPattern API options to support the flag. It is also helpful for the people to understand the behavior with new URLPattern({...}, {prohibit_regexp: true}).

domenic commented 11 months ago

Do you have a plan to make URLPattern gets an implicit baseURL?

No. And I don't think this is the right question to be asking.

Consider how other web platform APIs behave. fetch("./foo.html") works, and has an implicit base URL. But new URL("./foo.html") does not work; there is no "URL with implicit base URL". You need the explicit new URL("./foo.html", baseURL).

This pattern is widespread. Almost every web API (maybe every, now?) comes with an implicit base URL. We don't build the concept of implicit base URL into the URL primitive itself; that still represents a parsed, absolute, URL. Instead we say that consumers of the URL concept have an implicit base URL, which they apply when you use the "shorthand" form fetch("./foo.html") instead of the explicit fetch(new URL("./foo.html", document.baseURI)).

When designing URL patterns and related APIs, how URLs and related APIs work is generally a good precedent.

I expect the urlPattern field in the static routing API and new URLPattern behaves the same.

I don't think this is the right expectation. new URLPattern is a constructor, which can take multiple arguments. The urlPattern field in the static routing API can only accept a single value. We need different APIs for these two places.

Additionally, if you want the full power of the new URLPattern constructor, you can use it directly: your single value can just be a URLPattern object, as in urlPattern: new URLPattern(..., ..., ...). If you are instead using a shorthand value, e.g. urlPattern: "a string", that can behave differently, because strings are a different type than URLPattern objects.

I don't think this will be confusing for web developers, in the same way that web developers don't find fetch("./foo.html") being shorthand for fetch(new URL("./foo.html", document.baseURI)) confusing.

Moreover, I expect URLPatternInit and the URLPattern interface has the IgnoreCase field. It is important information for the matching, and should be respected in the evaluator.

If you want ignoreCase functionality, then you can avoid using the shorthand, and use urlPattern: new URLPattern(..., { ignoreCase }).

A flag to ask URLPattern to prevent regexp might need to be used at the compile time, and it might not need to be passed to the evaluator.

I think adding this at the URLPattern API level, especially as a constructor option (URLPatternOptions), is not the right design. Certain APIs have restrictions on the URLPatterns they accept: some reject those with regexp parts; some reject (or always fail to match) those that are cross-origin; etc. This processing should be performed by each API.

We should make it easy to specify this prohibition. For example, there should be a specification hook like |urlPattern| [=contains regexp tokens=], so that other specifications can write If |urlPattern| [=contains regexp tokens=], then throw a {{TypeError}}.

If we want to expose this as a web developer-facing API, we could do so: then web developers could emulate the behavior of speculations using if (urlPattern.containsRegExpTokens) { throw new TypeError(); }. But I am unsure how valuable this is.

yoshisatoyanagisawa commented 11 months ago

I thought that the urlPattern condition in the static routing API would take two kinds of arguments and made it accept URLpatternInput, but it sounds three; USVString, a dictionary for short-hands, and URLPattern object. Upon the past discussion, I misunderstood that you meant passing a dictionary should behave equivalent to passing URLPattern object, but it sounds not. i.e. urlPattern: new URLPattern({...}) can be a different behavior from urlPattern: {...} even if {...} are the same.

By the way, I hope https://github.com/WICG/urlpattern/issues/179 is resolved soon. Until that is resolved, I believe it is bad idea to introduce implicit baseURL because it brings unexpected surprises like not matching URLs with query strings, which would commonly happens.

If you want ignoreCase functionality, then you can avoid using the shorthand, and use urlPattern: new URLPattern(..., { ignoreCase }).

How the WebIDL for the urlPattern condition's argument would be?

A flag to ask URLPattern to prevent regexp

My goal is not adding the flag but helping developers to recognize un-intended regular expression usage. Therefore, urlPattern.containsRegExpTokens still fulfills my goal.

domenic commented 11 months ago

By the way, I hope #179 is resolved soon.

Agreed. I think it is very reasonable to avoid introducing implicit base URL into service worker static routes until we resolve that. I think @jeremyroman will be working on it soon.

How the WebIDL for the urlPattern condition's argument would be?

I think it would be: (URLPattern or URLPatternInit or USVString)

My goal is not adding the flag but helping developers to recognize un-intended regular expression usage. Therefore, urlPattern.containsRegExpTokens still fulfills my goal.

This is reasonable. Perhaps we should open a new issue, which you or @sisidovski could send a pull request on, for such a proposal?

yoshisatoyanagisawa commented 11 months ago

How the WebIDL for the urlPattern condition's argument would be?

I think it would be: (URLPattern or URLPatternInit or USVString)

Thank you.

My goal is not adding the flag but helping developers to recognize un-intended regular expression usage. Therefore, urlPattern.containsRegExpTokens still fulfills my goal.

This is reasonable. Perhaps we should open a new issue, which you or @sisidovski could send a pull request on, for such a proposal?

I have filed https://github.com/WICG/urlpattern/issues/191

jeremyroman commented 9 months ago

Such a section has been added. Since the original issue comment mentioned HTTP headers which are still in discussion, leaving this issue thread open for now.

domenic commented 1 week ago

I'll note that https://datatracker.ietf.org/doc/draft-ietf-httpbis-compression-dictionary/17/ ends up explicitly instantiating URLPattern classes during HTTP header processing, which is somewhat confusing since, e.g., it doesn't talk about which JavaScript realm to use. It might be good to provide a better interface for HTTP header specs to use soon, if we can...

jeremyroman commented 1 week ago

The existing dfns should be enough to avoid using the WebIDL integration (as the JSON section demonstrates), but still maybe explicitly calling out HTTP headers makes it more straightforward.

pmeenan commented 1 week ago

Sorry about that. I just published draft-18 of the compression dictionaries ID which switches away from using the WebIDL and better mirrors what the JSON section demonstrates.

FWIW, it might be easier to reference if each of the High-level operations sections had a discrete sub-section. Right now the text version of the ID relies on "create", "match" and "has regexp groups" for each of the sections but it's not necessarily obvious when coming to the spec that that's the part of the document that is referred to (the HTML version has a deep link which makes it easier).

jeremyroman commented 1 week ago

@pmeenan

Let me know if you would like any IDs in the section I propose in #230 to be made permanent, too. At the moment it's basically advice and a single algorithm which doesn't do much over the direct create algorithm right now but might gain additional features if required in the future.

pmeenan commented 1 week ago

@pmeenan

Let me know if you would like any IDs in the section I propose in #230 to be made permanent, too. At the moment it's basically advice and a single algorithm which doesn't do much over the direct create algorithm right now but might gain additional features if required in the future.

Thanks, but I think the direct links to create work better for my use case since it already takes care of asserting the field as a structured field as part of the processing.