whatwg / url

URL Standard
https://url.spec.whatwg.org/

Allow specifying additional "special" schemes. #749

Open tmccombs opened 1 year ago

tmccombs commented 1 year ago

The parsing algorithm behaves differently for certain schemes that are considered "special". In addition, the scheme of a non-special URL cannot be changed to a special scheme. In some applications, especially non-web-browser applications, it is desirable for additional schemes to be treated the same way as the listed special schemes, and to be able to change the protocol/scheme to and from other special schemes.
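
As a rough illustration of both behaviors, here is a minimal sketch that assumes a spec-compliant global `URL` (as in current browsers and Node.js). The scheme `foo` and the helper names `summarize` and `trySetProtocol` are made up for the example.

```typescript
// Parse an input and report the parts that differ between special and
// non-special schemes. `summarize` is a hypothetical helper name.
function summarize(input: string): { href: string; hostname: string; pathname: string } {
  const u = new URL(input);
  return { href: u.href, hostname: u.hostname, pathname: u.pathname };
}

// Try to change the scheme through the `protocol` setter and return the
// scheme the URL ends up with. `trySetProtocol` is a hypothetical helper name.
function trySetProtocol(input: string, scheme: string): string {
  const u = new URL(input);
  u.protocol = scheme; // silently ignored when crossing the special/non-special boundary
  return u.protocol;
}

// Special scheme: the host is parsed as a domain (lowercased) and an empty path becomes "/".
console.log(summarize("http://EXAMPLE.com"));
// Non-special scheme (hypothetical "foo"): the host is kept as an opaque host
// and the path stays empty.
console.log(summarize("foo://EXAMPLE.com"));

console.log(trySetProtocol("http://example.com/", "https")); // special to special
console.log(trySetProtocol("foo://example.com/", "https"));  // non-special to special
console.log(trySetProtocol("http://example.com/", "foo"));   // special to non-special
```

Only the first of the three setter calls changes the scheme; the other two leave the URL untouched.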

I think there are a few ways this could be addressed:

  1. Change the API to allow passing a list of additional special schemes into the constructor for URL.
  2. Change the API to allow specifying that a URL should be treated as a special URL during construction.
  3. Add a new URLFactory (or URLBuilder) class that allows configuring the set of special schemes for any URLs created with it.
  4. Do not specify any additional required API, but say that an implementation is allowed to treat additional schemes as special, and potentially include an API for registering additional special schemes.
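
To make option 3 more concrete, here is a hypothetical sketch of what such a factory could look like from the caller's side. Nothing in it is part of the URL Standard: the class name `UrlFactory`, its `parse` method, and the scheme `foo` are assumptions for illustration. The implementation borrows the special-scheme parsing rules by temporarily substituting `http` for the custom scheme, which is a workaround rather than a faithful implementation: the returned `URL` object reports `http:` as its protocol, port 80 is dropped as a default port, and leading whitespace in the input is not handled.

```typescript
// Hypothetical factory that treats extra schemes as "special" by parsing
// them as if they were http and remembering the original scheme.
class UrlFactory {
  private readonly extraSpecial: Set<string>;

  constructor(extraSpecialSchemes: string[]) {
    this.extraSpecial = new Set(extraSpecialSchemes.map((s) => s.toLowerCase()));
  }

  parse(input: string): { scheme: string; url: URL } {
    const colon = input.indexOf(":");
    const scheme = input.slice(0, colon).toLowerCase();
    if (colon < 1 || !/^[a-z][a-z0-9+.-]*$/.test(scheme)) {
      throw new TypeError("input does not start with a valid scheme");
    }
    if (!this.extraSpecial.has(scheme)) {
      // Not registered: use the standard parser unchanged.
      return { scheme, url: new URL(input) };
    }
    // Registered: reuse the special-scheme rules by parsing under "http:".
    // Assumption: callers read the real scheme from `scheme`, not `url.protocol`.
    return { scheme, url: new URL("http:" + input.slice(colon + 1)) };
  }
}

const factory = new UrlFactory(["foo"]);
const parsed = factory.parse("foo://EXAMPLE.com");
console.log(parsed.scheme, parsed.url.hostname, parsed.url.pathname);
```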

Some examples of schemes that applications may wish to treat as special:

I follow the rust-url repository, which aims at implementing this specification, and issues related to this come up pretty frequently. For example:

Related issues for this repository:

annevk commented 1 year ago

I think the answer here is 5. It's worth clarifying in the standard that this is a non-goal, as it indeed occasionally comes up.

Instead what you'd do is define a processor that takes a URL and turns it into a data structure suitable for further usage. E.g., what we do in https://fetch.spec.whatwg.org/#data-urls for data: URLs. Such a scheme-specific processor can take care of adding a path, further processing an opaque host, etc.
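
A minimal sketch of what such a scheme-specific processor might look like, assuming a made-up `foo` scheme. The names `FooLocation` and `processFooUrl`, the default port 4000, and the choice to lowercase the host are all hypothetical; a real processor would follow whatever the scheme's own definition requires.

```typescript
// Data structure produced by the hypothetical processor.
interface FooLocation {
  host: string;
  port: number;
  path: string;
}

// Takes an already-parsed URL and turns it into a structure for further use.
// The standard parser has done the generic work; this step only applies
// rules specific to the (hypothetical) "foo" scheme.
function processFooUrl(url: URL): FooLocation {
  if (url.protocol !== "foo:") {
    throw new TypeError("expected a foo: URL");
  }
  // Further processing of the opaque host. Lowercasing is a simplistic
  // stand-in for real domain processing.
  const host = url.hostname.toLowerCase();
  // Assumed default port for the hypothetical scheme.
  const port = url.port === "" ? 4000 : Number(url.port);
  // Add a path when the parsed URL has none.
  const path = url.pathname === "" ? "/" : url.pathname;
  return { host, port, path };
}

console.log(processFooUrl(new URL("foo://EXAMPLE.com")));
console.log(processFooUrl(new URL("foo://EXAMPLE.com:1234/a/b")));
```

The parsed URL itself is left unchanged, so its serialization stays the same across implementations; only the derived structure reflects the scheme-specific rules.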

The reason for that is that URL parsing ought to be stable over time and across implementations. Implementations should not have differing views as to what a URL string represents, how it serializes once parsed, etc. And if URLs are further processed, ideally that aligns across implementations as well, but that will only happen in implementations purporting to support the scheme, which will be a subset.

tmccombs commented 1 year ago

My point is that a subset of custom schemes are basically identical to http/https, but use a different scheme to convey some additional information. Such a separate processor would have to duplicate a lot of what the URL parser already implements.

annevk commented 1 year ago

Yeah, understood.