Open rotu opened 3 months ago
The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there, but this is valid. To canonicalize a protocol, the spec uses the result of running the basic URL parser. In the scheme state, a character should be an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.). So it should tolerate +, as in git+https.
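The scheme rule quoted above can be sketched as a standalone check. This is only an illustration: isValidSchemeSketch is a hypothetical helper, not part of the URL or URLPattern APIs, and it assumes the scheme is passed without its trailing ":" and starts with an ASCII alpha, as the URL parser's scheme start state requires.

```javascript
// Hypothetical helper (not part of any API): checks a scheme against the
// character set allowed by the URL Standard's scheme start / scheme states.
// Assumption: the input is the bare scheme, without the trailing ":".
function isValidSchemeSketch(scheme) {
  // First character: ASCII alpha.
  // Remaining characters: ASCII alphanumeric, "+", "-", or ".".
  return /^[A-Za-z][A-Za-z0-9+\-.]*$/.test(scheme);
}

console.log(isValidSchemeSketch("git+https")); // scheme containing "+"
console.log(isValidSchemeSketch("web+foo"));   // scheme from this issue
console.log(isValidSchemeSketch("1http"));     // starts with a digit
```

By this rule both git+https and web+foo are well-formed schemes, which is why rejecting them in a pattern's protocol is surprising.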
Actually, Chromium throws an error when the protocol string contains +; this seems to be a bug that should be fixed.
I haven't looked into this, but @sisidovski if you think this is just a bug in the Chromium implementation, could you file & link a Chromium bug?
for reference, this is also happening in Deno, so potentially more of a spec issue
@jeremyroman Filed https://crbug.com/357760925. I'm happy to work on it in spare time.
The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there
@sisidovski I'm not sure if you're saying there is no bug in the spec here.
I have a very hard time reading state-machine-oriented specs. What are the expected "token list" and "part list" and "protocol component" from Constructor string parsing given input "web+foo://example.com/baz"?
@rotu Thanks. I took another, more detailed look, and I think I see your point. Step 3.11 of parse-a-pattern-string, "Run consume a required token given parser and "end"", will throw a TypeError if web+foo is passed. The algorithm doesn't handle +: it is treated as an "other-modifier" token in the token list, and that is obviously not the "end" token type.
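To make the token list concrete, here is a heavily simplified sketch of the tokenizing step. It is not the spec algorithm: tokenizeSketch is a hypothetical name, and named groups (":name") and regexp groups ("(...)") are deliberately left out, so it only covers single-character tokens and backslash escapes.

```javascript
// Simplified sketch, NOT the spec's tokenizer: handles only single-character
// tokens and backslash escapes. ":" and "(" are rejected because named and
// regexp groups are out of scope for this illustration.
function tokenizeSketch(input) {
  const chars = Array.from(input); // iterate by code point
  const tokens = [];
  for (let i = 0; i < chars.length; i++) {
    const c = chars[i];
    if (c === "*") {
      tokens.push({ type: "asterisk", value: c });
    } else if (c === "+" || c === "?") {
      tokens.push({ type: "other-modifier", value: c });
    } else if (c === "\\") {
      if (i + 1 >= chars.length) throw new TypeError("trailing backslash");
      i++; // the escaped character becomes the token value
      tokens.push({ type: "escaped-char", value: chars[i] });
    } else if (c === "{") {
      tokens.push({ type: "open", value: c });
    } else if (c === "}") {
      tokens.push({ type: "close", value: c });
    } else if (c === ":" || c === "(") {
      throw new Error("named and regexp groups are not handled in this sketch");
    } else {
      tokens.push({ type: "char", value: c });
    }
  }
  tokens.push({ type: "end", value: "" });
  return tokens;
}

console.log(tokenizeSketch("web+foo").map((t) => t.type).join(" "));
console.log(tokenizeSketch("web\\+foo").map((t) => t.type).join(" "));
```

In the unescaped input the + shows up as an "other-modifier" token in the middle of the list, where the parser expects "end"; in the escaped input it becomes an "escaped-char" token instead.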
@sisidovski I don't think I even understood my point when I wrote that.
As things stand, it's not even clear what the constructor string can and should look like! The pattern string section could definitely use some examples! The URL spec has many expository examples, which make it more approachable. (It's confusing to me that this spec supports two pattern matching languages, both path-to-regexp-like patterns and regexp patterns. If I had my druthers, I'd probably ditch the new pattern syntax in favor of only regexp, but I doubt you share my appetite for that change!)
My naive expectation is that a URL string should also be a valid URL pattern. It is a source of future confusion that this spec interprets legal URL characters non-literally in its pattern syntax. For instance, "(", "{", ":", and "\\" are valid in the query string but would need to be escaped in a URLPattern constructor string. This deserves a prominent note explaining (1) what needs escaping and (2) how to escape characters.
My naive expectation is also that a URL object should not be reinterpreted when converted to a URLPattern. So new URLPattern(new URL('http://foo?json={}')) should NOT be equivalent to new URLPattern('http://foo?json=').
It does work to do new URLPattern("web\\+foo://*") (i.e. escape + in the "pattern string" language), or new URLPattern("(web\\+foo)://*") (escape + in a regexp), or new URLPattern("(web[+]foo)://*") (use a character class in a regexp). Per this issue, I don't think this should need escaping.
I don't think we will ever be able to make all URLs valid URL patterns (or, if they're valid, have the same meaning), though we can make needing escaping a little less common. I agree that describing how to effectively escape (either by hand or programmatically) would be a useful addition (I've written such algorithms myself, and they are indeed not trivial).
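As a rough illustration of programmatic escaping, the sketch below backslash-escapes literal text meant for a single pattern component. It rests on assumptions: escapePatternText is a hypothetical helper, and the character set is my reading of which characters carry meaning in the pattern-string syntax, so check it against the spec before relying on it. It does not try to handle a whole constructor string, where component delimiters complicate things.

```javascript
// Assumed set of characters with special meaning in the pattern-string
// syntax; verify against the spec before relying on it.
const PATTERN_SPECIAL = new Set(["+", "*", "?", ":", "{", "}", "(", ")", "\\"]);

// Hypothetical helper: backslash-escapes every special character so the
// text is matched literally within one component of a pattern.
function escapePatternText(text) {
  let out = "";
  for (const ch of text) {
    if (PATTERN_SPECIAL.has(ch)) out += "\\";
    out += ch;
  }
  return out;
}

console.log(escapePatternText("web+foo"));
console.log(escapePatternText("json={}"));

// Possible usage where URLPattern is available (not executed here):
// new URLPattern({ protocol: escapePatternText("web+foo") });
```

Escaping one component at a time sidesteps the question of how delimiters such as ":" and "?" should be treated in a full URL string, which is part of what makes the general problem non-trivial.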
I think it's probably possible to allow other-modifier tokens (+ and ?) after some fixed text to get subsumed by it, since it otherwise has no existing syntactic meaning. This would make things like web+foo viable without changing the meaning of :foo? and similar. It's not completely trivial to make this change, though, so I need to actually try to make the change for that to work.
What is the issue with the URL Pattern Standard?
URLPattern doesn't tolerate + in protocol. The polyfill gives this error:
This is especially a problem since the web+ prefix is mandatory when registering schemes.