Does not tolerate `+` in protocol

rotu commented 3 months ago

What is the issue with the URL Pattern Standard?

URLPattern doesn't tolerate + in protocol. The polyfill gives this error:

new URLPattern("web+foo://example.com/baz")

TypeError: Failed to construct 'URLPattern': Unexpected OTHER_MODIFIER at 3, expected END

This is especially a problem since the web+ prefix is mandatory when registering schemes.

sisidovski commented 3 months ago

The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there, but this is valid. To canonicalize a protocol, it uses the result of running basic URL parser. And in scheme state , a character should be an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.). So it should tolerate +, like git+https.

Actually chromium throws an error with the protocol string containing +, it seems to be a bug to be fixed.

jeremyroman commented 3 months ago

I haven't looked into this, but @sisidovski if you think this is just a bug in the Chromium implementation, could you file & link a Chromium bug?

crowlKats commented 3 months ago

for reference, this is also happening in Deno, so potentially more of a spec issue

sisidovski commented 3 months ago

@jeremyroman Filed https://crbug.com/357760925. I'm happy to work on it in spare time.

rotu commented 3 months ago

The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there

@sisidovski I'm not sure if you're saying there is no bug in the spec here.

I have a very hard time reading state-machine-oriented specs. What are the expected "token list" and "part list" and "protocol component" from Constructor string parsing given input "web+foo://example.com/baz"?

sisidovski commented 3 months ago

@rotu Thanks. I took a look again a bit more in detail, and probably I caught your point. The step 3.11 Run consume a required token given parser and "end" in parse-a-pattern-string will throw TypeError if web+foo is passed, because this algorithm doesn't handle +, which is treated as "other-modifier" token type in the token list, and obviously this is not the "end" token type.

rotu commented 3 months ago

@sisidovski I don't think I even understood my point when I wrote that.

As things stands, it's not even clear what the constructor string can and should look like! The pattern string section could definitely use some examples! The URL spec has many expository examples, which make it more approachable. (It's confusing to me that this spec supports two pattern matching languages, both path-to-regexp-like patterns and regexp patterns. If I had my druthers, I'd probably ditch the new pattern syntax in favor of only regexp, but I doubt you share my appetite for that change!)
My naive expectation is that a URL string should also be a valid URL pattern. It a source of future confusion that this spec interprets legal URL characters non-literally in its pattern syntax. For instance, (, {, :, \\ are valid in the query string but would need to be escaped in a URLPattern constructor string. This deserves a prominent note explaining 1. what needs escaping 2. how to escape characters.
My naive expectation also that a URL object should not be reinterpreted when converted to a URLPattern. So new URLPattern(new URL('http://foo?json={}')) should NOT be equivalent to new URLPattern('http://foo?json=')
It does work to do new URLPattern("web\\+foo://*") (i.e. escape + in the "pattern string" language) or new URLPattern("(web\\+foo)://*") (escape + in a regexp), or new URLPattern("(web[+]foo)://*") (use a character class in a regexp). Per this issue, I don't think this should need escaping.

jeremyroman commented 2 months ago

I don't think we will ever be able to make all URLs valid URL patterns (or, if they're valid, have the same meaning), though we can make needing escaping a little less common. I agree that describing how to effectively escape (either by hand or programmatically) would be a useful addition (I've written such algorithms myself, and they are indeed not trivial).

I think it's probably possible to allow other-modifier tokens (+ and ?) after some fixed text to get subsumed by it, since it otherwise has no existing syntactic meaning. This would make things like web+foo viable without changing the meaning of :foo? and similar. It's not completely trivial to make this change, though, so I need to actually try to make the change for that to work.

whatwg / urlpattern

Does not tolerate `+` in protocol #228

What is the issue with the URL Pattern Standard?