whatwg / urlpattern

URL Pattern Standard
https://urlpattern.spec.whatwg.org/

Does not tolerate `+` in protocol #228

Open rotu opened 2 months ago

rotu commented 2 months ago

What is the issue with the URL Pattern Standard?

`URLPattern` doesn't tolerate `+` in the protocol. The polyfill gives this error:

new URLPattern("web+foo://example.com/baz")

TypeError: Failed to construct 'URLPattern': Unexpected OTHER_MODIFIER at 3, expected END

This is especially a problem because the `web+` prefix is mandatory when registering custom schemes (for example with `navigator.registerProtocolHandler()`).

sisidovski commented 2 months ago

The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be reported there, but the report itself is valid. To canonicalize a protocol, URLPattern uses the result of running the basic URL parser, and in the scheme state a character may be an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.). So it should tolerate `+`, as in `git+https`.
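To illustrate the scheme rules quoted above, here is a minimal JavaScript sketch. `isValidScheme` and `protocolOf` are hypothetical helper names used only for illustration: the regular expression encodes the scheme start state (first code point is an ASCII letter) and the scheme state (later code points are ASCII alphanumerics, `+`, `-`, or `.`), and `protocolOf` just shows that the built-in WHATWG `URL` parser accepts such schemes.

```javascript
// Scheme rules from the WHATWG URL Standard's basic URL parser:
// - scheme start state: the first code point must be an ASCII letter;
// - scheme state: later code points may be ASCII alphanumerics, "+", "-" or ".".
// isValidScheme is a hypothetical helper, not part of any standard API.
function isValidScheme(scheme) {
  return /^[A-Za-z][A-Za-z0-9+\-.]*$/.test(scheme);
}

// The built-in URL parser accepts such schemes (and lowercases them), so a
// protocol like "web+foo" or "git+https" canonicalizes without error.
function protocolOf(urlString) {
  return new URL(urlString).protocol;
}

console.log(isValidScheme("web+foo")); // true
console.log(protocolOf("web+foo://example.com/baz")); // web+foo:
```

Since the URL parser is happy with `web+foo`, the scheme itself is valid; the failure has to come from the pattern syntax.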

Chromium also throws an error when the protocol string contains `+`; this looks like a bug that needs fixing.

jeremyroman commented 2 months ago

I haven't looked into this, but @sisidovski if you think this is just a bug in the Chromium implementation, could you file & link a Chromium bug?

crowlKats commented 2 months ago

For reference, this also happens in Deno, so it is potentially more of a spec issue.

sisidovski commented 2 months ago

@jeremyroman Filed https://crbug.com/357760925. I'm happy to work on it in spare time.

rotu commented 2 months ago

> The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there

@sisidovski I'm not sure if you're saying there is no bug in the spec here.

I have a very hard time reading state-machine-oriented specs. What are the expected "token list", "part list", and "protocol component" from constructor string parsing, given the input "web+foo://example.com/baz"?

sisidovski commented 2 months ago

@rotu Thanks. I took another, more detailed look, and I think I see your point. Step 3.11 of parse a pattern string, "Run consume a required token given parser and "end"", throws a TypeError when `web+foo` is passed, because the algorithm does not handle `+`: it is tokenized as an "other-modifier" token, which is obviously not the "end" token type.
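A simplified JavaScript sketch of why this fails follows. It is not the spec algorithm: it models only the token types involved here (`CHAR`, `ESCAPED_CHAR`, `ASTERISK`, `OTHER_MODIFIER`, `END`), omits names, regexp groups and `{...}` groups, and `tokenize`/`parse` are hypothetical names. As in the spec, a modifier is only consumed right after a part such as a wildcard, so a `+` that follows plain text falls through to the required "end" check.

```javascript
// Simplified model of the URLPattern tokenizer and pattern parser, limited to the
// token types involved in this issue. Names (":foo"), regexp groups ("(...)") and
// "{...}" groups are deliberately omitted; this is an illustration, not the spec.
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const c = input[i];
    if (c === "*") {
      tokens.push({ type: "ASTERISK", index: i, value: c });
      i += 1;
    } else if (c === "+" || c === "?") {
      // "+" and "?" become OTHER_MODIFIER tokens wherever they appear.
      tokens.push({ type: "OTHER_MODIFIER", index: i, value: c });
      i += 1;
    } else if (c === "\\" && i + 1 < input.length) {
      // A backslash makes the following code point literal text.
      tokens.push({ type: "ESCAPED_CHAR", index: i, value: input[i + 1] });
      i += 2;
    } else {
      tokens.push({ type: "CHAR", index: i, value: c });
      i += 1;
    }
  }
  tokens.push({ type: "END", index: input.length, value: "" });
  return tokens;
}

// Literal text accumulates into a fixed-text part; "*" becomes a wildcard part
// that may be followed by one modifier. Anything else must be END, mirroring the
// "consume a required token given parser and "end"" step.
function parse(input) {
  const tokens = tokenize(input);
  const parts = [];
  let fixed = "";
  let pos = 0;
  for (;;) {
    const tok = tokens[pos];
    if (tok.type === "CHAR" || tok.type === "ESCAPED_CHAR") {
      fixed += tok.value;
      pos += 1;
    } else if (tok.type === "ASTERISK") {
      if (fixed) {
        parts.push({ type: "fixed", value: fixed });
        fixed = "";
      }
      pos += 1;
      let modifier = "";
      if (tokens[pos].type === "OTHER_MODIFIER" || tokens[pos].type === "ASTERISK") {
        modifier = tokens[pos].value;
        pos += 1;
      }
      parts.push({ type: "wildcard", modifier });
    } else {
      break; // an OTHER_MODIFIER not preceded by a wildcard, or END
    }
  }
  const tok = tokens[pos];
  if (tok.type !== "END") {
    throw new TypeError(`Unexpected ${tok.type} at ${tok.index}, expected END`);
  }
  if (fixed) parts.push({ type: "fixed", value: fixed });
  return parts;
}
```

In this model, `parse("web+foo")` throws `TypeError: Unexpected OTHER_MODIFIER at 3, expected END`, the same message the polyfill reports, while the escaped pattern `web\+foo` parses as a single fixed-text part `web+foo`.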

rotu commented 2 months ago

@sisidovski I don't think I even understood my point when I wrote that.

jeremyroman commented 1 month ago

I don't think we will ever be able to make all URLs valid URL patterns (or, if they're valid, have the same meaning), though we can make the need for escaping a little less common. I agree that describing how to escape effectively (either by hand or programmatically) would be a useful addition (I've written such algorithms myself, and they are indeed not trivial).
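As a starting point for programmatic escaping, here is a hedged JavaScript sketch. `escapePatternString` backslash-escapes the characters that the spec's "escape a pattern string" step escapes, as I read it (`+ * ? : { } ( ) \`). The hypothetical `patternInitFromURL` helper first splits a concrete URL with the `URL` parser so that each component can be escaped on its own and passed as a dictionary to `new URLPattern(...)`, which sidesteps escaping the delimiters of a full URL string. It has not been checked against every edge case, such as IPv6 hostnames.

```javascript
// Escape literal text so that URLPattern treats it as fixed text. The character
// set follows the spec's "escape a pattern string" step as I understand it.
function escapePatternString(input) {
  return input.replace(/[+*?:{}()\\]/g, "\\$&");
}

// Hypothetical helper: parse a concrete URL, then escape each component
// separately. The result is a plain object intended for `new URLPattern(init)`.
function patternInitFromURL(urlString) {
  const url = new URL(urlString);
  return {
    protocol: escapePatternString(url.protocol.slice(0, -1)), // drop trailing ":"
    username: escapePatternString(url.username),
    password: escapePatternString(url.password),
    hostname: escapePatternString(url.hostname),
    port: escapePatternString(url.port),
    pathname: escapePatternString(url.pathname),
    search: escapePatternString(url.search.replace(/^\?/, "")), // drop leading "?"
    hash: escapePatternString(url.hash.replace(/^#/, "")), // drop leading "#"
  };
}
```

For the URL from this issue, `patternInitFromURL("web+foo://example.com/baz").protocol` is the escaped pattern `web\+foo`.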

I think it's probably possible to let other-modifier tokens (`+` and `?`) that follow fixed text be subsumed into that text, since in that position they have no existing syntactic meaning. This would make things like `web+foo` viable without changing the meaning of `:foo?` and similar. It's not completely trivial to make this change, though, so I need to actually try making it to confirm that it works.
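To make the proposal concrete, here is a hypothetical JavaScript sketch of the rule described above; it is not the spec and not any implementation. A `+` or `?` that directly follows fixed text is absorbed into that text, while a modifier after a named group or wildcard keeps its current meaning. It simplifies heavily: names are ASCII-only, `*` is always a wildcard rather than a modifier, and regexp groups, `{...}` groups and escapes are ignored.

```javascript
// Hypothetical sketch of the proposed rule, NOT the spec algorithm: "+" or "?"
// right after fixed text becomes part of that text; after a named group (":foo")
// or wildcard ("*") it stays a modifier. Simplifications: ASCII-only names,
// "*" is always a wildcard, and regexps, "{...}" groups and escapes are ignored.
function parseWithAbsorbedModifiers(input) {
  const parts = [];
  let fixed = "";
  let prev = null; // what the previous token produced: "text", "part" or "modifier"
  const flush = () => {
    if (fixed) {
      parts.push({ type: "fixed", value: fixed });
      fixed = "";
    }
  };
  let i = 0;
  while (i < input.length) {
    const c = input[i];
    if (c === ":") {
      const match = /^[A-Za-z_$][A-Za-z0-9_$]*/.exec(input.slice(i + 1));
      if (!match) throw new TypeError(`Expected a name after ":" at ${i}`);
      flush();
      parts.push({ type: "name", name: match[0], modifier: "" });
      prev = "part";
      i += 1 + match[0].length;
      continue;
    }
    if (c === "*") {
      flush();
      parts.push({ type: "wildcard", modifier: "" });
      prev = "part";
    } else if (c === "+" || c === "?") {
      if (prev === "part") {
        // Existing meaning: the modifier applies to the preceding part.
        parts[parts.length - 1].modifier = c;
        prev = "modifier";
      } else if (prev === "text") {
        // Proposed meaning: after fixed text, "+" or "?" is plain text.
        fixed += c;
      } else {
        // At the start, or after another modifier: still an error.
        throw new TypeError(`Unexpected OTHER_MODIFIER at ${i}, expected END`);
      }
    } else {
      fixed += c;
      prev = "text";
    }
    i += 1;
  }
  flush();
  return parts;
}
```

Under this rule, `web+foo` parses as a single fixed-text part, `:foo?` is still an optional named group, and a modifier with no preceding text or part (as in `+foo`) is still rejected.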