whatwg / url

URL Standard
https://url.spec.whatwg.org/
Other
533 stars 139 forks source link

Allow ports in file URLs #548

Open achristensen07 opened 4 years ago

achristensen07 commented 4 years ago

Currently, with the test URL file://example:1/ we have three different behaviors:

  1. Firefox will remove the host and return file:///
  2. Chrome does not allow the port and throws an exception. This is currently the behavior described in the specification.
  3. Safari allows the port.

While attempting to disallow ports from file URLs in https://bugs.webkit.org/show_bug.cgi?id=217252 I found a test written 13 years ago in https://trac.webkit.org/changeset/24484/webkit that indicates there is software that relies on the port not being forbidden in file URLs, and that there has been such software for a very long time.

I'm wondering if there is a good reason to not allow a port in file URLs? It is true that most file URLs won't have it, and I'm not sure what anything would do with the port, but I don't see a good reason to forbid it and fail parsing.

Would we be willing to add a transition from file host state to port state if c is ':'?

annevk commented 4 years ago

https://github.com/whatwg/url/issues/97#issuecomment-270461963 has some discussion on this. There's a concern around Windows drive letters there and your suggested patch would end up doing the wrong thing for those, so a solution here would likely be more involved. If we allowed this it would require some big changes to the API side as well (potentially simplifications though).

cc @alwinb @valenting @sleevi @jasnell @domenic

alwinb commented 4 years ago

I have not looked into the port issue, but FWIW I'm pretty sure you can do away with the file host state in the standard fairly easily. Just parse them as special URLs and enforce that the authority does not have credentials later. Then you can handle whatever is decided about the port there too. You need to go back to using nulls for empty user/pass though, which I think I saw somewhere has changed before.

Right now the host parser (which is called from the file host state) raises an error on : and @, so it just changes where the error is raised.

achristensen07 commented 4 years ago

I'm also curious why we prevent a username and password in a file URL when https://tools.ietf.org/html/rfc8089#appendix-E.1 clearly includes them. I can easily imagine a use case when the host requires credentials, but that would add another place where a colon could be in a file URL, although not in the path. Allowing a port would also allow a colon, but not in the path. The comment in issue #97 says "that would require a lot more monkey-ing about with the UNC & drive-letter sniffing logic." What would it take to do that "monkey-ing about" and make it only look in the path for drive letters? FWIW, as of this morning this is the last url-constructor.html web platform test that is failing in WebKit.

alwinb commented 4 years ago

Allright, so since I commented, I decided I should have a look. I have removed the file host state in favour of the authority state in the reference implementation.

Changes:

  1. The host parser can now return an empty host.
  2. The file host state has been removed.
  3. The file authority is parsed using the generic authority parser.

Currently, there are still checks within the authority parser that return failure if a file URL has credentials or a port. This can be easily changed to match whatever is decided for this issue.

Actually, 1. may be done in a different way too. And as I write this, I wonder if file URLs should be allowed to have hosts that are converted to an empty string by domainToASCII.

annevk commented 4 years ago

@alwinb cool, that seems like an improvement. When would domainToASCII return the empty string?

alwinb commented 4 years ago

Hmm one example is domainToASCII('\u00AD'). This currently returns failure. And indeed new URL('file://\u00AD/') throws in jsdom/whatwg-url, in Chrome (86/mac) and in Safari (14), but not in nodeJS nor in Firefox (81/mac).

annevk commented 4 years ago

Thanks, returning failure there seems preferable to me. Having multiple ways of writing an empty host seems somewhat dangerous, though I have a hard time coming up with a concrete attack.

alwinb commented 4 years ago

I think I agree with that. I haven't looked into it, it just crossed my mind.

alwinb commented 4 years ago

I wasn't sure if I should file a pull request, since we didn't discuss a course of action for this. I hope that's ok. It's just a small improvement (I think) that may help, but I don't have any strong opinions about this specific issue (of port/ and or auth in file URLs).

achristensen07 commented 4 years ago

Thanks for looking into this, alwinb. I would be in favor of removing the file host state and allowing port and authority on file URLs.

annevk commented 4 years ago

I think the refactoring is fine to go ahead with. For allowing port/authority on file URLs I'd really like some input from @valenting @sleevi @jasnell @domenic.

domenic commented 4 years ago

I don't really have an opinion. It seems weird, but if @achristensen07 has found evidence of some software that depends on it, then maybe it's best to just allow it. file: URLs are pretty messy.

sleevi commented 4 years ago

It is true that most file URLs won't have it, and I'm not sure what anything would do with the port, but I don't see a good reason to forbid it and fail parsing.

I mean, this was forbidden in RFC 3986 (and it's predecessors), as captured in the history discussed in the update for RFC 1738, RFC 8089, Section 2

Allowing ports here would be a greater fork from the IETF, and it'd be great that we only do that if/when we have principled reason. Do we believe Adobe Lightroom is still affected, as mentioned in the WebKit change? We haven't seen any incompatibilities from Chrome

achristensen07 commented 4 years ago

Given that Chrome has this behavior already and the link in question was intended to be opened in any browser, I've decided to take the ol' try-it-and-see-what-breaks approach. I'll let you know in a few months if it sticks.