Open achristensen07 opened 4 years ago
https://github.com/whatwg/url/issues/97#issuecomment-270461963 has some discussion on this. There's a concern around Windows drive letters there and your suggested patch would end up doing the wrong thing for those, so a solution here would likely be more involved. If we allowed this it would require some big changes to the API side as well (potentially simplifications though).
cc @alwinb @valenting @sleevi @jasnell @domenic
I have not looked into the port issue, but FWIW I'm pretty sure you can do away with the file host state in the standard fairly easily. Just parse them as special URLs and enforce that the authority does not have credentials later. Then you can handle whatever is decided about the port there too. You need to go back to using nulls for empty user/pass though, which I think I saw somewhere has changed before.
Right now the host parser (which is called from the file host state) raises an error on :
and @
, so it just changes where the error is raised.
I'm also curious why we prevent a username and password in a file URL when https://tools.ietf.org/html/rfc8089#appendix-E.1 clearly includes them. I can easily imagine a use case when the host requires credentials, but that would add another place where a colon could be in a file URL, although not in the path. Allowing a port would also allow a colon, but not in the path. The comment in issue #97 says "that would require a lot more monkey-ing about with the UNC & drive-letter sniffing logic." What would it take to do that "monkey-ing about" and make it only look in the path for drive letters? FWIW, as of this morning this is the last url-constructor.html web platform test that is failing in WebKit.
Allright, so since I commented, I decided I should have a look. I have removed the file host state in favour of the authority state in the reference implementation.
Changes:
Currently, there are still checks within the authority parser that return failure if a file URL has credentials or a port. This can be easily changed to match whatever is decided for this issue.
Actually, 1. may be done in a different way too. And as I write this, I wonder if file URLs should be allowed to have hosts that are converted to an empty string by domainToASCII.
@alwinb cool, that seems like an improvement. When would domainToASCII return the empty string?
Hmm one example is domainToASCII('\u00AD')
. This currently returns failure. And indeed new URL('file://\u00AD/')
throws in jsdom/whatwg-url, in Chrome (86/mac) and in Safari (14), but not in nodeJS nor in Firefox (81/mac).
Thanks, returning failure there seems preferable to me. Having multiple ways of writing an empty host seems somewhat dangerous, though I have a hard time coming up with a concrete attack.
I think I agree with that. I haven't looked into it, it just crossed my mind.
I wasn't sure if I should file a pull request, since we didn't discuss a course of action for this. I hope that's ok. It's just a small improvement (I think) that may help, but I don't have any strong opinions about this specific issue (of port/ and or auth in file URLs).
Thanks for looking into this, alwinb. I would be in favor of removing the file host state and allowing port and authority on file URLs.
I think the refactoring is fine to go ahead with. For allowing port/authority on file URLs I'd really like some input from @valenting @sleevi @jasnell @domenic.
I don't really have an opinion. It seems weird, but if @achristensen07 has found evidence of some software that depends on it, then maybe it's best to just allow it. file:
URLs are pretty messy.
It is true that most file URLs won't have it, and I'm not sure what anything would do with the port, but I don't see a good reason to forbid it and fail parsing.
I mean, this was forbidden in RFC 3986 (and it's predecessors), as captured in the history discussed in the update for RFC 1738, RFC 8089, Section 2
Allowing ports here would be a greater fork from the IETF, and it'd be great that we only do that if/when we have principled reason. Do we believe Adobe Lightroom is still affected, as mentioned in the WebKit change? We haven't seen any incompatibilities from Chrome
Given that Chrome has this behavior already and the link in question was intended to be opened in any browser, I've decided to take the ol' try-it-and-see-what-breaks approach. I'll let you know in a few months if it sticks.
Currently, with the test URL file://example:1/ we have three different behaviors:
While attempting to disallow ports from file URLs in https://bugs.webkit.org/show_bug.cgi?id=217252 I found a test written 13 years ago in https://trac.webkit.org/changeset/24484/webkit that indicates there is software that relies on the port not being forbidden in file URLs, and that there has been such software for a very long time.
I'm wondering if there is a good reason to not allow a port in file URLs? It is true that most file URLs won't have it, and I'm not sure what anything would do with the port, but I don't see a good reason to forbid it and fail parsing.
Would we be willing to add a transition from file host state to port state if c is ':'?