nodejs / node-v0.x-archive

Moved to https://github.com/nodejs/node

url: resolve strips drive letters from Windows file URLs #5452

Closed domenic closed 1 year ago

domenic commented 11 years ago
url.resolve('file:///C:/file.txt', '/');

Got: file:///

Expected: file:///C:/

awwright commented 11 years ago

This is the correct behavior. According to RFC 3986, the hier-part is ///C:/file.txt: the authority is empty and the path is /C:/file.txt. Therefore, resolving a URI-reference of / results in file:/// (empty authority, and a path of /).
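
A minimal sketch of the parse described above, assuming the component-splitting regular expression from RFC 3986 Appendix B. The helper names `splitUri` and `resolveAbsolutePath` are hypothetical, and the resolver covers only the case at issue here (a reference that is just an absolute path with no dot segments), not the full algorithm of section 5.2.

```javascript
// Regular expression from RFC 3986, Appendix B, for splitting a URI reference.
const URI_RE = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: split a URI reference into its five components.
function splitUri(ref) {
  const m = URI_RE.exec(ref);
  return {
    scheme: m[2],    // undefined when absent
    authority: m[4], // '' when present but empty, undefined when absent
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// Hypothetical helper: resolve a reference that consists only of an absolute
// path. Per RFC 3986 section 5.2.2, the scheme and authority come from the
// base and the path comes from the reference. Dot-segment removal is skipped
// here, so refPath must not contain "." or ".." segments.
function resolveAbsolutePath(baseUri, refPath) {
  const base = splitUri(baseUri);
  const authority = base.authority !== undefined ? '//' + base.authority : '';
  return base.scheme + ':' + authority + refPath;
}

const parts = splitUri('file:///C:/file.txt');
console.log(parts.authority); // '' (empty authority)
console.log(parts.path);      // /C:/file.txt

console.log(resolveAbsolutePath('file:///C:/file.txt', '/')); // file:///
```

Under this reading, C: is an ordinary first path segment, so an absolute-path reference replaces it along with the rest of the path.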

bnoordhuis commented 11 years ago

What @Acubed said. It's the expected behavior.

domenic commented 11 years ago

It's correct according to the years-old RFC, but does not match real-world browser behavior nor the more recent URL standard: http://url.spec.whatwg.org/
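
For comparison, a sketch of the same resolution using the WHATWG `URL` class. This assumes a Node.js release that exposes `URL` as a global, which postdates the version discussed in this thread; the variable name `resolved` is illustrative only.

```javascript
// Resolve the reference "/" against a Windows file URL using the WHATWG parser.
// The URL Standard special-cases file URLs whose base path starts with a
// drive letter, so the drive letter is carried over to the result.
const resolved = new URL('/', 'file:///C:/file.txt');

console.log(resolved.href);     // file:///C:/
console.log(resolved.pathname); // /C:/
```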

isaacs commented 11 years ago

I'm with @domenic on this. Our goal with the url module is to follow browser behavior. The WHATWG has been kind enough to write a proper spec, which would have been nice to have 4 years ago. We should follow that spec, since it's what browsers actually do.

awwright commented 11 years ago

For URI resolution, the HTML 4.01 specification references RFC 2396, which was updated by RFC 3986. The HTML 5 candidate recommendation normatively references only the newer RFC 3986. While the HTML 5 draft does explicitly vary its resolution from RFC 3986, the variation is limited in scope, and its stated rationale is to support older documents that predate RFC 3986 and would otherwise be illegal (not a problem for Node.js). Nowhere does it specially handle file URLs, in which drive letters are supposed to be treated as a directory.

I already use URIs with colons and similar characters in a number of schemes, including file with and without an authority. This is a feature heavily used for CURIE, among other uses. Any special behavior would break my application, and could pose security problems if, for instance, certain path segments could modify the authority or the resolved filesystem path. (HTTP/1.1 also mandates that servers accept absolute forms of URIs, and any URI, not just URLs. This is likely to become the only way requests are made in HTTP/2.0.)

The point is that the behavior of URIs is explicitly not supposed to change between applications or over time. They're, well, uniform.

domenic commented 11 years ago

> The point is that the behavior of URIs is explicitly not supposed to change between applications or over time. They're, well, uniform.

Indeed, the web has not been following those RFCs for a very long time. Nothing has changed since the early days of web browsers. I've run tests so far in all web browsers plus .NET, and URL handling uniformly figures out Windows drive letter paths correctly. The RFCs are simply inaccurate.

domenic commented 11 years ago

If it helps, Node's url module already has many improvements over the outdated RFCs that help it match real-world URL resolution behavior. This bug and #5453 are the only remaining missing pieces! But if you check out https://github.com/tmpvar/jsdom/pull/550 you'll see many other divergences, as we took a URL resolution algorithm designed from the RFC and turned it into one that matched browsers.

awwright commented 11 years ago

If the browsers are doing it differently, they're doing it wrong. In the HTML specification itself, RFC 3986 is the normative (authoritative) reference in how to resolve and parse URIs.

Observing that implementations have done it differently over time is only a reason to make sure that Node.js follows the definition of the URI and does not add to the tangle of non-interoperable implementations.

domenic commented 11 years ago

Ah, good catch; I'll talk to the appropriate people and get the HTML5 spec updated. Thanks!

Edit: Looks like you were just wrong? After asking around #whatwg in IRC, looks like the HTML spec references the URL spec already:

http://www.whatwg.org/specs/web-apps/current-work/#url-parser

awwright commented 11 years ago

The HTML5 spec already refers to RFC 3986? Even if it did define incompatible behavior, being a normative specification means that it can't be changed, even if such specifications wanted to - the URI behavior takes precedence.

For reference, here's a (very incomplete) list of standards or proposed standards that reference RFC 3986, RFC 3987, or a compatible older specification:

Varying the behavior from RFC 3986 would break all of them.

domenic commented 11 years ago

@Acubed thanks for doing all that leg work! I've passed it on to the appropriate parties, and we'll see updates to those specs soon to reflect web reality.

awwright commented 11 years ago

@domenic thanks for your snarky, nonsensical help, it really contributes to the advancement of the Web. Not.

I actually do have a direct line of communication to the authors of many of those standards; so far they have all told me it's nonsense.

Can we please get on with the reality of the Web now, thanks. RFC 3986 is the single authoritative specification. It is still in STANDARD status; it has not been superseded.

domenic commented 11 years ago

What you seem to be missing is that standards reflect web reality; they do not create it. Browsers and other software all implement URLs in a way that diverges significantly from those outdated RFCs. The URL spec came about because vendors realized this and sought to codify the new reality in a document that they could all refer to for an interoperable implementation of edge cases. It's great that you've found places where older documents don't reflect that, and we'll work toward fixing that. But the reality of the software we work in is different, and that's not going to change (breaking many programs that rely on real-world URL spec behavior) just because an older document says so.

awwright commented 11 years ago

I don't know where you get the impression they're outdated. RFC 2732 is "outdated", or to use the industry vocabulary, "obsolete"; it is superseded by RFC 3986. I just listed more than a dozen specifications that rely on the exact behavior of RFC 3986. Diverging from the behavior of the vast majority of specifications is what is out of touch.

Like I described, HTML5 does accommodate a superset of URIs for backward compatibility like you described, but it doesn't change its behavior.

annevk commented 11 years ago

There are some subtle differences, actually. DOM has already changed (http://dom.spec.whatwg.org/), and HTML has too (http://www.whatwg.org/specs/web-apps/current-work/multipage/). CSS will soon change too. I do not really know about the rest of the list @domenic mentioned.

jasnell commented 9 years ago

Given that there is a plan to update the url implementation to conform better with the updated specs, I'm going to mark this as defer-to-convergence.