nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.82k stars 29.71k forks source link

Incorrect normalization of POSIX paths with exactly two leading slashes #51345

Open jp-diegidio opened 10 months ago

jp-diegidio commented 10 months ago

Version

v20.10.0

Platform

Microsoft Windows NT 10.0.19045.0 x64

Subsystem

lib/path.js ("node:path/posix")

What steps will reproduce the bug?

When I normalize a POSIX path with 'node:path/posix', if the path contains exactly two leading slashes, I'd expect the two leading slashes NOT to be collapsed to one, as per the POSIX spec (see [1]), but the opposite appears to happen.

Here is an example with CMD on Win 10:

C:\Users\Julio>node
Welcome to Node.js v20.10.0.      
Type ".help" for more information.
> var $p = require("path/posix")
undefined
> $p.normalize("//alpha/beta")
'/alpha/beta'

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior? Why is that the expected behavior?

Expected result is '//alpha/beta', i.e. the two leading slashes are NOT collapsed to one.

What do you see instead?

Actual result is '/alpha/beta', where the two leading slashes have been collapsed to one.

Additional information

[1] See POSIX.1-2017 / Base Definitions / 3.271 Pathname, which states:

Multiple successive slash characters are considered to be the same as one slash, except for the case of exactly two leading slash characters.

tniessen commented 10 months ago

The same specification says in Section 4.13:

If a pathname begins with two successive characters, the first component following the leading characters may be interpreted in an implementation-defined manner, although more than two leading characters shall be treated as a single character.

However, to the best of my knowledge, only very few POSIX-like platforms take advantage of this ancient edge case (Cygwin, IBM z/OS, ...). There are also some applications that treat such paths differently than the underlying operating system (e.g., Blender).

jp-diegidio commented 10 months ago

However, to the best of my knowledge, only very few POSIX-like platforms take advantage of this ancient edge case

The case is not ancient, and it is especially relevant to Windows enterprise-level development, where these POSIX paths represent UNC paths.

The case is indeed supported correctly in Cygwin and similar tools, while it is true that tools under Unix for the most part ignore it (notably, with the exception of bash): which is also partly due to the fact that initial versions of the POSIX spec were quite cryptic on that specific point. [P.S. Which is also the reason why lots of references online are to "4.13 Pathname Resolution", but conformance to "3.271 Pathname", i.e. to the very format, is what our issue is about.]

But now the spec is clear as well as the use case, and I'd propose that either 1) we fix normalization in 'path/posix', or 2) a note is added to the docs to say that the implementation is conformant with POSIX except on that point. -- Where the latter, IMO, would only make sense for backward compatibility, otherwise the fix itself is rather simple.

As to the need to take some action here, please consider this scenario: I am a developer, my app is required to support only POSIX paths (in config files and internally), and now I need either a link to an issue here or a link to the docs to tell my customer that, at least for the time being, the system has that specific limitation (and that they'll have to either map UNC paths to local drives to make it work, or add coming up with a conformant implementation to the backlog...).

tniessen commented 10 months ago

The case is not ancient, and it is especially relevant to Windows enterprise-level development, where these POSIX paths represent UNC paths.

The case is indeed supported correctly in Cygwin

My bad, I didn't realize Cygwin was still present in production environments - as far as I can tell, its popularity has been decreasing rapidly for many years. On top of that, Windows UNC paths have sadly always been a mess...

Anyway, I guess not changing // to / is reasonable. Not everyone is going to be happy with that, but I suppose we can point them to the POSIX spec to justify this edge case.

WDYT @nodejs/path @nodejs/platform-zos?

jp-diegidio commented 10 months ago

This issue does not need Cygwin or else to occur, and I have even offered a concrete scenario. And have you read that UNC is deprecated or obsolete? A mess is non-conformance!

Then I must stress again that the issue here is conformance to the POSIX *path format*, not conformance of some tools / conformance of path resolution -- not even of 'node:path/posix' at that, at least not insofar as I suppose once normalize is fixed, the rest simply works, though this I'd double-check as part of any fix to normalize.

WDYT @nodejs/path @nodejs/platform-zos?

I don't think I/we need anything more to support the present case, but if you still think those two references are relevant, please at least provide links and a hint as to why.

tniessen commented 10 months ago

This issue does not need Cygwin or else to occur, and I have even offered a concrete scenario. And have you read that UNC is deprecated or obsolete? A mess is non-conformance!

I never said that the issue is limited to Cygwin, nor that UNC is deprecated or obsolete. I even agreed that changing the behavior of Node.js appears to be in line with the POSIX standard.

(However, given that Node.js has likely exhibited this behavior for a long, long time and I don't recall similar bug reports in recent years, I'd assume that this is a rather rare issue on modern systems — which doesn't mean that it shouldn't be addressed, but it probably makes it low-priority for most folks.)

WDYT @nodejs/path @nodejs/platform-zos?

I don't think I/we need anything more to support the present case, but if you still think those two references are relevant, please at least provide links and a hint as to why.

I notified relevant teams that might have additional insights or potentially even an interest in implementing the suggested change, but since you seem displeased with that, please feel free to open a pull request yourself.

jp-diegidio commented 10 months ago

You seem displeased, I am fine with that.

aduh95 commented 10 months ago

I'd propose that either 1) we fix normalization in 'path/posix', or 2) a note is added to the docs to say that the implementation is conformant with POSIX except on that point

IMO it makes sense to do both, but in the other order: first a PR to document the limitation that can be backported to current release lines, and after that a second one that makes the implementation compliant with the spec can land as https://github.com/nodejs/node/labels/semver-major

PRs welcome!

jp-diegidio commented 10 months ago

@aduh95

IMO it makes sense to do both, but in the other order

Given I am not particularly familiar with node's history and statistics, the only thing I am still a bit worried about is backward compatibility, but if you guys think that is not an issue, I'd definitely second the approach you have described.

PRs welcome!

At the moment I happen not to have much time, but if this is still open and unassigned by the end of the month, I will gladly try and give a hand...

tniessen commented 10 months ago

the only thing I am still a bit worried about is backward compatibility, but if you guys think that is not an issue, I'd definitely second the approach you have described.

The semver-major label that @aduh95 mentioned allows breaking changes in major releases, which wouldn't otherwise be allowed in regular releases. I assume that the vast majority of applications never construct POSIX paths beginning with //, so breakage shouldn't be too bad in any case.

jp-diegidio commented 10 months ago

Hi guys, I am having second thoughts on this fix, as meanwhile I am finding more non-conformities:

And I'd think the backward-compatility impact becomes more and more significant, also considering that:

In light of the above, thinking overall a PR and how extensive the changes seem to be, and mainly for concerns of backward compatibility, I'm rather thinking of the following approach:

1) develop an independent ("custom") 'path-conformant' module, that fully conforms to POSIX as well as handles Windows paths correctly, to be used in substitution of 'node:path', just with the same interface (though we might be able to make the data types more precise);

2) let the 'path-conformant' module stabilize;

3) consider some back-porting (where the "conformant vs legacy" versions might even live along side, at least for a while).

I am in fact already getting into doing 1) (and I'd make sure I share it under a suitable license): anyway it starts with formalizing the specification, which is an initial step I'd think is needed in any case. But it might take a while...

Your thoughts?

aduh95 commented 10 months ago

Each changes would need to be discussed separately IMO; if there are more bugs, they can fixed later or documented as known deviations from the spec. if you send a PR to fix the double slash bug, it will likely get accepted. If you prefer to invest your time if a rewrite, that's fine too, but consider it's less likely to get accepted, because as you said the backward-compat will be a concern. My advice is to start small and make small incremental changes. Not only it's easier for reviewers to have small PRs, but also more rewarding for you as you can see your first PRs being merged while you work on the next ones – on the other hand, working on a very large PR for months, with little feedback to know if you're in the right direction does not sound as fun.

jp-diegidio commented 10 months ago

Honestly, I'd advice against that approach, in this specific case (I think it makes things more difficult, not less, especially transition-wise), as well as more generally (see this post of mine for some reasons): but I understand this is not the place to discuss methodological issues, nor I'd actually have any problem with a piece-meal approach on your side.

In any case, I'd start by writing down a formal specification of the protocol we are implementing, in the simplest scenario to be kept as a comment in the code.

Also, to be clear and upfront, my personal task is to have a "conformant path module", not just a single issue fixed, moreover, once I do it, I must do it in the context of "verified software" (with Coq specifically): so my personal plan is to start by formalizing the spec to then, eventually, generate from it the JavaScript...

Whether any of that might be of interest to the 'node' project is not for me to say (though I'd hope so, at least in the long term: that some components come from a "verified pipeline"),

In any case, I will be sharing my work to the extent possible: e.g. I will share here (if the case is still open) the formal specification of the protocol in question (which, to my present understanding, is "POSIX Pathnames plus support for Windows paths").

tniessen commented 10 months ago

I think normalize() makes very different assumptions than the POSIX spec w.r.t. what constitutes a valid normalization. It's been around since 2009 or so, so I assume the vast majority of users are satisfied with the current behavior and might even depend on it. That doesn't mean it's correct (or good), of course, but it means that any significant change in behavior might incur a net negative impact on our users. We have known about some related issues that arise from this kind of incorrect normalization for a while, see, for example, https://github.com/nodejs/node/pull/49155.

For now, I think we should document that the normalize() functions and any functions that rely on it do not strictly adhere to the POSIX spec in terms of normalization.