nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
108.04k stars 29.82k forks source link

Node 6 fs.realpath behavior changes #7175

Closed isaacs closed 8 years ago

isaacs commented 8 years ago

Previously, looping symbolic links and long path names were handled by fs.realpath without issue. The changelog says that fs.realpath "can throw new errors", but fails to identify that it also is less powerful (and also does not identify what those new errors are).

Seems like it'd be possible to maintain backwards compatibility if Node could fall back to the slower but more resilient implementation when ELOOP or ENAMETOOLONG are raised.

If this is not desirable, then the changelog should be updated to reflect the reduction in functionality.

I have no strong opinion about whether or not Node is capable of resolving the realpath of long looping symbolic links, but I would very much like a thing to point to when people complain about glob being broken. Thanks.

$ node -v
v5.3.0

$ cat rp.js
var fs = require('fs')

var l = 128

var long = 'rptest/' + Array(l).join('a/b/') + 'a'

clean()
process.on('exit', clean)
function clean () {
  try { fs.unlinkSync('rptest/a/b') } catch (e) {}
  try { fs.rmdirSync('rptest/a') } catch (e) {}
  try { fs.rmdirSync('rptest') } catch (e) {}
}

fs.mkdirSync('rptest')
fs.mkdirSync('rptest/a')
fs.symlinkSync('..', 'rptest/a/b')

console.log(fs.realpathSync(long))
fs.realpath(long, console.log)

$ node rp.js
/home/isaacs/rptest/a
null '/home/isaacs/rptest/a'

$ nave use 6

$ node rp.js
fs.js:1568
  return binding.realpath(pathModule._makeLong(path), options.encoding);
                 ^

Error: ELOOP: too many symbolic links encountered, realpath 'rptest/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a/b/a'
    at Error (native)
    at Object.realpathSync (fs.js:1568:18)
    at Object.<anonymous> (/home/isaacs/rp.js:19:16)
    at Module._compile (module.js:541:32)
    at Object.Module._extensions..js (module.js:550:10)
    at Module.load (module.js:458:32)
    at tryModuleLoad (module.js:417:12)
    at Function.Module._load (module.js:409:3)
    at Function.Module.runMain (module.js:575:10)
    at startup (node.js:160:18)
ChALkeR commented 8 years ago

@isaacs Is this related to https://github.com/npm/npm/issues/5082 fix being blocked?

jhamhader commented 8 years ago

Introduced in #3594 By the way, what happens when you try to fs.stat() on the link?

addaleax commented 8 years ago

6861 and #7044 are also issues that stem from the new realpath implementation.

Falling back to a JS-based realpath implementation sounds like a good idea to me.

Fishrock123 commented 8 years ago

@TheAlphaNerd citgm tests glob, right? @isaacs Did glob have tests for it previously?

jasnell commented 8 years ago

@Fishrock123 ... the primary issue with glob is two fold: (a) fs.realpath now throws errors when it previously did not and (b) when those errors are thrown is platform specific.

Previously, if I use glob with realpath on a circular symlink, it would recurse until a name too long error would occur, which is being caught and intentionally used by glob. This error is essentially the signal that glob should stop recursing and just returns the result up to that point. On certain operating systems (e.g. OSX), the libuv realpath implementation now throws ELOOP errors before the other error is thrown. Since glob does not ignore the ELOOP errors, the app dies. On Linux, the eloop and name too long errors throw at the same depth so the problem is not encountered because the ignored error comes first and the loop inside glob exits before the eloop can occur. On OSX, however, the eloop happens first.

The glob code was written to assume that fs.realpath would never throw an errors of it's own (there's no defensive error handling logic around the call to realpath).

Fishrock123 commented 8 years ago

The glob code was written to assume that fs.realpath would never throw an errors of it's own (there's no defensive error handling logic around the call to realpath).

I think glob's assumptions here are ok in terms of errors, fwiw.

Edit: As long as you don't pass it obviously invalid parameters

Fishrock123 commented 8 years ago

As a note, the realpath change is listed on the breaking change doc: https://github.com/nodejs/node/wiki/Breaking-changes-between-v5-and-v6#fs

However it does not note this specifically, and it does not mean that there isn't still a problem or bug here.

jasnell commented 8 years ago

Note, I'm not saying that glob's assumptions around error handling are right or wrong, just describing the situation. Previously fs.realpath did not throw, and now it does throw based on platform specific differences.

jhamhader commented 8 years ago

It is important to note that other fs functions also have this issue. I verified that about the following:

There is almost no fs function which is not affected.

isaacs commented 8 years ago

@Fishrock123 Yes, glob's tests fail on Node 6. There's a PR from @TheAlphaNerd which makes the test pass, but there are problems with the fix. It can't really be hacked around in node-glob without a very subtle and significant breaking change in node-glob that'll require people to update their code.

Imo, citgm should have prevented this change from happening in the first place. It is a bug in the process that it got out into a node release. That underlying process bug should be fixed, along with this technical bug.

@jhamhader Prior to node 6, one could use fs.realpath to avoid ELOOP errors in other fs operations, precisely because it would walk up the path from left to right resolving symbolic links.

isaacs commented 8 years ago

See: https://github.com/isaacs/node-glob/pull/259#issuecomment-223466218

saghul commented 8 years ago

Prior to node 6, one could use fs.realpath to avoid ELOOP errors in other fs operations, precisely because it would walk up the path from left to right resolving symbolic links.

Was this behavior documented? Generally speaking, all fs APIs are thing wrappers around their POSIX counterparts, and very thin wrappers at that. IMHO the general expectation is that they work just the man page says. In this line, I consider the current implementation works as expected.

FTR, the work done here was to make fs.realpath faster, and keeping the JS fallback was met with resistance back then. We waited for Windows XP support to go away in order to land it, because uv_fs_realpath doesn't work there.

The way I see it the only feasible solution here is to catch the error at the application level and handle it.

If this is not desirable, then the changelog should be updated to reflect the reduction in functionality.

I have no strong opinion about whether or not Node is capable of resolving the realpath of long looping symbolic links, but I would very much like a thing to point to when people complain about glob being broken. Thanks.

I understand that a documentation change is acceptable to you then?

ChALkeR commented 8 years ago

@isaacs

It can't really be hacked around in node-glob without a very subtle and significant breaking change in node-glob that'll require people to update their code.

Does this mean that other fs functions have also been affected, not only realpath()? Afaik only module uses fs.realpath in Node.js. What prevents from re-implementing the same functionality?

Prior to node 6, one could use fs.realpath to avoid ELOOP errors in other fs operations, precisely because it would walk up the path from left to right resolving symbolic links.

Perhaps what we need here need a separate method for just that, either in fs or in userland?

isaacs commented 8 years ago

I understand that a documentation change is acceptable to you then?

Adequately documenting breaking changes in a platform is the bare minimum requirement. Saying that something "raises new errors", without specifying what those errors are, when they are raised, or that they are platform-specific, is inadequate.

It is not a "fix" though, and I'll be disappointed if that's the resolution.

What prevents from re-implementing the same functionality?

Time and typing.

It looks like I'm going to have to fall back to the pre-node6 fs.realpath implementation anyway, I'd just prefer that it be in the platform rather than in my userland code, because I care a great deal about the stability of the Node.js platform and the community built on top of it. Stability is the point of a platform; it's what a platform is for, and breaking stability imposes a high cost. Stuff that was built on that platform falls over.

Do not be flippant about that cost. Every time something like this comes up, it imposes many person-hours of work, and reduces confidence in Node.js. Ignoring that cost is user-hostile, especially if the justification is "Well, it was wrong of the user to rely on this thing not breaking anyway". I'm not saying that breaking changes are never justified, but they are expensive. As a platform, Node has a responsibility to make sure the benefit is valuable enough to be worth it.

What is the point of making fs.realpath faster if most people don't use it directly, and modules that expose a realpath interface now all have to port the old (slower) implementation forward into their code? It seems like misguided priorities to me. If falling back to a JS implementation to preserve stability was discussed and rejected, then I'd argue that the discussion was with the wrong people or in the wrong context or based on the wrong set of values, because it is the wrong decision.

The net impact of stuff like this is that module authors feel the need to scramble and do a bunch of busy-work to catch up to the ways that core has broken their programs. It causes resentment and reduces trust. I'm guessing that this is not the intent of the TSC or CTC, based on what I've been told every time something like this comes up and I complain about it. Citgm is a good step in the right direction. But there needs to be a clearer understanding of trade-offs much earlier in the design process. I don't see that happening. I see my code breaking, and then post-hoc justifications about why it was the right thing to do, based on values that are not clearly articulated and not shared by me or my users.

saghul commented 8 years ago

I get the impression that you think this was done somewhat deliberately. It was not. The sequence of events was more like: "hey! let's make realpath faster!" "cool, are you up for a libuv patch?" "sure! here you go! and bonus nachos, the Node part as well!" "yikes, that won't work on Windows XP, let's wait until we drop it" "okidoki"

The fact that OSX behaves differently with regards to ELOOP was not observed back then, AFAIK. The discussion about keeping the current fs.realpath was not in this context but in the context of supporting Windows XP. The consensus was that having 2 implementations was not a good thing so we waited until Node 6, when XP support was dropped and then the whole thing landed, same code for all platforms.

Turns out CITGM was not in full swing (IIRC) so it didn't catch the node-glob problem, so here we are.

Now, solutions. From your example above it looks like returning the unchanged path if realpath fails is what you need? Or is there any other case in which it used to behave different?

isaacs commented 8 years ago

I don't think it was done deliberately. I think it wasn't deliberately avoided, and so far I've seen a lot of resistance to what seems like a pretty straightforward and compelling argument to revisit the "two implementations" approach. I also think that the process needs to treat this kind of breakage as a mistake, and correct to avoid the same mistake in the future, rather than treating it as expected and acceptable.

I don't need it to return the unchanged path, I need it to return the real path successfully, like it used to. See the test case in the OP in this thread. Falling back to the JS impl is a pretty obvious solution.

This issue was raised with node-glob a week prior to node 6 being released, but node 6 was released anyway. https://github.com/isaacs/node-glob/issues/258

Fwiw, this exists now, so whatever https://www.npmjs.com/package/fs.realpath

evanlucas commented 8 years ago

I think the fact that we broke a module that got 1.5 million downloads yesterday is a huge problem. Is a revert off the table at this point?

saghul commented 8 years ago

Not my call, but here is my 2 cents: It would probably be a good idea to keep using uv_fs_realpath for require(). Also with the change the cache is gone, FWIW.

Fishrock123 commented 8 years ago

What's keeping us from falling back to a JS impl for this one case if realpath does not work?

piranna commented 8 years ago

I have upgrade NodeOS from Node.js 4.x to 6.x and now I'm getting the next (probably related) error:

fs.js:1568
  return binding.realpath(pathModule._makeLong(path), options.encoding);
                 ^

Error: ENOENT: no such file or directory, realpath '/usr/bin/env'
    at Error (native)
    at Object.realpathSync (fs.js:1568:18)
    at Function.Module._findPath (module.js:167:25)
    at Function.Module._resolveFilename (module.js:438:25)
    at Function.Module._load (module.js:388:25)
    at Module.runMain (module.js:575:10)
    at run (node.js:348:7)
    at startup (node.js:140:9)
    at node.js:463:3

When hardcoding the shebang to /bin/node (the location of Node.js binary in NodeOS) the error I get is:

fs.js:1568
  return binding.realpath(pathModule._makeLong(path), options.encoding);
                 ^

Error: ENOENT: no such file or directory, realpath '/sbin/init'
    at Error (native)
    at Object.realpathSync (fs.js:1568:18)
    at Function.Module._findPath (module.js:167:25)
    at Function.Module._resolveFilename (module.js:438:25)
    at Function.Module._load (module.js:388:25)
    at Module.runMain (module.js:575:10)
    at run (node.js:348:7)
    at startup (node.js:140:9)
    at node.js:463:3

Seems realpath in Node.js v6.x is not resolving correctly the symlinks... :-/

saghul commented 8 years ago

@piranna Is /usr/bin/env a binary? Is /bin/node a symlink to /sbin/init? Do you have rights to /sbin?

piranna commented 8 years ago

@piranna Is /usr/bin/env a binary? Is /bin/node a symlink to /sbin/init? Do you have rights to /sbin?

On NodeOS, /usr/bin/env is a symlink to /bin/env, and /sbin/init is a symlink to /bin/nodeos-mount-filesystems. In any case, I've move back to Node.js 4.4.5 and it's working again, so I suposse the problem with 6.2.2 is related to this regression...

saghul commented 8 years ago

Hum. Can you strace that? fs.realpath uses realpath(3), which does work with symlinks, so this shouldn't happen.

piranna commented 8 years ago

How can I be able to do that? El 20/6/2016 1:42 AM, "Saúl Ibarra Corretgé" notifications@github.com escribió:

Hum. Can you strace that? fs.realpath uses realpath(3), which does work with symlinks, so this shouldn't happen.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nodejs/node/issues/7175#issuecomment-227027513, or mute the thread https://github.com/notifications/unsubscribe/AAgfvuLDbBkOIAPh88jODcQbOBkvWgYkks5qNdPngaJpZM4IupSA .

saghul commented 8 years ago

@piranna strace -f -y the_program. Also, what libc are you using in NodeOS?

saghul commented 8 years ago

General note: I checked the OSX libc and it bails out with ELOOP if it finds more than MAXSYMLINKS symlinks. This value is alas defined as 32 in a kernel header, so we cannot change it even at compile time.

Glibc OTOH will return a minimum of 40.

Now, question: given that the entire OSX has this limit, have we run into it IRL or just in synthetic benchmarks?

piranna commented 8 years ago

@piranna strace -f -y the_program.

I'll try to check it on the desktop (NodeOS don't have strace...).

Also, what libc are you using in NodeOS?

musl

marvinhagemeister commented 8 years ago

@saghul I'm currently running into this issue in a project at work. Mocha's watched (based on chokidar) scans the whole project tree which contains a few circular symlinks in our case and causes the watcher to bail out.

saghul commented 8 years ago

@marvinhagemeister The ELOOP issue?

saghul commented 8 years ago

I did some more digging and there is something we can try: bypass the system realpath(3) and roll our own based on some libc implementation, I'm going to give this a try by using the one in musl, which is nice and short.

saghul commented 8 years ago

@piranna is /proc mounted when you call this? The realpath(3) implementation in musl requires this to work.

piranna commented 8 years ago

@piranna is /proc mounted when you call this? The realpath(3) implementation in musl requires this to work.

Good question ;-) No, it isn't, it's one of the filesystems that gets mounted by the nodeos-mount-filesystems module. The only filesystem that it's mounted when I get the error beyond the initram is /dev because it's needed since Node.js v0.11.15,

Does the change of this issue being related to usage of the realpath() function of the system libc? And could you point me where it's needing of /proc and why? I'm curious to know what happened here... :-)

saghul commented 8 years ago

@piranna Ok, so now we know what your issue is :-) Node 6 uses libuv 1.9.0, which exposes uv_fs_realpath, a thin wrapper for realpath(3). This is now used for fs.realpath in Node. musl requires that /proc is mounted for running realpath: see this and this and then this for some of the reasoning.

Bottom line: you're running into a chicken and egg problem: you are using a Node script to mount /proc, but Node is using musl which needs /proc to be there for most of what it does. It worked in Node < 6 out of luck, because fs.realpath wasn't implemented on top of realpath(3).

piranna commented 8 years ago

This is now used for fs.realpath in Node. musl requires that /proc is mounted for running realpath: see this and this and then this for some of the reasoning.

Totally surreal why /proc would be needed... O_O Seems similar to me when b8 started to need /dev on Node.js v0.11.15, a big wtf.

Bottom line: you're running into a chicken and egg problem: you are using a Node script to mount /proc, but Node is using musl which needs /proc to be there for most of what it does. It worked in Node < 6 out of luck, because fs.realpath wasn't implemented on top of realpath(3).

So here I have two options, make nodeos-init not so minimal and loose a bit of focus to not only mount /dev but also mount /proc, or wait to see if a solution happens on this issue (like your suggestion of have our implementation of realpath()). Ok, pleas don't consider my issue as a problem anymore, but as another showcase of this one :-)

saghul commented 8 years ago

@piranna If you look at it, it's a neat trick to get the realpath really quick:

Look inside any other libc for a 10x more complex alternative.

So here I have two options, make nodeos-init not so minimal and loose a bit of focus to not only mount /dev but also mount /proc, or wait to see if a solution happens on this issue (like your suggestion of have our implementation of realpath()). Ok, pleas don't consider my issue as a problem anymore, but as another showcase of this one :-)

My idea was to actually do what musl does, so to require /proc, which we already do for other stuff anyway. However, once I started to use my brain I realized that's not going to fly on OSX or other BSDs (no /proc there!) so I'm trying a different approach using fcntl with F_GETPATH only there (Linux would remain the same).

IMHO you should give up on trying not to mount /proc if you plan on using musl, it's a hard requirement as you've seen in the mail I linked, which is really recent.

saghul commented 8 years ago

I tried to avoid using realpath(3) on OSX by opening a fd to the path and using fcntl(fd, F_GETPATH, buf) but then open fails :-( Looks like that low limit is really everywhere.

I'm officially Out Of New Ideas (TM).

jhamhader commented 8 years ago

@saghul looking at the third link you gave (http://www.openwall.com/lists/musl/2016/06/07/4), seems like other functions:

which libuv uses for:

are used in their node counterparts. fstat() is also used directly in src/node.cc: PlatformInit(). Does it mean that all of the above are affected by musl systems with no /proc ?

saghul commented 8 years ago

@jhamhader Yep.

jasnell commented 8 years ago

I'm putting this one on the ctc-agenda for this week to discuss further. We need to make a decision on this. As I see it, there are a couple of paths forward:

  1. We can revert the changes that were made to fs.realpath() and introduce the new impl as a separate method (e.g. fs.nativerealpath() or something similar)
  2. We can add an option to fs.realpath() such that it would use the old impl instead of the new. This would be a bit difficult given the removal of the cache but it it's doable, for instance, using a Symbol option, e.g. var cache = {fs.legacyRealPath: true}; fs.realpath('...', cache). The fact that the option key is a symbol would prevent the possibility of a collision when using the options object as the cache for the legacy implementation.
  3. We can move the old implementation into a standalone user land module and point developers who need to rely on that behavior to there (as I understand it, this has already been done). This would mean doing nothing further in core.

There are likely other paths forward but we need to come to a decision.

marvinhagemeister commented 8 years ago

@saghul yep, forgot to mention that it's the ELOOP issue I was referring to.

addaleax commented 8 years ago

@jasnell Regarding option 3, keep in mind that that there also have been regressions in the module loader on Windows because of the realpath changes (e.g. the #6861 #7044 I linked above edit: #7192 also probably), so I’d be careful when considering “doing nothing further in core”.

My personal favourite would still be this one.

MylesBorins commented 8 years ago

@addaleax thinking of an implementation... would it be reasonable to wrap the call to the new implementation in a try catch and run the legacy version if an ELOOP error is caught?

jasnell commented 8 years ago

The only problem I have with falling back is the potential performance hit... primarily given the issue with the cache argument. With the old implementation, the cache is necessary to improve performance, but in the new method signature, cache is replaced by options and there's a potential for collisions or odd behavior (admittedly an edge case, but still).

addaleax commented 8 years ago

@TheAlphaNerd Yes, that’s what I’m having in mind, although I’m not sure doing that only on ELOOP is sufficient (in the issues I linked the libuv-based implementation throws EISDIR and ENOENT, respectively).

jhamhader commented 8 years ago

Is JS implementation as a fallback for native implementation done elsewhere in node? What about the rest of the fs functions affected by the ENOENT or ELOOP issues?

addaleax commented 8 years ago

@jhamhader As @isaacs wrote above, one use of realpath is precisely to avoid ELOOP situations in subsequent filesystem operations, if that’s what you’re asking.

saghul commented 8 years ago

@jasnell Isaac already pushed the module to npm. I'd personally like to see option 3 chosen, once we iron out the bugs that are lingering around.

The Windows bugs are probably the ones that would require more attention because our implementation of realpath doesn't really do much, so if there are problems with some types of volumes, we might be out of luck there.

On Unixes, one possibility is to roll our own realpath(3) in libuv instead of using the one in libc. That should make its behavior consistent across platforms (I think) plus we'd make it even faster on Linux by using the trick musl uses. On OSX (and other BDSs I believe) we could use open + fcntl F_GETPATH and if open fails with ELOOP do the loop / stat / count symlink dance.

piranna commented 8 years ago

@piranna If you look at it, it's a neat trick to get the realpath really quick:

open a fd to the given path use readlink to read the real path for /proc/self/fd/theFD check that they are the same profit!

Well, move all the processing to the kernel...

IMHO you should give up on trying not to mount /proc if you plan on using musl, it's a hard requirement as you've seen in the mail I linked, which is really recent.

Yeah, I've updated nodeos-init to do the mount there and now Node.js v6.2.2 works like a charm on NodeOS :-D I've also done some pending works and make it more robust once I was there, so win-win ^^ Thanks for the advice @saghul! :-D

rvagg commented 8 years ago

I've opened https://github.com/nodejs/CTC/issues/9 to postmortem how this has all gone down, I'm asserting that something has failed in our processes and would like to understand the chain of events, what has actually broken down, why we're having difficulty dealing with this decisively and what we can do to improve things. Of course if you disagree with any of that (i.e. perhaps you think our processes are fine or that the lack of decisiveness is not a problem), then please raise that too.

On what to do, I'd like to restore as much of the old behaviour as possible to provide continuity with how things worked pre-v6 and if we want to roll back the pre-v6 stuff do so more carefully.

We have a couple of issues to juggle though: the new errors being raised by realpath(3) (it seems that we have a bunch of them if we're saying that the new Windows bugs are caused by this, likely something to be fixed in libuv) and the removal of cache (which was replaced by another Object @ arguments[1]!).

Here's my proposal:

  1. Introduce an options.fast property to the options argument, a boolean that defaults to undefined
  2. When options.fast is true, do what we're doing now, just return whatever error uv_fs_realpath() gives, so you can opt in to this strict, fast behaviour
  3. When options.fast is falsey but not false (i.e. most likely when it's undefined, or we could just make this: when options.fast is undefined), on any error returned by uv_fs_realpath() fall back to the original JavaScript implementation
  4. When options.fast is strictly false, only use the JavaScript implementation
  5. When options.cache is an Object, only use the JavaScript implementation with that object as the cache
  6. When options contains neither 'encoding' or 'fast', use the options object as the cache but print a deprecation warning that this won't happen in v7+

We could consider deprecating options.fast later if we sort out the Windows problems and it could go through a proper deprecation cycle with warnings being printed that you're going to need to fix your stuff. How to deprecate the JavaScript implementation is a bit more dicey but we could even consider a deprecation cycle for that where we print a warning if we error and fallback and the fallback doesn't error with the error warning that you will have to be more careful in the next version.

If that's all too complicated my second preference would be to revert the change completely and come up with a way to opt in to the new behaviour, perhaps by a completely new method name. We might be drooling at how fast this is now but it's no excuse for causing such widespread breakage, particularly since the Windows errors suggest that we haven't even got the new implementation quite right.

jasnell commented 8 years ago

I'd be +1 on restoring the old fs.realpath() implementation and introducing the new impl using a new method, e.g. fs.realpathuv()' or fs.fastrealpath(), etc. I can appreciate the concern about adding a new separate method but making the options logic too complicated has it's own issues.