isaacs / node-graceful-fs

fs with incremental backoff on EMFILE
ISC License
1.27k stars 148 forks source link

4.2.2 causes stack overflows in Jest #174

Closed jnardone closed 5 years ago

jnardone commented 5 years ago

Something in the bump from 4.2.1 -> 4.2.2 is causing stack overflow issues with Jest. Jest uses graceful-fs.

(Found originally when reading this comment thread: https://github.com/isaacs/node-graceful-fs/commit/9841d473354e8894e142e5216f8eec93a581f3e1#r34841595 )

Using pinned versions of jest (24.8, 24.9) and simply using < 4.2.2 and 4.2.2 goes from test harnesses running to failing, both in setup steps as well as text execution

e.g.

    RangeError: Maximum call stack size exceeded
        at Function.[Symbol.hasInstance] (<anonymous>)

      at ReadStream (../../node_modules/graceful-fs/graceful-fs.js:177:14)
      at ReadStream (../../node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (../../node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (../../node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (../../node_modules/graceful-fs/graceful-fs.js:178:28)
      ...

Obviously this depends on the tests being executed (simple tests don't fail).

isaacs commented 5 years ago

@jnardone Do you have a repo I could clone and poke at this bug myself?

jnardone commented 5 years ago

Let me try and work one up in the next few days.

isaacs commented 5 years ago

@jnardone Alternatively, you can npm install isaacs/node-graceful-fs#revert-do-not-monkeypatch-streams and see if that fixes the situation. That would also be useful information.

jnardone commented 5 years ago

so far my tests are non-deterministic. which is super frustrating. i'll keep at it.

fcrick commented 5 years ago

This seems to be related to having two graceful-fs packages imported from different locations on disk. I caught it in the debugger here, and i see patch in a 4.2.2 one being run first, then with another dependency, patch in 4.1.15 is being called.

If I symlink both to the same 4.2.2, I no longer get the issue.

However, if both modules have their own 4.2.2 in their respective node_modules folders, the issue persists.

It appears 4.2.2 triggers the problem when graceful-fs is loaded from two locations, even if only one of them is 4.2.2. Possibly only when the 4.2.2 one is loaded in first, but maybe not. Sorry I don't have a sample to run.

isaacs commented 5 years ago

@fcrick this is a great clue, thanks!

dylang commented 5 years ago

Hi @isaacs, I'm not positive, but it looks like https://github.com/isaacs/node-graceful-fs/commit/20b77c49ccf44a2de97afb02c0cc80d1d5d7f126 changed the patch from looking at the implementation of fs.close to see if it was patched to storing a symbol in global.

I'm guessing in Jest it caches the modules between tests, but global is fresh for each test, so this check is not enough.

Also, because the test has changed, it's now possible to have two sets of patches on fs when verisons of the older module and newer module are used together. This may introduce new issues.

dylang commented 5 years ago

This could also be why the memory leak test I added in https://github.com/isaacs/node-graceful-fs/commit/1bf23459bb7333674d5cb552ec5000be70ac4ef0 didn't find it. To find it in Jest, I created 100 test files that imported graceful-fs and ran jest with

node --expose-gc ./node_modules/.bin/jest --logHeapUsage --runInBand
c1rrus commented 5 years ago

Hi there, I think a project of mine has been hit by the same issue. A while ago, after updating some of my project's NPM dependencies (incl. Jest), I started seeing a unit test failing intermittently on my laptop (Arch Linux, Node 8.16.1) and always failing on Travis CI (using their default Ubuntu Xenial environment). The test had passed consistently before and neither the code nor the test had been modified.

The test in question usesfs.createReadStream() and the error when that test failed was also:

    RangeError: Maximum call stack size exceeded
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:1:1)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)
      at ReadStream (node_modules/graceful-fs/graceful-fs.js:178:28)

I eventually stumbled across this issue and, on a hunch, tried forcing my CI to pick up the 4.2.1 version of graceful-fs instead of 4.2.2 (I did it by manually editing my package-lock.json). Sure enough, the test passes again!

Perhaps this will help your investigation: I have a "before" commit where 4.2.2 is used and my test fails on Travis CI and I have an "after" commit which does nothing besides downgrading to 4.2.1 and it passes:

Hope that helps!

sheerun commented 5 years ago

So probably the solution is to store flag on fs object, instead of on global?

medikoo commented 5 years ago

However, if both modules have their own 4.2.2 in their respective node_modules folders, the issue persists.

@fcrick on my side I do not observe it.. As long as all modules rely on 4.2.2 (different instances), all works fine. Still in my case first loaded module does the global patch, so maybe it depends on that.

As I narrowed it down, crash happens when my main module does global patch with 4.2.2, and any dependency is relying on other instance, that's lower than 4.2.2

I've fixed the issue by downgrading main module to rely on 4.2.1, then all dependencies using different instances of graceful-fs locally (also at 4.2.2) seem to work fine.

btav commented 5 years ago

Thanks @c1rrus for the fix.

In my scenario, jest was failing on CircleCI with the RangeError: Maximum call stack size exceeded error. However, it was working locally. I manually changed the graceful-fs entries in my package-lock.json to 4.2.1 and it worked.

bpasero commented 5 years ago

This is horrible.

mrabuse commented 5 years ago

@isaacs Are there plans to release a fix to this bug? Many packages rely on this one and jest randomly breaking when running fs module code has broken my team's CI/CD reliability.

isaacs commented 5 years ago

Ok, so, here's what I need for a fix to happen on this.

First, most importantly, we need a reproducible test case. A lot of folks are chiming in here saying it's happening to them, in CI/CD, in their applications, etc. Give me a case that fails, a repo I can clone and a command to run to demonstrate the problem.

People have pointed out that it's related to the global queue, or to jest blowing away the global object. I'm not going to just run on the hypothesis that it's related to the global singleton queue, without some test so I can verify that it doesn't happen when putting the queue on the fs module object as well.

It seems to only affect people using Jest. At the risk of seeming insensitive, I don't maintain Jest. It doesn't affect tap, the test framework I do maintain (and maintain quite actively, considering it's not my paying job). So, "this module (which works fine everywhere else) is a problem for jest" really sounds more like a "jest" problem than a "me" problem.

If you provide me with a reproducible test case, I'll fix this as soon as I get to it, assuming I can turn that reproducible test case into a failing test.

Graceful-fs is also used by npm, tap, nyc, and lots of other extremely popular and widely used libraries, where the commit in question fixes a pretty bad bug with leaked file descriptors when graceful-fs is loaded multiple times. Jest appears to be the only one broken by this, and I can't honestly say whether that's due to something inappropriate or dangerous that Jest is doing, or if there's actually a problem here. If we're going to roll back that fix, or explore some other way to do it, we need to do so properly. That means: with a test that fails before the patch, fixed in such a way that it doesn't break any other tests.

If you don't want to wait around for "as soon as I can get to it", then the best alternative is to submit a patch, with a failing test demonstrating the issue.

If you don't want to wait, and don't have time to come up with a reproducible test case of your own, then you can pay me to prioritize this work ahead of the other things going on in my life. I still will need a reproducible case to poke at, but it doesn't have to be nice, and I'll get right on it if I can justify the personal expense.

If you don't have time and attention to spend on this, and also don't have money to spend on it, I'm sorry, but there's little I can do responsibly. I recommend bugging the Jest project to see if there's any help they can provide.

isaacs commented 5 years ago

Here, I'll be generous if anyone's interested: https://gist.github.com/isaacs/bbf8cecc0deee4fb535e3b153e81732b

That's a reproducible failure case. Run npm install in the root of that gist, and then node index.js will show the crash.

Next step is to unpack exactly what's going on there. My read on this is basically it's a bug in 4.2.1, not in 4.2.2. If you upgrade everything to 4.2.2, it works fine. Swap around the order of the graceful-fs instances, so the new one is gracefullifying the old one rather than vice versa, and it works fine.

Patch welcome.