jestjs / jest

Delightful JavaScript Testing.
https://jestjs.io

Updating to Jest 22 causes coverage to run out of memory #5239

Closed: gaearon closed this issue 6 years ago

gaearon commented 6 years ago

Since updating to Jest 22 in https://github.com/facebook/react/pull/11956, all React test runs with coverage fail:

[screenshot: failing coverage runs on CI, 2018-01-05]

A failure looks like this:

 PASS  packages/react-dom/src/events/__tests__/SelectEventPlugin-test.js
 PASS  packages/react-dom/src/__tests__/ReactDOMOption-test.js (8.546s)
 PASS  packages/react-dom/src/__tests__/ReactMountDestruction-test.js (8.088s)

<--- Last few GCs --->

[199:0x283aea0]   742209 ms: Mark-sweep 1412.2 (1467.6) -> 1412.2 (1466.6) MB, 2408.5 / 0.0 ms  (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 2409 ms) last resort GC in old space requested
[199:0x283aea0]   744462 ms: Mark-sweep 1412.2 (1466.6) -> 1412.2 (1466.6) MB, 2252.7 / 0.0 ms  last resort GC in old space requested

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2ed934225ee1 <JSObject>
    3: completeUnitOfWork(aka completeUnitOfWork) [/home/circleci/project/packages/react-reconciler/src/ReactFiberScheduler.js:552] [bytecode=0x1e5a3fb9bb11 offset=1017](this=0xa6f6c182311 <undefined>,workInProgress=0x19c3ddb03201 <FiberNode map = 0x327cfdde93a9>)
    5: performUnitOfWork(aka performUnitOfWork) [/home/circleci/project/packages/react-reconciler/src/ReactFiberScheduler.js:646] [by...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::Abort() [node]
 2: 0x121a2cc [node]
 3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
 4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
 5: v8::internal::Factory::NewFixedArray(int, v8::internal::PretenureFlag) [node]
 6: v8::internal::FeedbackVector::New(v8::internal::Isolate*, v8::internal::Handle<v8::internal::SharedFunctionInfo>) [node]
 7: v8::internal::JSFunction::EnsureLiterals(v8::internal::Handle<v8::internal::JSFunction>) [node]
 8: v8::internal::Compiler::Compile(v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag) [node]
 9: v8::internal::Runtime_CompileLazy(int, v8::internal::Object**, v8::internal::Isolate*) [node]
10: 0x2c01a818463d
Done in 745.02s.
cat: ./coverage/lcov.info: No such file or directory
[error] "2018-01-05T18:50:27.351Z"  'error from lcovParse: ' 'Failed to parse string'
[error] "2018-01-05T18:50:27.353Z"  'input: ' ''
[error] "2018-01-05T18:50:27.353Z"  'error from convertLcovToCoveralls'

Not sure if this is a Jest issue but figured it's worth asking.

SimenB commented 6 years ago

Did it work with coverage on either Jest 21 or the 22 betas?

I'm not really sure how to start debugging it, as the test run goes for 15 minutes before blowing up from what I can see in the Circle logs, but bisecting Jest releases (if applicable) could just run while I sleep or something.

gaearon commented 6 years ago

Yes, we had no issues with 21.3.0-beta.4.

thymikee commented 6 years ago

@mjesun may have more background on that. We were hitting memory problems coming from graceful-fs after refactoring the sandboxing, and apparently didn't manage to fully fix them before v22. You can see @azz's comment on it: https://github.com/facebook/jest/issues/2179#issuecomment-355230393. Maybe you're hitting something similar?

Also, if your CircleCI has multiple cores available, you can try running e.g. jest -w=2 (here 2 workers)

gaearon commented 6 years ago

> Also, if your CircleCI has multiple cores available, you can try running e.g. jest -w=2 (here 2 workers)

Is this the same as removing --runInBand? Without --runInBand it used to always OOM.

thymikee commented 6 years ago

Yup, it's the same as removing --runInBand, as that option doesn't spawn multiple workers (-w and Jest by default do).

gaearon commented 6 years ago

As I mentioned in the past it only made things worse. But I can try!

SimenB commented 6 years ago

I can reproduce the crash using React's code base on my own machine, FWIW

mjesun commented 6 years ago

The changes related to fixing memory leaks with graceful-fs and other modules that modify native references were rolled back, mostly due to bugs during the Node initialization process 😞. The only thing that remained is an improvement to process cloning.

Running without --runInBand should improve things, since the memory limit is then spread through many processes, increasing the overall available memory (as long as there is enough system memory). Running with a tuned Node (adding max-old-space-size + stack-size) should improve things as well.
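For anyone who wants to try the tuned-Node route, a minimal sketch of such an invocation (the flag values are placeholders, not recommendations; adjust them to the memory actually available on your machine or CI container):

```bash
# Raise V8's old-space limit (in MB) and the thread stack size (in KB)
# before launching Jest. Values below are illustrative only.
node --max-old-space-size=4096 --stack-size=2048 \
  ./node_modules/.bin/jest --coverage --runInBand
```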

I'm not sure at all what could be causing the leak, but we use the --coverage flag internally and we haven't seen any of these issues, so I think it could be something specific to the React tests. It'd be really nice if we had some sort of automated mechanism to run bisections; that would definitely improve detecting this kind of thing.
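As an aside on automating bisections: git can already drive this with git bisect run. A rough sketch, using the known-good/known-bad commits that show up in the bisect log further down; run-react-coverage.sh is a hypothetical helper that builds this Jest checkout, runs the React coverage suite against it, and exits non-zero on a crash:

```bash
git bisect start
git bisect bad e879099db1024f106757ee55cb0e9a6935488b43   # Release version 22.0.4
git bisect good 0ea4c387004f94ea7a94d416d5923b35d564ee86  # known-good commit
git bisect run ./run-react-coverage.sh                    # hypothetical test script
git bisect reset                                          # return to the original HEAD when done
```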

SimenB commented 6 years ago

I can try to bisect 21.3.0-beta.4 to 22.0.4 (6 steps, apparently), but I'm not sure how conclusive it will be.

The obvious candidate is jsdom 11; it might be an idea to just test with jsdom 9 and see if it still crashes.

mjesun commented 6 years ago

@SimenB That would actually make sense, in relation to how we use the --coverage flag internally.

SimenB commented 6 years ago

Git bisect blames ef55e89f8ebe1c5977e0c2bf4a868f091f21136a, which is weird as 6d0c0f043f3c0782d55446dfca429bfc08e010e6 reverted it... @mjesun did you have to manually fix conflicts and something went wrong?

I'll try to verify by testing one commit before and after, but that's where git bisect took me

SimenB commented 6 years ago

Wait no! It blames e00529db9b9ce84c968a80829dfd1afea4853e40 (I didn't anticipate the test suite not crashing on that commit)...

I'll test some more. This is my bisect log, if anybody wants to see if they agree:

git bisect start
# bad: [e879099db1024f106757ee55cb0e9a6935488b43] Release version 22.0.4
git bisect bad e879099db1024f106757ee55cb0e9a6935488b43
# good: [0ea4c387004f94ea7a94d416d5923b35d564ee86] Don't report errors errors as unhandled if there is an existing "error" event handler (#4767)
git bisect good 0ea4c387004f94ea7a94d416d5923b35d564ee86
# good: [e5f58a6b0f6a599a477d8e4e3ed6f3a7ecb5ed11] chore(package): update linting dependencies (#4964)
git bisect good e5f58a6b0f6a599a477d8e4e3ed6f3a7ecb5ed11
# bad: [df9f47380cdc12bb9b7213ad86df511ef9671133] Bump Flow to 0.61 (#5056)
git bisect bad df9f47380cdc12bb9b7213ad86df511ef9671133
# bad: [6ade256b1583e458577145fef2a7bb566605bcf9] Revert "Add new "setupTestFramework" option (#4976)" (#5025)
git bisect bad 6ade256b1583e458577145fef2a7bb566605bcf9
# good: [9043a44fca5b5e4880aed3579ab0c56775ec686b] Add the Yammer logo to the 'who is using this' section of the website. (#4971)
git bisect good 9043a44fca5b5e4880aed3579ab0c56775ec686b
# bad: [4f1113c956b400ca50e70b0a1441e49e5dd304ad] Emphasise required return (#4999)
git bisect bad 4f1113c956b400ca50e70b0a1441e49e5dd304ad
# bad: [e00529db9b9ce84c968a80829dfd1afea4853e40] Make "weak" optional dependency and check it at runtime (#4984)
git bisect bad e00529db9b9ce84c968a80829dfd1afea4853e40
# good: [3d733f6c3716006b9ca5b3c0fa6401e5b3de456c] Added link to coverage option (#4978)
git bisect good 3d733f6c3716006b9ca5b3c0fa6401e5b3de456c
# good: [ef55e89f8ebe1c5977e0c2bf4a868f091f21136a] Re-inject native Node modules (#4970)
git bisect good ef55e89f8ebe1c5977e0c2bf4a868f091f21136a
# first bad commit: [e00529db9b9ce84c968a80829dfd1afea4853e40] Make "weak" optional dependency and check it at runtime (#4984)

SimenB commented 6 years ago

What I do to test is clone the React repo, run yarn && yarn remove jest, and then run this command: NODE_ENV=development ../jest/jest --config ./scripts/jest/config.source.js --coverage --runInBand --no-cache.
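Spelled out as a script, that amounts to roughly the following (the ../jest path assumes the Jest checkout sits next to the React checkout; adjust as needed):

```bash
# Rough repro based on the steps above.
git clone https://github.com/facebook/react.git
cd react
yarn && yarn remove jest   # use the sibling Jest checkout instead of the published one
NODE_ENV=development ../jest/jest \
  --config ./scripts/jest/config.source.js \
  --coverage --runInBand --no-cache
```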

EDIT: I think it was a fluke; I've gotten 5 crashes in a row for ef55e89f8ebe1c5977e0c2bf4a868f091f21136a now. I think somehow the revert wasn't clean enough. I'm not sure how best to dig into it, though.

azz commented 6 years ago

I tried the workaround of aliasing graceful-fs to fs to no avail here.

rickhanlonii commented 6 years ago

> Is this the same as removing --runInBand? Without --runInBand it used to always OOM.
>
> As I mentioned in the past it only made things worse. But I can try!

@gaearon there's a bug in Circle builds where running jest without --runInBand attempts to use the number of CPUs of the machine rather than the VM (which would make things worse). The --runInBand flag "fixes" this by forcing it to one worker, but you should be able to use -w for up to the number of CPUs given to the Circle VM

So for React, you should be able to use -w=2 since the Circle jobs running React have 2 CPUs.
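For example, a sketch of what the coverage command could look like (the exact test script React uses may differ):

```bash
# Pin Jest to 2 workers to match the 2 CPUs of the Circle container,
# instead of the much larger CPU count Jest detects on the host machine.
jest --coverage --maxWorkers=2
```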

rickhanlonii commented 6 years ago

Setting maxWorkers=2 seems to have fixed the build in the above PR, but there's definitely still a leak

tryggvigy commented 6 years ago

> I'm not sure at all what could be causing the leak, but we use the --coverage flag internally and we haven't seen any of these issues, so I think it could be something specific to the React tests.

Maybe this memory leak issue isn't tied to the use of --coverage. I'm working on a huge test suite (9514 tests) that is running into a similar problem without using --coverage. After upgrading to Jest 22, the heap overflows about two-thirds of the way through the run. If the tests are run with --runInBand, the leak is exacerbated: only 2-3 tests run before the heap fills up.

Heap grows to ~1131 MB in Jest v20.
Heap grows to ~2829 MB in Jest v22.
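One way to watch that growth per test file is Jest's --logHeapUsage flag (it needs the garbage collector exposed); a sketch:

```bash
# Logs the heap size after each test file; with --runInBand the numbers come from
# a single process, which makes steady growth (a leak) easier to spot.
node --expose-gc ./node_modules/.bin/jest --runInBand --logHeapUsage
```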

SimenB commented 6 years ago

Probably related: #5157

gaearon commented 6 years ago

> @gaearon there's a bug in Circle builds where running jest without --runInBand attempts to use the number of CPUs of the machine rather than the VM (which would make things worse). The --runInBand flag "fixes" this by forcing it to one worker, but you should be able to use -w for up to the number of CPUs given to the Circle VM

Thanks for explaining, this makes total sense.

Should we file a bug with Circle or something? Since it affects all Jest consumers who run CI on Circle and the fix isn't obvious.

> Maybe this memory leak issue isn't tied to the use of --coverage. I'm working on a huge test suite (9514 tests) that is running into a similar problem without using --coverage. After upgrading to Jest 22, the heap overflows about two-thirds of the way through the run. If the tests are run with --runInBand, the leak is exacerbated: only 2-3 tests run before the heap fills up.

That was my thought as well. Our --coverage run has already OOMed before, so I feel it was on the edge, and the bug/regression just "pushed it" a bit further, causing an OOM, but the issue seems more general.

benmccormick commented 6 years ago

I'm also seeing this bug, without using --coverage or Circle CI. Same situation: switching from Jest 21 to 22 either caused the bug or pushed us over the edge. Increasing CPUs solved it for us, but that isn't ideal.

rickhanlonii commented 6 years ago

OK, I think I figured this out (see the above PR).

Some notes:

Mardoxx commented 6 years ago

Still getting this on VSTS hosted agents. The maxWorkers=2 hack works around it. This is with Jest 22.3.0.

wallzero commented 6 years ago

I am also still seeing memory leak issues on GitLab pipelines with jest@^22.4.2

SimenB commented 6 years ago

Possible to set up a repro? It's likely that the leak is in your own code, but if you can set up a minimal repro we could take a look.

wallzero commented 6 years ago

My tests were growing a little encumbered, so I refactored quite a bit and now they are working. I still find it strange that the update from 21.x.x to 22.x.x would cause them to start failing, but perhaps it was for the best anyhow. Thanks again and sorry about that.

bilby91 commented 6 years ago

I'm seeing the same leaks with the --coverage flag. We were using 21.x.x without any issues. A new commit introduced some changes that added 3 extra test files. After that commit, 21.x.x started failing, and 22.x.x fails the same way. If I turn off coverage, it works for both versions of Jest.

johnste commented 6 years ago

I've also been experiencing issues with jest@22.4.2 in GitLab pipelines: tests ran very slowly and finally crashed with an error similar to the one in the initial comment. Downgrading to jest@^21 seems to help.

SimenB commented 6 years ago

The original problem reported in this issue is fixed; can you create a new issue with a reproduction?

mtpetros commented 5 years ago

> Setting maxWorkers=2 seems to have fixed the build in the above PR, but there's definitely still a leak

I also ran into an OOM test failure in CircleCI 2.0 using Jest 24.1.0. I ran Jest with --detectLeaks and detected no leaks coming from my tests. As @rickhanlonii suggested, adding --maxWorkers=2 fixed the problem in CircleCI.

I might be the odd duck here, but this leads me to believe that either this issue has not been resolved in Jest or this issue represents a problem with CircleCI (which I think is more likely).

> Should we file a bug with Circle or something? Since it affects all Jest consumers who run CI on Circle and the fix isn't obvious.

@gaearon, do you know if such a bug was filed with CircleCI? I could not find one.

Thanks!

SimenB commented 5 years ago

The latest discussion is here: #5989. tl;dr: it's not possible to accurately detect how many CPUs are available on the different CIs, with CircleCI being the hardest.
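Until that changes, the practical workaround from this thread is to pin the worker count explicitly in your CI command, for example (a sketch, assuming a package.json "test" script that runs Jest):

```bash
# Don't let Jest guess the CPU count on CI; pass it explicitly.
# The value (2 here) should match the CPUs actually allotted to the job.
yarn test --maxWorkers=2
```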

github-actions[bot] commented 3 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. Please note this issue tracker is not a help forum. We recommend using Stack Overflow or our Discord channel for questions.