jestjs / jest

Delightful JavaScript Testing.
https://jestjs.io
MIT License

Spurious error: "A worker process has failed to exit gracefully and has been force exited." #11354

Closed · octogonz closed this issue 7 months ago

octogonz commented 3 years ago

🐛 Bug Report

Our CI builds have been failing sporadically with this error from Jest:

"A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --runInBand --detectOpenHandles to find leaks."

--detectOpenHandles does not detect any problems. The error is reported randomly by various projects that build independently in a large monorepo, which suggests that this is not actually caused by a bad test.

It occurs more frequently on Windows machines, and on machines with heavier load. But it has also been reported on individual developer MacBooks.

To Reproduce

You can trivially force the warning to be reported by reducing this constant to 0:

https://github.com/facebook/jest/blob/64d5983d20a628d68644a3a4cd0f510dc304805a/packages/jest-worker/src/base/BaseWorkerPool.ts#L20

Investigation

The FORCE_EXIT_DELAY constant is used by this code:

https://github.com/facebook/jest/blob/64d5983d20a628d68644a3a4cd0f510dc304805a/packages/jest-worker/src/base/BaseWorkerPool.ts#L104-L116
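
Roughly, the pattern there is the following (a simplified sketch with a hypothetical worker interface, not the actual jest-worker source):

const FORCE_EXIT_DELAY = 500;

// Sketch only: the pool asks the worker to end, waits up to FORCE_EXIT_DELAY for
// it to exit on its own, and otherwise force-kills it. A forced exit is what
// produces the "failed to exit gracefully" warning.
async function endWorker(worker: {
  sendEnd: () => void; // hypothetical method: politely ask the worker to shut down
  waitForExit: () => Promise<void>;
  forceExit: () => void;
}): Promise<boolean> {
  worker.sendEnd();
  let forceExited = false;
  const timer = setTimeout(() => {
    forceExited = true;
    worker.forceExit();
  }, FORCE_EXIT_DELAY);
  await worker.waitForExit(); // resolves whether the exit was graceful or forced
  clearTimeout(timer);
  return forceExited;
}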

500 ms seems way too small for a machine that is under heavy load. My theory is that if we increase the timeout, then the IPC message will eventually be received, and the warnings will go away. I will report back after I have tested that.

jest-haste-map may be the root cause

While debugging this, I noticed that the jest-haste-map library has an obvious flaw where it creates a potentially unlimited number of promises while crawling the disk:

https://github.com/facebook/jest/blob/4fa17e455662dc7918eecdb857dd946e9ce2dc53/packages/jest-haste-map/src/index.ts#L659-L678

For example, I observed that Promise.all() may be handed 50+ promises that are all crawling the filesystem in parallel. If the build orchestrator is building 8 projects in parallel, and each project is potentially creating 50 Jest promises, that could obviously thrash the disk to the point where the FORCE_EXIT_DELAY limit is exceeded.

This is just speculation -- I did not have time to try fixing it to see whether it resolves the problem.

Either way, it would be a good idea to apply some throttling to _buildHasteMap() to limit the parallelism. @cpojer @SimenB
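
For illustration, here is a minimal sketch of what bounding that parallelism could look like (a generic concurrency limiter, not jest-haste-map's actual API):

// Run the given async tasks with at most `limit` of them in flight at a time,
// instead of starting them all at once.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each "lane" pulls the next task as soon as it finishes its current one.
  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  await Promise.all(Array.from({length: Math.min(limit, tasks.length)}, lane));
  return results;
}

Crawling the disk through something like this with a small limit would keep the number of concurrent filesystem operations bounded even when many projects are building in parallel.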

envinfo

  System:
    OS: Windows 10 10.0.19041
    CPU: (8) x64 AMD Ryzen 7 3700X 8-Core Processor
  Binaries:
    Node: 12.20.1 - C:\Program Files\nodejs\node.EXE
    Yarn: 1.22.10 - C:\Program Files\nodejs\yarn.CMD
    npm: 6.14.10 - C:\Program Files\nodejs\npm.CMD
madhur1846 commented 3 years ago

Why is this still open after the PR was merged?

octogonz commented 3 years ago

If you are referring to https://github.com/microsoft/rushstack/pull/2653, that is a workaround implemented in a project that is external to Jest.

The Jest work tracked by https://github.com/facebook/jest/issues/11354 is:

  1. Increase FORCE_EXIT_DELAY to a much larger value, or make it a configurable option (see the sketch below).
  2. Fix the performance issue in jest-haste-map by throttling its promise creation.
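
As a purely hypothetical sketch of item 1 (JEST_WORKER_FORCE_EXIT_DELAY is an invented name, not an existing Jest option), the constant could simply fall back to the current default unless overridden:

// Hypothetical only: allow the delay to be overridden via an environment variable.
const FORCE_EXIT_DELAY = Number(process.env.JEST_WORKER_FORCE_EXIT_DELAY) || 500;
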
octogonz commented 3 years ago

My theory is that if we increase the timeout, then the IPC message will eventually be received, and the warnings will go away. I will report back after I have tested that.

Following up to confirm that patching FORCE_EXIT_DELAY to 7000 (7 seconds) has mitigated the problem in our repro. The CI job that was failing 40% of the time has now passed 10 consecutive runs. That's pretty good evidence for this theory.

jconway325 commented 3 years ago

@octogonz, thanks for doing the investigation here. Could you please let me know the workaround for this issue in a normal Node.js project with a Jest environment?

octogonz commented 3 years ago

My patch was implemented in the Jest task from the Heft toolchain. I don't know of an easy workaround for regular Jest.

jconway325 commented 3 years ago

Thank you @octogonz for the heads up.

r00b commented 3 years ago

I am also able to repro @octogonz's issue and was able to fix it locally by changing FORCE_EXIT_DELAY in BaseWorkerPool within jest-worker.

satyamamit commented 3 years ago

Any ETA for resolving/closing this bug?

stbenjam commented 3 years ago

I'm hitting the same, and it's resolved with a higher FORCE_EXIT_DELAY.

My horrible solution in package.json:

"test": "sed -i -e 's/const FORCE_EXIT_DELAY = 500;/const FORCE_EXIT_DELAY = 7000;/g' ./node_modules/jest-worker/build/base/BaseWorkerPool.js && react-scripts test # See https://github.com/facebook/jest/issues/11354",
stbenjam commented 3 years ago

While debugging this, I noticed that the jest-haste-map library has an obvious flaw where it creates a potentially unlimited number of promises while crawling the disk

Rather naive but I added the following to ./node_modules/jest-haste-map/build/index.js:

// Debug probe: fail loudly if any crawl promises have accumulated at this point.
if (promises.length > 0) {
  throw new Error(String(promises.length));
}

The code path is being reached, since I do hit the throw if I remove the conditional, but with the conditional in place I never hit it in my environment -- so the unlimited-promises theory doesn't seem to apply, at least for me.
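
For anyone else who wants to measure this rather than just assert on it, a small hypothetical debug helper (not part of jest-haste-map) could record the peak number of promises in flight:

// Hypothetical debug helper: wrap each async task so we can see how many run
// concurrently, which is the number the "unlimited promises" theory is about.
function makeConcurrencyTracker(label: string) {
  let inFlight = 0;
  let peak = 0;
  return async function track<T>(work: () => Promise<T>): Promise<T> {
    inFlight++;
    peak = Math.max(peak, inFlight);
    try {
      return await work();
    } finally {
      inFlight--;
      console.log(`${label}: peak concurrent promises so far = ${peak}`);
    }
  };
}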

octogonz commented 3 years ago

in my environment I never hit this

I measured something like 50 concurrent promises. The behavior probably depends on how jest-haste-map is invoked (it has several modes of operation and filtering options) and on how deeply nested the directory tree it recurses into is.

breogangf commented 3 years ago

I am experiencing the same issue in my environment.

Executing our tests with --runInBand runs all tests serially and solves the problem, but that is only suitable for debugging purposes.

Additional info for this parameter -> https://jestjs.io/docs/cli#--runinband

It seems to be related to the way it is creating a worker pool of child processes that will run the tests.

ravenscar commented 3 years ago

It would be great to see FORCE_EXIT_DELAY configurable as an option.

We are experiencing it in our integration tests: we have a lot of fakes that require teardowns, the tests put the computer under a pretty heavy load, and they sometimes take over 500 ms. With --runInBand it's OK, but to get it working with worker threads I need to close every other process apart from the terminal and hope that no service starts indexing or something.

Increasing the FORCE_EXIT_DELAY value has reduced our test run time from ~80s to ~70s now that worker threads aren't being killed.

Personally, I fixed this for us by using patch-package, which is good for this kind of thing, but the patch will need to be redone every time we update the jest-worker module.

dimip1606 commented 3 years ago

I am experiencing the same issue on macOS 11.6, Node 16.10 and Jest 27.2.5. As reported in this issue, increasing the FORCE_EXIT_DELAY or using --runInBand solves the problem for me. I would also prefer making FORCE_EXIT_DELAY configurable, as suggested by others in this thread.

frabbit commented 2 years ago

Had the same issue; increasing the delay to 1000 ms fixed it for me.

tomgransden commented 2 years ago

I have also been experiencing the same issue.

Changing FORCE_EXIT_DELAY to something higher than 500 fixes it.

avidan-chen commented 2 years ago

Seeing the same issue:

- Random failures with the "A worker process has failed to exit" error
- --detectOpenHandles doesn't detect anything
- --runInBand works fine
- Changing FORCE_EXIT_DELAY to something higher than 500 fixes the issue

tjenkinson commented 2 years ago

Hey https://github.com/facebook/jest/pull/13139 might fix this. It sounds like the same symptoms.

Please could you try adding a

setupFilesAfterEnv: ['<rootDir>/jest-setup.js'],

to your config and in jest-setup.js put

afterAll(() => {
  global.gc();
});

and then run jest with --expose-gc, so probably node --expose-gc ./node_modules/.bin/jest ?

Does that fix it?

avidan-chen commented 2 years ago

@tjenkinson I am unable to replicate the error at this time. Not sure what changed, whether it's my code, Jest version or other dependencies.

cfry2 commented 2 years ago

Hey #13139 might fix this. It sounds like the same symptoms.

Please could you try adding a

setupFilesAfterEnv: ['<rootDir>/jest-setup.js'],

to your config and in jest-setup.js put

afterAll(() => {
  global.gc();
});

and then run jest with --expose-gc, so probably node --expose-gc ./node_modules/.bin/jest ?

Does that fix it?

Oh my god, after spending days with this issue, calling gc fixed it for me! Like OP, using --detectOpenHandles magically fixes the issue, and it only seems to occur when running a larger number of tests. I spent a lot of time trying to pinpoint the issue to a faulty test by running subsets of my tests, but the issue would disappear then as well. One thing to note is that I'm running almost 4000 tests, so this issue might only present itself in codebases with a lot of tests?

tjenkinson commented 2 years ago

Great! I wrote up a bit more here: https://github.com/facebook/jest/pull/13139#issuecomment-1218472224

I think the simplest next step is probably to increase FORCE_EXIT_DELAY.

orofbrown commented 1 year ago

Hi, just want to expand on this error a bit - I have seen this come up as well when running a medium to large number of tests at once. The reason --detectOpenHandles appears to solve it is that the tests are no longer being run in parallel. The problem is that Jest's memory and/or CPU consumption is too high when running tests in parallel. It doesn't happen consistently, but it happens often enough to be a problem.

And this isn't just a problem on CI platforms like Jenkins: Jest's resource usage is an obvious problem on my dev machine as well if I try to do anything else while my unit tests are running.

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] commented 7 months ago

This issue was closed because it has been stalled for 30 days with no activity. Please open a new issue if the issue is still relevant, linking to this one.

github-actions[bot] commented 6 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.