Closed · octogonz closed this 7 months ago
Why is this still open after the fix was merged?
If you are referring to https://github.com/microsoft/rushstack/pull/2653, that is a workaround implemented in a project that is external to Jest.
The Jest work tracked by https://github.com/facebook/jest/issues/11354 is: increase `FORCE_EXIT_DELAY` to a much larger value or make it a configurable option. My theory is that if we increase the timeout, then the IPC message will eventually be received, and the warnings will go away. I will report back after I have tested that.
Following up to confirm that patching `FORCE_EXIT_DELAY` to be `7000` (7 seconds) has mitigated our repro. The CI job that was failing 40% of the time has passed 10 consecutive runs. That's pretty good evidence for this theory.
@octogonz, thanks for doing the investigation here. Could you please share a workaround for this issue in a normal Node.js project with Jest?
My patch was implemented in the Jest task from the Heft toolchain. I don't know of an easy workaround for regular Jest.
Thank you @octogonz for the heads up.
I am also able to repro @octogonz's issue and was able to fix it locally by changing `FORCE_EXIT_DELAY` in `BaseWorkerPool` within `jest-worker`.
Any ETA for resolving/closing this bug?
I'm hitting the same, and it's resolved with a higher FORCE_EXIT_DELAY.
My horrible solution in package.json:

```json
"test": "sed -i -e 's/const FORCE_EXIT_DELAY = 500;/const FORCE_EXIT_DELAY = 7000;/g' ./node_modules/jest-worker/build/base/BaseWorkerPool.js && react-scripts test # See https://github.com/facebook/jest/issues/11354"
```
> While debugging this, I noticed that the jest-haste-map library has an obvious flaw where it creates a potentially unlimited number of promises while crawling the disk
Rather naive, but I added the following to `./node_modules/jest-haste-map/build/index.js`:

```js
if (promises.length > 0) {
  throw new Error(promises.length);
}
```
It is being called, since I do hit it if I remove the conditional; but otherwise, in my environment I never hit this -- so the unlimited-promises theory isn't the cause, at least for me.
> in my environment I never hit this
I measured something like 50 concurrent promises. The behavior probably depends on how it is invoked (`jest-haste-map` has several modes of operation and filtering options) and on how deeply nested the directory tree it recurses is.
I am experiencing the same issue. This is my environment:
Executing our tests with `--runInBand` runs all tests serially and solves the problem, but that is only suitable for debugging purposes.
Additional info for this parameter -> https://jestjs.io/docs/cli#--runinband
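As a side note, here is a minimal sketch (my own example, not from this thread) of pinning Jest's parallelism in the config file instead of on the CLI. Note that `maxWorkers: 1` caps Jest at a single worker, which is similar to, but not exactly the same as, `--runInBand` (the latter runs tests in the main process):

```javascript
// jest.config.js -- hypothetical sketch: cap Jest at one worker.
// Similar in effect to --runInBand, though --runInBand runs tests
// in the parent process rather than in a single child worker.
module.exports = {
  maxWorkers: 1,
};
```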
It seems to be related to the way it is creating a worker pool of child processes that will run the tests.
It would be great to see `FORCE_EXIT_DELAY` configurable as an option.
We are experiencing it in our integration tests. We have a lot of fakes that require teardowns, and the tests put the computer under a pretty hard load, so sometimes they take over 500 ms. With `--runInBand` it's OK, but to get it working with worker threads I need to exit every other process apart from the terminal and pray that no service starts indexing or something.
Increasing the `FORCE_EXIT_DELAY` value has reduced our test run time from ~80s to ~70s now that worker threads aren't being killed.
Personally I fixed this for us by using patch-package, which is good for this kind of thing, but it will need to be redone every time we update the `jest-worker` module.
I am experiencing the same issue on macOS 11.6, Node 16.10 and Jest 27.2.5. As reported in this issue, increasing the `FORCE_EXIT_DELAY` or using `--runInBand` solves the problem for me. I would also prefer making `FORCE_EXIT_DELAY` configurable, as suggested by others in this thread.
Had the same issue, increasing the delay to 1000 fixed it for me.
I have also been experiencing the same issue:

- `--detectOpenHandles` doesn't detect anything
- `--runInBand` works fine
- Changing the `FORCE_EXIT_DELAY` to something higher than 500 fixes the issue.
Seeing the same issue:

- Random failures with "A worker process has failed to exit" error
- `--detectOpenHandles` doesn't detect anything
- `--runInBand` works fine
- Changing the `FORCE_EXIT_DELAY` to something higher than 500 fixes the issue.
Hey https://github.com/facebook/jest/pull/13139 might fix this. It sounds like the same symptoms.
Please could you try adding

```js
setupFilesAfterEnv: ['<rootDir>/jest-setup.js'],
```

to your config, and in `jest-setup.js` put

```js
afterAll(() => {
  global.gc();
});
```

and then run Jest with `--expose-gc`, so probably `node --expose-gc ./node_modules/.bin/jest`?

Does that fix it?
@tjenkinson I am unable to replicate the error at this time. Not sure what changed, whether it's my code, Jest version or other dependencies.
Oh my god, after spending days with this issue, calling gc fixed it for me! Like OP, using `--detectOpenHandles` magically fixes the issue, and the issue only seems to occur when running a larger number of tests. I spent a lot of time trying to pinpoint the issue to a faulty test by running subsets of my tests, but the issue would disappear then as well. One thing to note is that I'm running almost 4000 tests, so this issue might only present itself in codebases with a lot of tests?
Great! I wrote up a bit more here: https://github.com/facebook/jest/pull/13139#issuecomment-1218472224
I think the simplest next step is probably to increase the `FORCE_EXIT_DELAY`.
Hi, just want to expand on this error a bit - I have seen this come up as well when running a medium to large number of tests at once. The reason `--detectOpenHandles` appears to solve it is that the tests are no longer being run in parallel. The problem is that the memory and/or CPU consumption of Jest is too high when running tests in parallel. It doesn't happen consistently, but it happens often enough to be a problem.
And this isn't just a problem on platforms like Jenkins. The resource usage of Jest is an obvious problem if I try to do anything else while my unit tests are running on my dev machine as well.
This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been stalled for 30 days with no activity. Please open a new issue if the issue is still relevant, linking to this one.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.
🐛 Bug Report
Our CI builds have been failing sporadically with this error from Jest:
"A worker process has failed to exit gracefully and has been force exited. This is likely caused by tests leaking due to improper teardown. Try running with --runInBand --detectOpenHandles to find leaks."
`--detectOpenHandles` does not detect any problems. The error is reported randomly by various projects that build independently in a large monorepo, which suggests that this is not actually caused by a bad test. It occurs more frequently on Windows machines and on machines with heavier load, but it has also been reported on individual developer MacBooks.
To Reproduce
You can trivially force the warning to be reported by reducing this constant to 0:
https://github.com/facebook/jest/blob/64d5983d20a628d68644a3a4cd0f510dc304805a/packages/jest-worker/src/base/BaseWorkerPool.ts#L20
Investigation
The `FORCE_EXIT_DELAY` constant is used by this code: https://github.com/facebook/jest/blob/64d5983d20a628d68644a3a4cd0f510dc304805a/packages/jest-worker/src/base/BaseWorkerPool.ts#L104-L116
500 ms seems way too small for a machine that is under heavy load. My theory is that if we increase the timeout, then the IPC message will eventually be received, and the warnings will go away. I will report back after I have tested that.
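To make this concrete, here is a minimal self-contained sketch of the force-exit pattern (an assumed shape, not Jest's actual implementation; `waitForExit`, `kill`, and the `'exit'` event name are mine for illustration): wait for the worker's exit notification, and if it does not arrive within `FORCE_EXIT_DELAY` ms, kill the worker and flag a forced exit.

```javascript
const FORCE_EXIT_DELAY = 500;

// Resolve with { forced: false } if `worker` emits 'exit' within
// FORCE_EXIT_DELAY ms; otherwise kill it and resolve { forced: true }.
function waitForExit(worker) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      worker.kill();
      resolve({ forced: true });
    }, FORCE_EXIT_DELAY);
    worker.once('exit', () => {
      clearTimeout(timer);
      resolve({ forced: false });
    });
  });
}
```

Under heavy load, the exit notification can simply arrive later than the 500 ms window, which takes the forced path (and prints the warning) even though no test leaked anything.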
jest-haste-map may be the root cause
While debugging this, I noticed that the jest-haste-map library has an obvious flaw where it creates a potentially unlimited number of promises while crawling the disk:
https://github.com/facebook/jest/blob/4fa17e455662dc7918eecdb857dd946e9ce2dc53/packages/jest-haste-map/src/index.ts#L659-L678
For example, I observed that `Promise.all()` may create 50+ promises that are all crawling the filesystem in parallel. If the build orchestrator is building 8 projects in parallel, and each project is potentially creating 50 Jest promises, that could obviously thrash the disk to the point where the `FORCE_EXIT_DELAY` limit is exceeded. This is just speculation -- I did not have time to try fixing this to see whether it resolves the problem or not.
Either way, it would be a good idea to apply some throttling to `_buildHasteMap()` to limit the parallelism. @cpojer @SimenB

envinfo
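To sketch what such throttling could look like, here is a hypothetical helper (my own code, not Jest's) that runs async task factories through a small pool so that at most `limit` promises are in flight at once, instead of starting every crawl promise simultaneously:

```javascript
// Run the given async task factories with at most `limit` in flight.
// Returns the results in the original task order.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function workerLoop() {
    while (next < tasks.length) {
      const i = next++; // claim the next task index (no await in between)
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => workerLoop(),
  );
  await Promise.all(workers);
  return results;
}
```

Each "worker" here is just a loop that pulls the next unclaimed task; because JavaScript is single-threaded and the index is claimed synchronously, no two loops run the same task.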