Closed juan-fernandez closed 1 year ago
Hey @juan-fernandez — This is a fantastic bug report. I was able to reproduce easily after cloning your repo.
I was a bit worried I made things worse with my bug fix, but I do think this is an existing bug that the fix in https://github.com/facebook/jest/pull/13566 simply uncovered. Before the bug fix, the worker pool coordinator didn't recognize when child processes were killed at all.
The coordinator now does recognize killed workers and prints an error message, but I think it's not performing any followup actions as a result. Specifically, I have a feeling it's not reassigning jobs that were delegated to the killed worker.
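To make the suspected gap concrete, here is a minimal sketch of a coordinator that puts a killed worker's in-flight job back on the queue. It is a toy model only: `ToyCoordinator`, `ToyWorker`, `onWorkerKilled`, and the file names are hypothetical and are not jest-worker's real classes or API.

```typescript
// Toy model only: none of these names come from jest-worker.
type Task = string;

class ToyWorker {
  current: Task | null = null;
  alive = true;
  constructor(readonly id: number) {}
}

class ToyCoordinator {
  private queue: Task[];
  private workers: ToyWorker[];
  readonly finished: Task[] = [];

  constructor(workerCount: number, tasks: Task[]) {
    this.workers = Array.from({ length: workerCount }, (_, i) => new ToyWorker(i));
    this.queue = [...tasks];
  }

  // Hand queued tasks to workers that are alive and idle.
  dispatch(): void {
    for (const w of this.workers) {
      if (w.alive && w.current === null && this.queue.length > 0) {
        w.current = this.queue.shift()!;
      }
    }
  }

  // A live worker reports that its current task is done.
  complete(workerId: number): void {
    const w = this.workers[workerId];
    if (w.alive && w.current !== null) {
      this.finished.push(w.current);
      w.current = null;
    }
    this.dispatch();
  }

  // The worker was killed (for example by SIGKILL): requeue its in-flight task
  // so another worker can pick it up instead of leaving it stuck forever.
  onWorkerKilled(workerId: number): void {
    const w = this.workers[workerId];
    w.alive = false;
    if (w.current !== null) {
      this.queue.unshift(w.current);
      w.current = null;
    }
    this.dispatch();
  }

  // Tasks that are still queued or in flight.
  pending(): number {
    return this.queue.length + this.workers.filter((w) => w.current !== null).length;
  }
}

// Hypothetical file names, two workers, three tasks.
const pool = new ToyCoordinator(2, ["a.test.ts", "b.test.ts", "c.test.ts"]);
pool.dispatch();        // worker 0 takes a.test.ts, worker 1 takes b.test.ts
pool.onWorkerKilled(0); // a.test.ts goes back to the front of the queue
pool.complete(1);       // b.test.ts done, worker 1 picks up a.test.ts
pool.complete(1);       // a.test.ts done, worker 1 picks up c.test.ts
pool.complete(1);       // c.test.ts done
```

Whether a real fix should retry the in-flight job or mark it as failed is a separate decision that this sketch does not settle. It only shows that the coordinator has to do something with that job for the run to finish.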
maxWorkers=2
I noticed that there need to be more tests present than the value of maxWorkers for the hanging to happen.
For debugging, I was able to simplify the repro by setting maxWorkers to 2 and running only 3 test files. As long as the killed worker runs first, I'm able to see consistent hanging. Similar to your theory @juan-fernandez, I think the simple-2.test.ts file below was assigned to run on the killed worker.
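The hypothesis can be modeled with a small sketch. It assumes test files are pre-assigned round-robin to per-worker queues, which is a simplification for illustration and not a description of Jest's actual scheduler. `strandedFiles`, `kill.test.ts`, and `simple-1.test.ts` are hypothetical names. Only `simple-2.test.ts` and the maxWorkers value of 2 come from the repro above.

```typescript
// Toy model: round-robin pre-assignment is an assumption, not Jest's scheduler.
function strandedFiles(
  files: string[],
  maxWorkers: number,
  killedWorker: number,
): string[] {
  // One queue per worker, filled round-robin in file order.
  const queues: string[][] = Array.from({ length: maxWorkers }, () => []);
  files.forEach((file, i) => queues[i % maxWorkers].push(file));

  // The killed worker dies while running its first file. Everything queued
  // behind that file never starts if nothing reassigns it.
  return queues[killedWorker].slice(1);
}

// 3 files, maxWorkers = 2, and the killing test runs first on worker 0.
const withThree = strandedFiles(
  ["kill.test.ts", "simple-1.test.ts", "simple-2.test.ts"],
  2,
  0,
);

// Only 2 files: nothing is queued behind the killing test.
const withTwo = strandedFiles(["kill.test.ts", "simple-1.test.ts"], 2, 0);

console.log(withThree, withTwo);
```

Under these assumptions the stranded file only exists once there are more files than workers, which is consistent with the observation that the hang needs more tests than maxWorkers.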
Assuming that hypothesis is correct, we need to either:
I think option 1 makes more sense.
I would like to come back and fix this, but need to wrap up a few other commitments first. If any Jest maintainers would like to take over, I'd definitely appreciate the help. Otherwise, I'll try to get a fix open as soon as possible.
Seeing the exact same problem. In our scenario, the SIGKILL is sent by the Linux OOM killer, which kills one of the Jest child workers, and Jest then hangs forever.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.
Not stale! Still on my list, but wouldn't mind if anyone wants to take over.
https://github.com/facebook/jest/pull/14015 fixes this, and seems like folks are okay with the approach.
I'll try to get it merged soon.
Version
29.4.3
Steps to reproduce
Clone git@github.com:juan-fernandez/test-jest-worker-killed-repro.git (thanks @gluxon for the inspiration!). See repo at https://github.com/juan-fernandez/test-jest-worker-killed-repro.
npm install
npm run test
Expected behavior
The process ends.
Actual behavior
The process hangs indefinitely.
Additional context
This seems to be related to the bug described in https://github.com/facebook/jest/issues/13183 and fixed (maybe only partially) in https://github.com/facebook/jest/pull/13566. Also probably related to https://github.com/facebook/jest/issues/13864
The difference between https://github.com/juan-fernandez/test-jest-worker-killed-repro and https://github.com/gluxon/test-jest-worker-killed-repro (the original reproduction scenario) is that more than one worker is now suddenly killed.
From the looks of it, jest "runs out of workers" to run the suites and it just hangs forever:
Environment