jestjs / jest

Delightful JavaScript Testing.
https://jestjs.io
MIT License

[Bug]: Worker exits but jest process never finishes (continuation) #13976

Closed: juan-fernandez closed this issue 1 year ago

juan-fernandez commented 1 year ago

Version

29.4.3

Steps to reproduce

  1. Clone my repo at git@github.com:juan-fernandez/test-jest-worker-killed-repro.git (thanks @gluxon for the inspiration!). See repo at https://github.com/juan-fernandez/test-jest-worker-killed-repro.
  2. Run npm install
  3. Run npm run test

Expected behavior

The process ends.

Actual behavior

The process hangs indefinitely.

Additional context

This seems to be related to the bug described in https://github.com/facebook/jest/issues/13183 and fixed (maybe only partially) in https://github.com/facebook/jest/pull/13566. It is probably also related to https://github.com/facebook/jest/issues/13864.

The difference between https://github.com/juan-fernandez/test-jest-worker-killed-repro and https://github.com/gluxon/test-jest-worker-killed-repro (the original reproduction scenario) is that more than one worker is now being killed suddenly.

From the looks of it, jest "runs out of workers" to run the suites and it just hangs forever:

[Screenshot: hanging-2]

Environment

  System:
    OS: macOS 13.2.1
    CPU: (10) arm64 Apple M1 Max
  Binaries:
    Node: 16.17.0 - ~/.volta/tools/image/node/16.17.0/bin/node
    Yarn: 1.22.19 - ~/.volta/tools/image/yarn/1.22.19/bin/yarn
    npm: 8.15.0 - ~/.volta/tools/image/node/16.17.0/bin/npm
  npmPackages:
    jest: ^29.4.3 => 29.4.3

gluxon commented 1 year ago

Hey @juan-fernandez, this is a fantastic bug report. I was able to reproduce easily after cloning your repo.

Is this a new problem?

I was a bit worried I made things worse with my bug fix, but I do think this is an existing bug that the fix in https://github.com/facebook/jest/pull/13566 simply uncovered. Before the bug fix, the worker pool coordinator didn't recognize when child processes were killed at all.

Early hypothesis

The coordinator now does recognize killed workers and prints an error message, but I think it's not performing any follow-up actions as a result. Specifically, I have a feeling it's not reassigning jobs that were delegated to the killed worker.
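
To make the hypothesis concrete, here is a toy model of it. It is not Jest's real scheduler: the `runPool` function, the file names, and the up-front round-robin assignment are all assumptions made for illustration. It only shows how a job that stays attached to a dead worker would never settle.

```typescript
// Toy model only: NOT Jest's real scheduler. File names and the
// round-robin assignment are assumptions made for illustration.
type Outcome = { passed: string[]; failed: string[]; stranded: string[] };

function runPool(files: string[], maxWorkers: number, killer: string): Outcome {
  // Assumed: each file is assigned to a worker up front, round-robin.
  const queues: string[][] = [];
  for (let w = 0; w < maxWorkers; w++) queues.push([]);
  files.forEach((file, i) => queues[i % maxWorkers].push(file));

  const outcome: Outcome = { passed: [], failed: [], stranded: [] };
  for (const queue of queues) {
    let alive = true;
    for (const file of queue) {
      if (!alive) {
        // The worker is gone and nothing reassigns its job, so it never settles.
        outcome.stranded.push(file);
      } else if (file === killer) {
        // The worker dies mid-run; the coordinator notices and reports an error.
        alive = false;
        outcome.failed.push(file);
      } else {
        outcome.passed.push(file);
      }
    }
  }
  return outcome;
}

// Three files on two workers: one file is left waiting on the dead worker.
const threeFiles = runPool(["kill.test.ts", "simple-1.test.ts", "simple-2.test.ts"], 2, "kill.test.ts");
console.log(threeFiles.stranded); // one stranded file: simple-2.test.ts

// Two files on two workers: nothing is left waiting, so the run can finish.
const twoFiles = runPool(["kill.test.ts", "simple-1.test.ts"], 2, "kill.test.ts");
console.log(twoFiles.stranded); // empty
```

In this model a file is only stranded when there are more files than workers, since otherwise the killed worker has nothing else queued behind the job that killed it.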

Setting maxWorkers=2

I noticed that there need to be more test files than the value of maxWorkers for the hang to happen.

For debugging, I was able to simplify the repro by setting maxWorkers to 2 and running only 3 test files. As long as the killed worker runs first, I'm able to see consistent hanging. Similar to your theory @juan-fernandez, I think the simple-2.test.ts file below was assigned to run on the killed worker.

[Screenshot, 2023-03-04 3:00:44 PM]
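
For anyone repeating this locally, the worker cap can be pinned in the Jest config. A minimal sketch, assuming a `jest.config.ts` at the repo root (the repro repo may set this differently, and Jest needs `ts-node` installed to read a TypeScript config file):

```typescript
// jest.config.ts: hypothetical config for a local repro, not copied from the repro repo.
const config = {
  // Cap the pool at two workers so that three test files exceed it.
  maxWorkers: 2,
};

export default config;
```

Passing `--maxWorkers=2` on the command line should have the same effect without touching the config.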

Possible Solutions

Assuming that hypothesis is correct, we need to either:

  1. Spawn a new worker when one is killed.
  2. Or, more simply, exit the entire test suite when any worker is killed.

I think option 1 makes more sense.
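
The two options can be compared with another toy model. Again, this is only a sketch and not Jest's implementation: `Policy`, `Report`, `runWithRecovery`, and the file names are hypothetical, and the workers are simulated as one shared queue rather than real processes. The point is that under either policy every file ends up in a terminal state, so the run can finish.

```typescript
// Toy model of the two recovery policies: NOT Jest's implementation.
// The names (Policy, Report, runWithRecovery) are hypothetical.
type Policy = "respawn" | "abort";
type Report = { passed: string[]; failed: string[]; notRun: string[]; spawned: number };

function runWithRecovery(files: string[], maxWorkers: number, killers: string[], policy: Policy): Report {
  const queue = files.slice(); // shared queue: any live worker may take the next file
  const report: Report = { passed: [], failed: [], notRun: [], spawned: maxWorkers };

  while (queue.length > 0) {
    const file = queue.shift() as string;
    if (killers.indexOf(file) === -1) {
      report.passed.push(file);
      continue;
    }
    // The worker running this file was killed, so the file is reported as failed.
    report.failed.push(file);
    if (policy === "abort") {
      // Option 2: stop scheduling and report everything still queued as not run.
      report.notRun.push(...queue.splice(0));
      break;
    }
    // Option 1: spawn a replacement so the pool keeps its capacity.
    report.spawned += 1;
  }
  return report;
}

const suite = ["kill-1.test.ts", "kill-2.test.ts", "simple-1.test.ts", "simple-2.test.ts"];
const selfKilling = ["kill-1.test.ts", "kill-2.test.ts"];
console.log(runWithRecovery(suite, 2, selfKilling, "respawn"));
console.log(runWithRecovery(suite, 2, selfKilling, "abort"));
```

With "respawn", both healthy files still run even though two workers are killed; with "abort", the run stops at the first kill and the remaining three files are reported as not run.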

Timelines

I would like to come back and fix this, but need to wrap up a few other commitments first. If any Jest maintainers would like to take over, I'd definitely appreciate the help. Otherwise, I'll try to get a fix open as soon as possible.

axelchauvin commented 1 year ago

Seeing the exact same problem. In our scenario, the SIGKILL is sent by the Linux OOM killer, which kills one of the Jest child workers, and Jest then hangs forever.
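
For context on how a parent process can tell that a child was killed rather than exited on its own: Node reports a signal-terminated child with a null exit code plus the signal name. A small POSIX-only sketch, using a shell that kills itself as a stand-in for an OOM-killed worker (this is not how Jest spawns or monitors its workers):

```typescript
import { spawnSync } from "node:child_process";

// Stand-in for a worker that receives SIGKILL (for example from the Linux OOM killer):
// a POSIX shell that sends signal 9 to itself. POSIX-only illustration.
const result = spawnSync("sh", ["-c", "kill -9 $$"]);

// A process terminated by a signal has no exit code; Node reports the signal instead.
const killedBySignal = result.status === null && result.signal !== null;
console.log({ status: result.status, signal: result.signal, killedBySignal });
```

A coordinator that sees this combination knows the worker did not finish its job and will never report back.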

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.

gluxon commented 1 year ago

Not stale! Still on my list, but wouldn't mind if anyone wants to take over.

PeteTheHeat commented 1 year ago

https://github.com/facebook/jest/pull/14015 fixes this, and it seems like folks are okay with the approach.

I'll try to get it merged soon.

SimenB commented 1 year ago

https://github.com/jestjs/jest/releases/tag/v29.6.0