Mitigate memory leaks in jest-environment-node

eyalroth commented 3 months ago

Summary

This PR attempts to drastically mitigate memory leaks in jest-environment-node, raised by multiple issues:

The leaks are likely caused by the Jest/NodeJS sandbox (vm context) not being fully cleaned up between test suites.

This PR introduces 3 orthogonal mitigations, all made during env teardown.

The mitigations do not eliminate the leak entirely, but they drastically reduce it (at least in my testing).

The mitigations (each in its own commit):

1. Global cleanup

In my testing, the biggest memory leak was caused by custom objects on global that keep references to source-code strings.

This change deletes all properties of all the objects in the environment's global; i.e:

// pseudo code
for (value in global) {
  for (key in value) {
    delete value[key];
  }
}

The exceptions are:

Objects set on the global in the environment constructor
Objects protected by a new "shreder" API in jest-utils (setNotShreddable)
- For example, jest-circus's state is leaking, so it should be deleted; however, some of the properties are still used after teardown -- for instance, unhandledErrors and unhandledRejectionErrorByPromise.
The performance property (hack)
- react-native comes with a preset that overrides this property on env-setup and never reverts it. Then, react-test-renderer attempts to access performance (via scheduler) after the env has already been torn down. A fix needs to be made to react-native to either (a) revert the performance property at teardown or (b) protect the custom performance object with the new shredder API.

The PR requires an update to the official docs, so users would know about this behavior, and how to avoid it if the need arises.

2. Event listeners cleanup

Another memory leak came from event listeners on the env process and node:cluster module, so these are removed at env teardown.

3. source-map-support cleanup

Although some cleanup of source-map-support exists in Jest, it's not enough. This module overrides Error.prepareStackTrace with an implementation that holds a cache that contains (memory-heavy) source code.

The mitigation here is to override Error.prepareStackTrace back to a no-op method at env teardown.

Stacked of nested (cause) errors were harmed by this change, so a better assurance of stack generation was added to jest-runner.

Test plan

I have been using one of our private repositories that are suffering from heavy memory leaks.

All of them use some version of NodeJS (18.x), Jest (27/29) and Typescript (4.x). They also make use of sequelize, lodash, moment, log4js, AWS SDK (v3) and many more packages.

Method

Clone Jest.
Modify runTest.ts in jest-runner to run with the local jest environment (/path/to/jest/packages/jest-environment-node/build/index.js) instead of the test repo.
Build the project.
Add .nvmrc with 20.14.0.

Run jest with this script in jest-cli:

// package.json
"scripts": {
"jest": "NODE_OPTIONS='--no-experimental-fetch' node --inpsect-brk ./bin/jest.js -c /path/to/repo/test/unit/jest.config.js --detectOpenHandles --runInBand --forceExit --logHeapUsage"
}

(we use --no-experimental-fetch due to our use of nock for mocking HTTP)

Attach to the process via Chrome devtools, and let it run.
Capture a full heap snapshot soon after letting the process run.
Capture a second full heap snapshot after letting the process run a while and the heap grow.
Examined the objects that were allocated between the snapshots, targeting objects with highest shallow size.

Steps

1. Original code

With Jest's original code, we can see the following: memory1

A huge chunk of memory is lodash's source code replicated by references to it from a custom matcher:

import _ from 'lodash';

expect.extend({ satisfies(...) { 
  // code using lodash
}});

However, the vast majority of memory is probably held by jest-circus state (symbol JEST_STATE_SYMBOL):

In this snapshot we can see that just one state (of one test file) is leaking ~25MB in memory.

2. With global cleanup

This is after applying the first mitigation: memory2

The leak has been reduced dramatically.

What we see now is (again) references to lodash's source code, but from a custom log4js we use:

import log4js from 'log4js';
import _ from 'lodash';

log4js.addLayout('json', () => {
  // code using lodash
});

Turns out that log4js adds the layout function to node:cluster module as an event listener.

3. With global cleanup + event listeners cleanup

Next, after adding the event-listeners cleanup as well, we see this: memory3

This time, it seems that sequelize's source (map) code is being duplicated in memory.

In our version of seqeulize, it creates a new Error() on each query (this was replaced in https://github.com/sequelize/sequelize/pull/15559). This error is somehow captured and persisted in memory, keeping a custom prepareStackTrace (from source-map-support) that keeps the source code.

4. With all mitigations

After all mitigations in this PR applied: memory4

We see that there's still some leaking source code, but that's for another PR :)

netlify[bot] commented 3 months ago

Deploy Preview for jestjs ready!

Built without sensitive environment variables

Name	Link
Latest commit	05592c7f6c3f2db42bb1eb94d5d7a49d59541318
Latest deploy log	https://app.netlify.com/sites/jestjs/deploys/66acca0c2ebedb000861cf93
Deploy Preview	https://deploy-preview-15215--jestjs.netlify.app
Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

SimenB commented 3 months ago

Thank you so much for working on this!

My main feedback would be to split up the changes here (already in atomic commits, which is nice!) into separate PRs so they can be merged independently of each other.

Some quick feedback on each of these

This seems a bit much to me at first glance, but I'll need to dig deeper
Unless those listeners are added by Jest, I don't think we should clean them up. All the node core modules share a cache (it's impossible to load node core modules within vm: https://github.com/nodejs/node/issues/31852). This needs to be done by users (if log4js has global side effects it should also expose a way of cleaning that up).
This seems reasonable, but could we do this in the teardown of the test env rather than the environment? I.e. after https://github.com/jestjs/jest/blob/213c3b2770a3278fe47267c1ace250251954fe4f/packages/jest-runner/src/runTest.ts#L381 (which I thought was supposed to solve exactly the issue you highlight 🤔)

On the log4js thing - essentially if this adds more and more listeners, you'll need to clean up yourself.

const log4jsPath = require.resolve('log4js');

require(log4jsPath);
delete require.cache[log4jsPath];
require(log4jsPath);
delete require.cache[log4jsPath];
require(log4jsPath);
// etc.

eyalroth commented 3 months ago

@SimenB Thanks for the reply!

My main feedback would be to split up the changes here

Agreed. I wanted 1 PR mostly for the tests and the "dev journey". I'll keep the first method (global cleanup) here and move the others to separate PRs.

This seems a bit much to me at first glance, but I'll need to dig deeper

Waiting on your thoughts :) This is the bulk of the PR and the thing that reduces memory leak the most (by far).

All the node core modules share a cache

Yep. I missed that. At first I tried to clean listeners directly on node:cluster (without VM) and it didn't remove log4js's listeners; however, that was likely due to webpack replacing require.

This code illustrates how the listeners are indeed shared:

import vm from 'node:vm';
import cluster from 'node:cluster';

console.log(cluster.listenerCount('foo')); // 0
cluster.on('foo', () => {});
console.log(cluster.listenerCount('foo')); // 1

const context = vm.createContext();
vm.runInContext(`
  const cluster = require('node:cluster');
  cluster.removeAllListeners();
`, Object.assign(context, { require, console }));

console.log(cluster.listenerCount('foo')); // 0

I could potentially use the same technique I did with global but with cluster's listeners, but I feel like it's not worth it.

could we do this in the teardown of the test env rather than the environment?

Sure, I'll do it in a separate PR.

(which I thought was supposed to solve exactly the issue you highlight 🤔)

It only cleans custom handlers, but it doesn't clean fileContentsCache.

SimenB commented 3 months ago

(which I thought was supposed to solve exactly the issue you highlight 🤔)

It only cleans custom handlers, but it doesn't clean fileContentsCache.

Mind opening a PR upstream with that? It was my PR some years ago that added the cleanup, seems I missed some 😀

https://github.com/evanw/node-source-map-support/pull/215

eyalroth commented 3 months ago

@SimenB

Mind opening a PR upstream with that? It was my PR some years ago that added the cleanup, seems I missed some 😀

Tried my best (without tests): https://github.com/evanw/node-source-map-support/pull/339

sibelius commented 1 month ago

any progress on this?

does this also improves CPU ?

jestjs / jest