nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.39k stars 29.51k forks source link

Memory leak involving v20.16.0, native fetch, express, and dd-trace #54376

Closed mistval closed 2 months ago

mistval commented 2 months ago

Version

v20.16.0

Platform

Darwin xxx.xxx.ts.net 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64 x86_64

Subsystem

fetch

What steps will reproduce the bug?

This leak seems to require several factors:

  1. Node v20.16.0 (does not reproduce in v20.15.1)
  2. dd-trace npm module
  3. Native fetch (does not reproduce with node-fetch)
  4. Attaching a response from fetch to the Express request object

If any of these factors are missing, there is no leak. I'm not sure whether this would better be understood as a defect in Node.js or in dd-trace. The event that triggered this leak in our case was the upgrade from Node v20.15.1 to Node v20.16.0.

To reproduce:

  1. npm i dd-trace@5.2.0 express@4.18.2 (and optionally node-fetch@3.3.2 if you want to confirm that node-fetch does not trigger the leak)
  2. docker run -p 3001:80 -d kennethreitz/httpbin (server to respond to the fetch requests)
  3. node --max-old-space-size=50 repro.mjs with repro.mjs having the following content:
// Run with Node 20.15.1 to fix the issue
import 'dd-trace/init.js'; // Or: comment this line to fix the issue
import express from 'express';
// import fetch from 'node-fetch'; // Or: uncomment this line to fix the issue

const app = express();

app.get('/', async (req, res) => {
  try {
    const response = await fetch('http://localhost:3001/json');
    req.response = response; // Or: comment this line to fix the issue

    res.json({});
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000, () => {
  console.log(`Server listening on port 3000`);
});
  1. Use autocannon to spam requests: npx autocannon -m GET -d 1000 'http://localhost:3000'
  2. The server should crash with an out-of-memory error, it only takes around 10 seconds for me.

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

No OOM crash, as that is the behavior of the previous LTS release of Node.js.

What do you see instead?

OOM crash:

<--- Last few GCs --->

[69763:0x7fb200008000]     3959 ms: Mark-Compact 46.9 (54.1) -> 45.2 (54.6) MB, 54.69 / 0.04 ms  (average mu = 0.455, current mu = 0.367) allocation failure; scavenge might not succeed
[69763:0x7fb200008000]     4048 ms: Mark-Compact 47.4 (54.9) -> 45.7 (59.1) MB, 57.54 / 0.02 ms  (average mu = 0.408, current mu = 0.358) task; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0x10985a650 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 2: 0x109a1265c v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 3: 0x109c023f7 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 4: 0x109c00c59 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 5: 0x109c7a2fe v8::internal::MinorGCJob::Task::RunInternal() [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 6: 0x1098ca42e node::PerIsolatePlatformData::RunForegroundTask(std::__1::unique_ptr<v8::Task, std::__1::default_delete<v8::Task>>) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 7: 0x1098c8e77 node::PerIsolatePlatformData::FlushForegroundTasksInternal() [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 8: 0x10a378811 uv__async_io [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
 9: 0x10a38c1ac uv__io_poll [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
10: 0x10a378e16 uv_run [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
11: 0x10976a6f3 node::SpinEventLoopInternal(node::Environment*) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
12: 0x1098a23fa node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
13: 0x1098a20cc node::NodeMainInstance::Run() [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
14: 0x10981bf62 node::Start(int, char**) [/Users/randall/.nvm/versions/node/v20.16.0/bin/node]
15: 0x7ff808540366 start [/usr/lib/dyld]

Additional information

It seems likely that this is triggered by an update of undici in Node core as mentioned in this issue: https://github.com/nodejs/node/issues/54274#issuecomment-2288834439

But it seems like a distinct issue.

RedYetiDev commented 2 months ago

Hey, can you reproduce without express and dd-trace?

mistval commented 2 months ago

Hey, can you reproduce without express and dd-trace?

Hi, no I cannot. Removing any of the four contributing factors leads to resolution of the leak.

RedYetiDev commented 2 months ago

In that case, it sounds like this is an issue with a third-party dependency, and not Node.js itself.

KhafraDev commented 2 months ago

does it happen if you use fetch from the latest version of undici?

mistval commented 2 months ago

does it happen if you use fetch from the latest version of undici?

Actually it does NOT happen if I import fetch from 'undici';

I suppose that must mean this is already on its way to being resolved, so I'll go ahead and close this. Thanks!