grafana / pyroscope-nodejs

Pyroscope NodeJS integration

Memory leak when Pyroscope is enabled #28

Open nwalters512 opened 1 year ago

nwalters512 commented 1 year ago

When we enable Pyroscope on our application, we observe a steady increase in memory consumption. I've annotated an AWS CloudWatch graph of Node RSS memory:

[Annotated AWS CloudWatch graph of Node RSS memory]

Some commentary about what we saw:

These lines of code are the only difference between "leaking memory" and "not leaking memory":

          const Pyroscope = require('@pyroscope/nodejs');
          Pyroscope.init({
            appName: 'prairielearn',
            // Assume `config` contains sensible values.
            serverAddress: config.pyroscopeServerAddress,
            authToken: config.pyroscopeAuthToken,
            tags: {
              instanceId: config.instanceId,
              ...(config.pyroscopeTags ?? {}),
            },
          });
          Pyroscope.start();

I recognize this isn't a ton of information to go off of, so I'd be happy to provide anything else that might help get to the bottom of this. We'd love to use Pyroscope, but this memory growth is an obvious dealbreaker for us.

Rperry2174 commented 1 year ago

Thanks for reporting @nwalters512 and sorry for the inconvenience. We'll take a look and see if we have more questions / what it will take to fix this. cc @eh-am @petethepig

nwalters512 commented 1 year ago

Thanks @Rperry2174! In case it's useful, I've discovered that heap total/used memory remains constant even as the RSS memory grows seemingly without bounds. From some reading elsewhere (e.g. https://github.com/nodejs/help/issues/1518), it seems as though that could indicate a leak in native code, perhaps in https://github.com/google/pprof-nodejs? Alternatively, it may be that there's not enough memory pressure for the system to be reclaiming this memory?
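
For reference, this is roughly how I've been watching heap vs. RSS from inside the process (a minimal sketch using Node's built-in process.memoryUsage(); the numbers above come from our CloudWatch metrics, not this exact snippet):

    // Hypothetical sketch: periodically log RSS vs. heap so a leak outside the
    // V8 heap (e.g. in native code) shows up as RSS growing while heapUsed stays flat.
    setInterval(() => {
      const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
      const mib = (bytes) => (bytes / 1024 / 1024).toFixed(1);
      console.log(
        `rss=${mib(rss)}MiB heapTotal=${mib(heapTotal)}MiB ` +
          `heapUsed=${mib(heapUsed)}MiB external=${mib(external)}MiB`
      );
    }, 60_000).unref(); // unref so this timer doesn't keep the process alive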

Here are the memory metrics for a single process on this host (I enabled Pyroscope only on that one process; the rest don't have Pyroscope enabled and don't see constant RSS growth):

[Screenshot: memory metrics for the Pyroscope-enabled process]

nwalters512 commented 1 year ago

I also managed to capture a heap snapshot when the RSS was at ~700MB. Unsurprisingly, the heap snapshot only shows ~110MB of memory allocations, which is consistent with the NodeMemoryHeapTotal and NodeMemoryHeapUsed metrics at the time.
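
For what it's worth, a heap snapshot can be captured from inside the running process with Node's built-in v8 module (a sketch; not necessarily the exact mechanism I used):

    // Sketch: write a V8 heap snapshot from the running process. The snapshot only
    // covers the V8 heap, which is why it shows ~110MB even though RSS is ~700MB.
    const v8 = require('v8');
    // Writes a .heapsnapshot file to the current working directory and returns its path.
    const snapshotPath = v8.writeHeapSnapshot();
    console.log(`Heap snapshot written to ${snapshotPath}`);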

korniltsev commented 1 year ago

@nwalters512 could you let me know your Node version, architecture, and OS / base Docker image? I'm curious whether the issue only happens on EC2 or whether it happens locally as well.

It would be really helpful if we could reproduce the issue with a docker container, but I'm not sure how difficult that would be to set up.

While I'm trying to reproduce the issue locally, maybe we could start by running CPU and heap profiling separately in your staging environment? Instead of calling Pyroscope.start(), we could try startCpuProfiling() and startHeapProfiling() individually to see whether the CPU profiler, the heap profiler, or both are responsible. That would help narrow down where to look next.
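
Roughly something like this (a sketch reusing the init from your original report; enable only one profiler per run):

    const Pyroscope = require('@pyroscope/nodejs');

    // Same init as in the original report; `config` is assumed to hold sensible values.
    Pyroscope.init({
      appName: 'prairielearn',
      serverAddress: config.pyroscopeServerAddress,
      authToken: config.pyroscopeAuthToken,
      tags: {
        instanceId: config.instanceId,
        ...(config.pyroscopeTags ?? {}),
      },
    });

    // Instead of Pyroscope.start(), enable only one profiler per run:
    Pyroscope.startCpuProfiling();    // run A: CPU profiling only
    // Pyroscope.startHeapProfiling(); // run B: heap profiling only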

nwalters512 commented 1 year ago

@korniltsev this is Node 16.16.0, x86_64, Amazon Linux 2 running directly on the host (not inside Docker). Unfortunately I was unable to reproduce this locally, but it does happen very consistently across multiple EC2 hosts.

Good idea on trying to narrow this down to CPU vs. heap profiling! Let me give that a shot and report back with any findings.

nwalters512 commented 1 year ago

@korniltsev it does look like this is limited to CPU profiling.

[Screenshot: RSS over time, growing after switching to startCpuProfiling() only and flat after switching to startHeapProfiling() only]

At around 17:13, I updated the code to only call startCpuProfiling() and restarted the process; you can see RSS starts growing immediately. At around 17:28, I changed it to only call startHeapProfiling(), and RSS has been stable since then.