topolvm / topolvm

Capacity-aware CSI plugin for Kubernetes
Apache License 2.0

Potential memory leak in lvmd process. #966

Open naemono opened 1 day ago

naemono commented 1 day ago

Describe the bug

Over time we are seeing a distinct increase in memory usage for the lvmd process (it doubles in about a week), as can be seen in the screenshots of memory usage over time. I have included information for two pods in the same Kubernetes cluster: one that has been running for 8 days and one that has been running for about 30 minutes. The increase seems to be in the working set (WSS). I have also included pprof heap dumps for both pods; I have looked through them and nothing stands out to me.

In case it is relevant: the NVMe disks on m6gd are being converted to a logical volume, which is what is being provisioned by TopoLVM.

Environments

To Reproduce

Steps to reproduce the behavior:

  1. Set up TopoLVM
  2. Run lvmd
  3. Wait days and analyze memory usage.

Expected behavior

We would expect memory usage to be closer to a flat line.

Additional context

Pod running for 8 days:

[screenshot: topolvm-provisioner-lvmd-0-qx57c-mem-usage]

topolvm-provisioner-lvmd-0-qx57c.heap.gz

Pod running for 30m:

[screenshot: topolvm-provisioner-lvmd-0-v4j4l-mem-usage]

topolvm-provisioner-lvmd-0-v4j4l.heap.gz

For context, this was originally referenced in https://github.com/topolvm/topolvm/pull/931. I am happy to provide as much information as you may need; just let me know.

naemono commented 1 day ago

I had missed this comment: https://github.com/topolvm/topolvm/pull/931#issuecomment-2301402650

I don't know if we're seeing the same thing here...