google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

kubelet memory leak: housekeeping goroutines keep leaking #3218

Open attlee-wang opened 1 year ago

attlee-wang commented 1 year ago

What happened?

In my production Kubernetes cluster, kubelet memory keeps growing until it exceeds 50G, and many low-priority processes end up being killed for memory.

image

I observed that the number of kubelet goroutines grows in step with the memory:

image

I analyzed the kubelet with Go pprof and found that many goroutines are sitting in the housekeeping logic.
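For readers who want to repeat this kind of check, here is a minimal sketch of counting goroutines by function name using the goroutine profile from Go's `runtime/pprof` package. It is not the reporter's actual procedure; `fakeHousekeeping`, `countGoroutinesIn`, and `startAndCount` are hypothetical names made up for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"sync"
)

// fakeHousekeeping is a hypothetical stand-in for a long-lived worker:
// it signals that it has started, then blocks until stop is closed.
func fakeHousekeeping(started, exited *sync.WaitGroup, stop <-chan struct{}) {
	defer exited.Done()
	started.Done()
	<-stop
}

// countGoroutinesIn reports how many goroutine stacks in this process
// contain a call frame for the function named fn.
func countGoroutinesIn(fn string) int {
	var buf bytes.Buffer
	// debug=2 writes one full stack trace per goroutine,
	// with stacks separated by blank lines.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return -1
	}
	n := 0
	for _, stack := range strings.Split(buf.String(), "\n\n") {
		if strings.Contains(stack, fn+"(") {
			n++
		}
	}
	return n
}

// startAndCount starts n blocked workers, counts them through the
// goroutine profile, then releases them and waits for them to exit.
func startAndCount(n int) int {
	var started, exited sync.WaitGroup
	stop := make(chan struct{})
	started.Add(n)
	exited.Add(n)
	for i := 0; i < n; i++ {
		go fakeHousekeeping(&started, &exited, stop)
	}
	started.Wait()
	count := countGoroutinesIn("fakeHousekeeping")
	close(stop)
	exited.Wait()
	return count
}

func main() {
	fmt.Println("goroutines in fakeHousekeeping:", startAndCount(4))
}
```

If the count for one function keeps rising over time while the workload stays the same, that function is a likely place to look for a goroutine leak.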

housekeeping() goroutines have only one exit point, which is receiving a message from the c.stop channel (<-c.stop). I didn't find anything unusual in the kubelet log, so why do housekeeping() goroutines keep growing?
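The pattern described above can be sketched as follows. This is a simplified illustration under assumptions, not cAdvisor's real code: the `worker` type, its `done` channel, and `stillRunning` are hypothetical. It shows that a goroutine whose only exit is a receive on a stop channel stays alive for as long as nothing signals that channel.

```go
package main

import (
	"fmt"
	"time"
)

// worker is a hypothetical stand-in for per-container state; it is not
// cAdvisor's real type.
type worker struct {
	stop chan struct{} // closed to ask housekeeping to exit
	done chan struct{} // closed by housekeeping once it has exited
}

func newWorker() *worker {
	return &worker{stop: make(chan struct{}), done: make(chan struct{})}
}

// housekeeping runs periodic work until stop is closed. Receiving from
// stop is its only exit point, as described in the report.
func (w *worker) housekeeping(interval time.Duration) {
	defer close(w.done)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-w.stop:
			return
		case <-ticker.C:
			// Periodic stats collection would go here.
		}
	}
}

// stillRunning starts n housekeeping goroutines, stops only the first
// `stopped` of them, and returns how many are still running.
func stillRunning(n, stopped int) int {
	workers := make([]*worker, n)
	for i := range workers {
		workers[i] = newWorker()
		go workers[i].housekeeping(time.Millisecond)
	}
	for _, w := range workers[:stopped] {
		close(w.stop)
		<-w.done // wait for a clean exit
	}
	running := 0
	for _, w := range workers {
		select {
		case <-w.done:
			// This goroutine has exited.
		default:
			running++
		}
	}
	return running
}

func main() {
	// Two of the five goroutines are never told to stop, so they leak.
	fmt.Println("still running:", stillRunning(5, 3))
}
```

In this model, every start that is not matched by a stop leaves one goroutine (and whatever it references) alive for the lifetime of the process.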

What did you expect to happen?

housekeeping() goroutines should exit normally, and I would like to find out why they do not.

How can we reproduce it (as minimally and precisely as possible)?

I don't know how to reproduce it. Restarting the kubelet makes the problem disappear.

Anything else we need to know?

No response

Kubernetes version

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"---", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"---", GitTreeState:"clean", BuildDate:"2020-06-26T03:39:24Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
```

Cloud provider

Internal private cloud platform

OS version

```console
# On Linux:
root@:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux bookworm/sid"
NAME="Debian GNU/Linux"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```

Install tools

Container runtime (CRI) and version (if applicable)

root@:~# crictl version
Version: 0.1.0
RuntimeName: containerd
RuntimeVersion: v1.5.5-9
RuntimeApiVersion: v1alpha2

Related plugins (CNI, CSI, ...) and versions (if applicable)

attlee-wang commented 1 year ago

I also noticed issue #3014, but after investigating, the cause here is different.

I added logging to housekeeping() and found that housekeeping() is started far more often than it is stopped, even though there are only a few containers on my node.


After adding more logs to the code, I found that lastWatched is always false for the failed container, so the signal to stop housekeeping() is never sent. I have not found out why lastWatched is always false for the failed container. @bobbypage @iwankgb
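To make the symptom concrete, here is a small model of how a "last watch removed" flag could stay false. It assumes that lastWatched means "the directory just removed was the last one still watched for this container"; that meaning, the `watchSet` type, and the directory names are assumptions made for illustration and are not taken from the cAdvisor source.

```go
package main

import "fmt"

// watchSet is a hypothetical, simplified model of per-container watch
// bookkeeping: container name -> set of watched directories.
type watchSet map[string]map[string]bool

// addWatch records dir for name and reports whether name was already
// being watched before this call.
func (ws watchSet) addWatch(name, dir string) bool {
	dirs, already := ws[name]
	if dirs == nil {
		dirs = make(map[string]bool)
		ws[name] = dirs
	}
	dirs[dir] = true
	return already
}

// removeWatch forgets dir for name and reports whether that was the
// last directory still watched for name.
func (ws watchSet) removeWatch(name, dir string) bool {
	dirs, ok := ws[name]
	if !ok {
		return false
	}
	delete(dirs, dir)
	if len(dirs) == 0 {
		delete(ws, name)
		return true
	}
	return false
}

func main() {
	ws := watchSet{}
	// The same container is watched under two hypothetical directories.
	ws.addWatch("c1", "/cpu/c1")
	ws.addWatch("c1", "/memory/c1")

	// Only one of the two removals is ever observed.
	lastWatched := ws.removeWatch("c1", "/cpu/c1")
	fmt.Println("lastWatched:", lastWatched, "dirs still tracked:", len(ws["c1"]))
}
```

Under this model, if the removal for any one tracked directory is never processed, the flag never becomes true and nothing triggers the stop path. Whether that is what happens on the affected node is not established in this thread.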

pacoxu commented 1 year ago

Kubernetes v1.18.5 is using cadvisor v0.35.0.

attlee-wang commented 1 year ago

> Kubernetes v1.18.5 is using cadvisor v0.35.0.

@pacoxu Yes. I synced the code from # 100326 and # issues, but the problem still exists. It may be that # 94583 has not been fully fixed.

iwankgb commented 1 year ago

@pacoxu is this version of Kubernetes still supported?