k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Built-in K3s Containerd doesn't report OOM events for cgroups v2 #4572

Closed ghost closed 2 years ago

ghost commented 2 years ago

Environmental Info: K3s Version: k3s version v1.21.6+k3s1 (df033fa2) go version go1.16.8

Node(s) CPU architecture, OS, and Version: Linux hostname 5.11.0-1017-aws #18~20.04.1-Ubuntu SMP Fri Aug 27 11:21:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: K3s v1.21.6+k3s1 cluster with 3 servers and 5 agents; all servers are using Linux cgroups v2.

Describe the bug: When a process inside a pod is killed due to OOM, Containerd doesn't report OOM events. This affects only systems using cgroups v2; with v1 it works as expected.

Steps To Reproduce:

k3s-agent.service:
```
[Install]
WantedBy=multi-user.target

[Service]
Type=exec
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s-agent.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    agent \
    '-c' \
    '/etc/rancher/k3s/config.yaml' \
    '--server' \
    'https://master:6443' \
```

/etc/rancher/k3s/config.yaml:
```
token: TOKEN
no-flannel: true
node-name: HOSTNAME
kubelet-arg:
- eviction-hard=imagefs.available<5%,nodefs.available<5%,memory.available<5%
- eviction-soft=imagefs.available<10%,nodefs.available<10%,memory.available<10%
- eviction-soft-grace-period=imagefs.available=5m,nodefs.available=5m,memory.available=5m
- cloud-provider=external
- "provider-id=aws:///us-east-1b/i-111111111"
node-label:
- "group-name=worker-group"
- "node-type=worker"
```
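The report references a stress-oom-crasher pod but doesn't include its spec. A minimal manifest along these lines would reproduce an OOM kill; the container spec, command, and memory limit below are assumptions, not from the original thread (only the pod name and the python:3.9.9 image appear in the report):

```
apiVersion: v1
kind: Pod
metadata:
  name: stress-oom-crasher   # name taken from the report; the rest is assumed
spec:
  restartPolicy: Never
  containers:
  - name: stress
    image: docker.io/library/python:3.9.9   # same image the reporter used with ctr
    command: ["python3", "-c", "a = []\nwhile True: a.append(' ' * 10**7)"]
    resources:
      limits:
        memory: "128Mi"   # small limit so the allocator hits OOM quickly
```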

Expected behavior: On the node where the stress-oom-crasher pod is running, ctr events should show OOM events, e.g.:

```
2021-11-24 10:59:04.757581973 +0000 UTC k8s.io /tasks/oom {"container_id":"3166ec37d31ee3089e272d6f3261585786fdcdc41d3cda4a3aac3ebd2b324586"}
2021-11-24 10:59:04.75831734 +0000 UTC k8s.io /tasks/oom {"container_id":"75c684a3665b008f1037324c7511150fe6cfad0b14d79d5030fda0130c59478f"}
```

Actual behavior: There are no OOM events in the output of the ctr events command.

Additional context / logs: I noticed that if I run a container manually, e.g. ctr run -t --memory-limit=126000000 docker.io/library/python:3.9.9 test_oom bash, and trigger an OOM, then the expected /tasks/oom event is shown in the output of ctr events. In that case the corresponding cgroup is created under /sys/fs/cgroup/k8s.io/; when a container is created by k3s, the corresponding cgroup is created under /sys/fs/cgroup/kubepods/.
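One way to confirm that the kernel itself records the kill, independent of containerd's event stream, is to read memory.events in the pod's cgroup v2 directory. The path below is an assumption based on the /sys/fs/cgroup/kubepods/ hierarchy mentioned above and varies with the cgroup driver:

```shell
# Find memory.events files under the kubepods hierarchy that record an OOM kill.
# Path is an assumption; adjust for your cgroup driver (cgroupfs vs systemd).
grep -r --include=memory.events "oom_kill [1-9]" /sys/fs/cgroup/kubepods/
# A non-empty result means the kernel delivered an OOM kill even though
# ctr events showed no /tasks/oom event for the container.
```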


brandond commented 2 years ago

Have you tested to see if this behavior is unique to our packaging of containerd? Can you reproduce the same behavior with upstream containerd 1.4 when using cgroupv2?
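A sketch of the comparison being asked for, assuming a host running stock upstream containerd 1.4 configured for cgroup v2; the image and memory limit mirror the reporter's manual test, and the Python allocation loop is an assumed OOM trigger, not from the thread:

```shell
# Pull the image the reporter used and run it with a small memory limit
# under upstream containerd's default namespace.
sudo ctr image pull docker.io/library/python:3.9.9
sudo ctr run -t --memory-limit 126000000 docker.io/library/python:3.9.9 test_oom \
  python3 -c 'a = []
while True: a.append(" " * 10**7)'
# In a second terminal, watch for the OOM event:
sudo ctr events | grep /tasks/oom
```

If the event appears here but not with k3s's embedded containerd, the regression is specific to the k3s packaging; if it is missing in both, the bug is upstream.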

stale[bot] commented 2 years ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.