kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Any OOM Kill in the cluster leads to the entire cluster crashing irreparably #3169

Open howardjohn opened 1 year ago

howardjohn commented 1 year ago

What happened:

Whenever an OOM happens in any container in the cluster, the entire cluster crashes and cannot recover.

What you expected to happen:

OOM just kills the impacted container, which is restarted by k8s, etc.
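
For comparison, on a cluster where this behaves as expected, the Deployment from the repro below just cycles through OOMKilled restarts, which can be checked with something like (illustrative commands, nothing kind-specific):

$ kubectl get pods -l app=echo
$ kubectl get pod -l app=echo -o jsonpath='{.items[0].status.containerStatuses[0].lastState.terminated.reason}'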

How to reproduce it (as minimally and precisely as possible):

$ kind create cluster --name oom
$ cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
      - name: echo
        image: gcr.io/istio-testing/app:latest
        resources:
          limits:
            memory: 1Mi
EOF

Wait a minute or so; the container will be OOM-killed and the cluster will break.
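
To watch it happen, roughly (a sketch; the node container name follows from --name oom above, and reading dmesg from inside the node assumes the usual privileged kind node container):

$ kubectl get pods -w                                                      # the echo pod gets OOM-killed, then the API server stops answering
$ docker ps -a --filter name=oom-control-plane                             # the node container itself restarts
$ docker exec oom-control-plane dmesg | grep -i oom                        # kernel OOM records, as in the dmesg excerpt below
$ docker exec oom-control-plane journalctl -u kubelet --no-pager | tail    # kubelet restart loop after the node comes back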

Anything else we need to know?:

dmesg:

[  +0.000002] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-548efed163da10fd7613e82876bc24c3a7f355fa740256659bfdb361519bb640.scope,mems_allowed=0,oom_memcg=/system.slice/docker-b2b75b11f95220f9267e53d48df2d654d585405c36b4c752302ab39da59d74e6.scope/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod68e0cbcc_0cf8_48fc_9a5c_bbb45350dd13.slice,task_memcg=/system.slice/docker-b2b75b11f95220f9267e53d48df2d654d585405c36b4c752302ab39da59d74e6.scope/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod68e0cbcc_0cf8_48fc_9a5c_bbb45350dd13.slice/cri-containerd-548efed163da10fd7613e82876bc24c3a7f355fa740256659bfdb361519bb640.scope,task=runc:[2:INIT],pid=3451192,uid=0
[  +0.000013] Memory cgroup out of memory: Killed process 3451192 (runc:[2:INIT]) total-vm:1088740kB, anon-rss:20kB, file-rss:7548kB, shmem-rss:0kB, UID:0 pgtables:140kB oom_score_adj:999

Note that here it's runc init, but the same has happened with kubelet, kindnet, and my own apps.

When this happens, the Docker container for the control-plane node restarts. After the restart, kubelet cannot start:

Apr 14 15:00:18 oom-control-plane containerd[110]: time="2023-04-14T15:00:18.601562784Z" level=info msg="Start event monitor"
Apr 14 15:00:18 oom-control-plane containerd[110]: time="2023-04-14T15:00:18.601584007Z" level=info msg="Start snapshots syncer"
Apr 14 15:00:18 oom-control-plane containerd[110]: time="2023-04-14T15:00:18.601590724Z" level=info msg="Start cni network conf syncer for default"
Apr 14 15:00:18 oom-control-plane containerd[110]: time="2023-04-14T15:00:18.601596268Z" level=info msg="Start streaming server"
Apr 14 15:00:19 oom-control-plane systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 1.
Apr 14 15:00:19 oom-control-plane systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Apr 14 15:00:19 oom-control-plane systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Apr 14 15:00:19 oom-control-plane sh[133]: sed: couldn't flush stdout: Device or resource busy
Apr 14 15:00:19 oom-control-plane systemd[1]: kubelet.service: Control process exited, code=exited, status=4/NOPERMISSION
Apr 14 15:00:19 oom-control-plane systemd[1]: kubelet.service: Failed with result 'exit-code'.
Apr 14 15:00:19 oom-control-plane systemd[1]: Failed to start kubelet: The Kubernetes Node Agent.
Apr 14 15:00:20 oom-control-plane systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 2.
Apr 14 15:00:20 oom-control-plane systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Apr 14 15:00:20 oom-control-plane systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Apr 14 15:00:20 oom-control-plane sh[137]: sed: couldn't flush stdout: Device or resource busy
Apr 14 15:00:20 oom-control-plane systemd[1]: kubelet.service: Control process exited, code=exited, status=4/NOPERMISSION
Apr 14 15:00:20 oom-control-plane systemd[1]: kubelet.service: Failed with result 'exit-code'.
Apr 14 15:00:20 oom-control-plane systemd[1]: Failed to start kubelet: The Kubernetes Node Agent.
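
After that, the kubelet state inside the node can be inspected from the host with something like (illustrative; assumes the control-plane container is named oom-control-plane):

$ docker exec oom-control-plane systemctl status kubelet --no-pager
$ docker exec oom-control-plane journalctl -xeu kubelet --no-pager | tail -n 50

which just keeps surfacing the same status=4/NOPERMISSION restart loop shown above.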

Environment:

Tested on 2 machines; info for both is included below.

Server:
 Containers: 3
  Running: 3
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 20.10.23
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 31aa4358a36870b21a992d3ad2bef29e1d693bec
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 6.1.15-1rodete3-amd64
 Operating System: Debian GNU/Linux rodete
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.07GiB
 Name: howardjohn-glaptop
 ID: T75S:255Q:OFTL:ZM4Y:BRIG:KCXM:6SG6:FNTW:QN5L:BAQU:SRND:SC5F
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Client:
 Context: default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.10.0-docker)
  scan: Docker Scan (Docker Inc., v0.23.0)

Server:
 Containers: 4
  Running: 3
  Paused: 0
  Stopped: 1
 Images: 47
 Server Version: 20.10.23
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 31aa4358a36870b21a992d3ad2bef29e1d693bec
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 6.1.15-1rodete3-amd64
 Operating System: Debian GNU/Linux rodete
 OSType: linux
 Architecture: x86_64
 CPUs: 48
 Total Memory: 188.9GiB
 Name: howardjohn.c.googlers.com
 ID: HG5E:A5RL:NQMU:2QU2:JXFQ:LNV7:TVS3:CXBP:4UX2:EODZ:X62Q:L4U4
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: howardjohn
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false



- OS (e.g. from `/etc/os-release`): glinux (~Debian 12)
- Kubernetes version: (use `kubectl version`): 0.26
- Any proxies or other special environment settings?: Nope

aojea commented 1 year ago

@AkihiroSuda do you think we can isolate the nested containers' oom-killer?

BenTheElder commented 1 year ago

We should probably consider oom_adj-ing the core components for starters.
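
A rough sketch of what that mechanism looks like, tried by hand inside a node (the drop-in path, file name, and the -999 value here are assumptions for illustration only; OOMScoreAdjust= is the stock systemd directive):

$ docker exec oom-control-plane mkdir -p /etc/systemd/system/kubelet.service.d
$ docker exec oom-control-plane sh -c 'printf "[Service]\nOOMScoreAdjust=-999\n" > /etc/systemd/system/kubelet.service.d/10-oom.conf'
$ docker exec oom-control-plane systemctl daemon-reload
$ docker exec oom-control-plane systemctl restart kubelet

The same idea would apply to containerd; kubelet also has its own --oom-score-adj flag and already assigns QoS-based scores to pod containers, so the interesting part is protecting the node-level daemons themselves.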

BenTheElder commented 1 year ago

Testing with v0.12.0 (released March 7th, 2022, before the migration to the systemd cgroup driver) also reproduces this, so it's not a recent regression at least, for what little that's worth.
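
For anyone repeating that check, roughly (the download URL follows the standard kind release asset naming):

$ curl -Lo ./kind-0.12.0 https://github.com/kubernetes-sigs/kind/releases/download/v0.12.0/kind-linux-amd64
$ chmod +x ./kind-0.12.0
$ ./kind-0.12.0 create cluster --name oom-old
$ # then apply the same 1Mi-limit Deployment from the report above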

BenTheElder commented 1 year ago

xref: https://github.com/kubernetes/kubernetes/pull/117793