Open ianb-mp opened 6 months ago
What's the cgroups/cpumanager related config that you provide to the other k8s distros? I suspect there are differences from the config used by k0s due to the aforementioned #4319. One way to pin this down to these settings would be to try copying k0s' settings to the other distros and see if they start failing as well.
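For example, the kubelet's effective configuration can be dumped per node via the `configz` endpoint and diffed between distros (the node names below are placeholders):

```sh
# Dump the running kubelet configuration from one node of each distro and compare.
# Node names are placeholders; jq is only used for pretty-printing.
kubectl get --raw "/api/v1/nodes/k0s-worker-0/proxy/configz"  | jq . > k0s-kubelet.json
kubectl get --raw "/api/v1/nodes/rke2-worker-0/proxy/configz" | jq . > rke2-kubelet.json
diff k0s-kubelet.json rke2-kubelet.json
```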
> What's the cgroups/cpumanager related config that you provide to the other k8s distros?
Not sure if this answers your question, but I'm passing these kubelet args to k0s, and the same/similar to k3s + rke2 (where I was testing previously):
```
--kubelet-extra-args='--cpu-manager-policy=static
--kube-reserved=cpu=1000m,memory=2000Mi
--system-reserved=cpu=500m,memory=1000Mi
--memory-manager-policy=Static
--topology-manager-policy=restricted
--topology-manager-scope=pod
--reserved-memory=0:memory=1550Mi;1:memory=1550Mi'
```
> One way to pin this down to these settings would be to try copying k0s' settings to the other distros and see if they start failing as well.
By "k0s' settings" are you referring to cgroups? I'm not too familiar with how to check/set those - could you provide some specific instruction?
I've started a k0s worker with the flags you provided. This is how k0s starts kubelet and containerd:
```
k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof k0s)/cmdline
/usr/local/bin/k0s
worker
--data-dir=/var/lib/k0s
--kubelet-extra-args=--cpu-manager-policy=static --kube-reserved=cpu=1000m,memory=2000Mi --system-reserved=cpu=500m,memory=1000Mi --memory-manager-policy=Static --topology-manager-policy=restricted --topology-manager-scope=pod --reserved-memory=0:memory=1550Mi;1:memory=1550Mi
--token-file=/etc/k0s/k0stoken
k0s-4417-worker-0:~$ sudo /usr/local/bin/k0s version
v1.30.0+k0s.0
k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof kubelet)/cmdline
/var/lib/k0s/bin/kubelet
--root-dir=/var/lib/k0s/kubelet
--cpu-manager-policy=static
--kube-reserved=cpu=1000m,memory=2000Mi
--system-reserved=cpu=500m,memory=1000Mi
--reserved-memory=0:memory=1550Mi;1:memory=1550Mi
--config=/var/lib/k0s/kubelet-config.yaml
--kubeconfig=/var/lib/k0s/kubelet.conf
--containerd=/run/k0s/containerd.sock
--memory-manager-policy=Static
--v=1
--topology-manager-policy=restricted
--topology-manager-scope=pod
--runtime-cgroups=/system.slice/containerd.service
--cert-dir=/var/lib/k0s/kubelet/pki
k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof containerd)/cmdline
/var/lib/k0s/bin/containerd
--root=/var/lib/k0s/containerd
--state=/run/k0s/containerd
--address=/run/k0s/containerd.sock
--log-level=info
--config=/etc/k0s/containerd.toml
```
You might want to compare these settings to the other distros. If you aren't using NLLB in your cluster, you can also stop the k0s worker and then start containerd and kubelet manually with the above flags. Then you can experiment with which settings make it behave badly. The hardcoded, non-overridable settings in k0s are `kubeReservedCgroup: system.slice` and `kubeletCgroups: /system.slice/containerd.service` in the kubelet configuration file, plus the `--runtime-cgroups=/system.slice/containerd.service` kubelet CLI flag.
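For illustration, a rough sketch of what that manual start could look like, simply reusing the flags from the transcripts above (the `k0sworker` unit name is an assumption about how the worker was installed; adjust to your setup):

```sh
# Sketch only: stop the managed worker, then run containerd and kubelet by hand with
# the same flags k0s passes, so individual settings can be toggled one at a time.
sudo systemctl stop k0sworker   # assumes the worker was installed via `k0s install worker`

sudo /var/lib/k0s/bin/containerd \
  --root=/var/lib/k0s/containerd --state=/run/k0s/containerd \
  --address=/run/k0s/containerd.sock --log-level=info \
  --config=/etc/k0s/containerd.toml &

sudo /var/lib/k0s/bin/kubelet \
  --root-dir=/var/lib/k0s/kubelet \
  --config=/var/lib/k0s/kubelet-config.yaml \
  --kubeconfig=/var/lib/k0s/kubelet.conf \
  --containerd=/run/k0s/containerd.sock \
  --runtime-cgroups=/system.slice/containerd.service \
  --cert-dir=/var/lib/k0s/kubelet/pki \
  --cpu-manager-policy=static \
  --kube-reserved=cpu=1000m,memory=2000Mi \
  --system-reserved=cpu=500m,memory=1000Mi \
  --memory-manager-policy=Static \
  --topology-manager-policy=restricted \
  --topology-manager-scope=pod \
  --reserved-memory='0:memory=1550Mi;1:memory=1550Mi' \
  --v=1
# quoting --reserved-memory keeps the ';' from being interpreted by the shell
```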
Thanks for the example commands. I've compared your output against my k0s worker and it seems to match up - couldn't see any differences that look relevant to the issue.
I've also run the same/similar commands on my other host which is running rke2 (single node controller/worker):
```
[root@bne-lab-vr-2 ~]# rke2 -v
rke2 version v1.29.3+rke2r1 (1c82f7ed292c4ac172692bb82b13d20733909804)
go version go1.21.8 X:boringcrypto
[root@bne-lab-vr-2 ~]# cat /etc/rancher/rke2/config.yaml
cni:
  - multus
  - canal
kubelet-arg:
  - "cpu-manager-policy=static"
  - "kube-reserved=cpu=1000m,memory=2000Mi"
  - "system-reserved=cpu=500m,memory=1000Mi"
  - "memory-manager-policy=Static"
  - "topology-manager-policy=restricted"
  - "topology-manager-scope=pod"
  - "reserved-memory=0:memory=1500Mi;1:memory=1500Mi"
disable:
  - rke2-snapshot-controller
  - rke2-snapshot-controller-crd
  - rke2-snapshot-validation-webhook
[root@bne-lab-vr-2 ~]# xargs -n1 -0 < /proc/$(pidof kubelet)/cmdline
kubelet
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
--file-check-frequency=5s
--sync-frequency=30s
--address=0.0.0.0
--anonymous-auth=false
--authentication-token-webhook=true
--authorization-mode=Webhook
--cgroup-driver=systemd
--client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt
--cloud-provider=external
--cluster-dns=10.43.0.10
--cluster-domain=cluster.local
--container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock
--containerd=/run/k3s/containerd/containerd.sock
--cpu-manager-policy=static
--eviction-hard=imagefs.available<5%,nodefs.available<5%
--eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10%
--fail-swap-on=false
--feature-gates=CloudDualStackNodeIPs=true
--healthz-bind-address=127.0.0.1
--hostname-override=bne-lab-vr-2.i.megaport.com
--kube-reserved=cpu=1000m,memory=2000Mi
--kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig
--memory-manager-policy=Static
--node-ip=10.8.55.32
--node-labels=
--pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6
--pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests
--read-only-port=0
--reserved-memory=0:memory=1500Mi;1:memory=1500Mi
--resolv-conf=/etc/resolv.conf
--serialize-image-pulls=false
--system-reserved=cpu=500m,memory=1000Mi
--tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt
--tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
--topology-manager-policy=restricted
--topology-manager-scope=pod
[root@bne-lab-vr-2 ~]# xargs -n1 -0 < /proc/$(pidof containerd)/cmdline
containerd
-c
/var/lib/rancher/rke2/agent/etc/containerd/config.toml
-a
/run/k3s/containerd/containerd.sock
--state
/run/k3s/containerd
--root
/var/lib/rancher/rke2/agent/containerd
```
Something else that I thought may be relevant is to compare the systemd service files for each:
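One way to dump them for a side-by-side comparison (assuming the default unit names `k0sworker.service` on the k0s node and `rke2-server.service` on the rke2 node):

```sh
# Print the full unit definitions (including drop-ins) on each host, then compare
# the cgroup-related directives such as Slice=, Delegate= and CPUAccounting=.
systemctl cat k0sworker.service     # on the k0s worker
systemctl cat rke2-server.service   # on the rke2 single-node server
```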
> If you aren't using NLLB in your cluster, you can also stop the k0s worker and then start containerd and kubelet manually with the above flags. Then you can experiment with which settings make it behave badly. The hardcoded, non-overridable settings in k0s are `kubeReservedCgroup: system.slice` and `kubeletCgroups: /system.slice/containerd.service` in the kubelet configuration file, plus the `--runtime-cgroups=/system.slice/containerd.service` kubelet CLI flag.
This is getting beyond my experience/capability 😮
I've tried the latest k0s `v1.30.1+k0s` and also the latest Kubevirt `v1.2.1`, but the issue persists.
The issue is marked as stale since no activity has been recorded in 30 days
Maybe there will be time to sort this out at some point...
Platform
Version
v1.30.0+k0s.0
Sysinfo
k0s sysinfo
What happened?
I'm using Kubevirt to run VMs with k0s. I've found that a certain Kubevirt configuration causes the 'virt-launcher' pod to fail to terminate - it gets stuck in the 'Terminating' state.
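For example, the stuck pods still carry a `deletionTimestamp` and can be listed like this (the namespace is a placeholder):

```sh
# List pods that are being deleted but haven't gone away (shown as 'Terminating').
kubectl get pods -n default | grep Terminating
# Or, more precisely, select pods that still carry a deletionTimestamp:
kubectl get pods -n default -o json \
  | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | .metadata.name'
```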
By trial and error, I've discovered that the specific configuration that causes this problem is the Kubevirt feature `isolateEmulatorThread`, documented here: https://kubevirt.io/user-guide/virtual_machines/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator

When I set `isolateEmulatorThread: true` the problem occurs. Note: this setting is used in conjunction with `dedicatedCpuPlacement: true`; however, if I specify only the latter, the issue does not occur. This problem only seems to happen with k0s (I've tested against other k8s distros and not seen this problem).
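As a quick sanity check, both fields can be read back from the example `VirtualMachine` defined further below (assuming it is named `testvm` as in that manifest):

```sh
# Show the CPU placement settings on the VM spec; uses KubeVirt's 'vm' short name.
kubectl get vm testvm -o jsonpath='{.spec.template.spec.domain.cpu}'
```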
Steps to reproduce
isolateEmulatorThread: true
VirtualMachine
k8s resourceExpected behavior
The VM should terminate and all associated resources should be removed from k8s.
Actual behavior
No response
Screenshots and logs
k0scontroller log output: https://gist.github.com/ianb-mp/588ef41ec05e695bc183c61726257278#file-k0scontroller-log
k0sworker log output: https://gist.github.com/ianb-mp/588ef41ec05e695bc183c61726257278#file-k0sworker-log
Here is an example minimal VM manifest to reproduce the issue:
VM.yaml
```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  creationTimestamp: null
  name: testvm
spec:
  runStrategy: Always
  template:
    metadata:
      creationTimestamp: null
    spec:
      domain:
        cpu:
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
        resources:
          requests:
            memory: "500Mi"
        devices:
          disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          interfaces:
            - masquerade: {}
              name: default
          rng: {}
        machine:
          type: ""
      networks:
        - name: default
          pod: {}
      terminationGracePeriodSeconds: 10
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:40
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
```
Additional context
This may be related to https://github.com/k0sproject/k0s/issues/4319 as it involves the k8s CPU Manager.
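If it helps with that comparison, the kubelet's CPU and memory manager state on the k0s worker can be inspected directly (the path follows from the `--root-dir=/var/lib/k0s/kubelet` flag shown earlier):

```sh
# Inspect the state the static CPU and memory managers have persisted on the worker.
sudo cat /var/lib/k0s/kubelet/cpu_manager_state | jq .
sudo cat /var/lib/k0s/kubelet/memory_manager_state | jq .
```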