rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

k3os v1.20.11+k3s1: constant warnings about failed garbage collection (FreeDiskSpaceFailed and ImageGCFailed) #765

Open · vdboor opened this issue 2 years ago

vdboor commented 2 years ago

Version (k3OS / kernel): k3os version v0.20.11-k3s1r1, kernel 5.4.0-84-generic #94 SMP Sun Sep 19 04:06:53 UTC 2021 on x86_64

Describe the bug: After a recent upgrade, pods could not be created. They became "Pending" due to "disk pressure" on the k3os node. However, none of the disks appeared to be full. Restarting the node temporarily fixed the problem. Yet the following events are constantly posted for the Kubernetes node (via kubectl describe node):

Events:
  Type     Reason               Age                    From     Message
  ----     ------               ----                   ----     -------
  Warning  FreeDiskSpaceFailed  5m26s (x234 over 19h)  kubelet  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 473034752 bytes, but freed 0 bytes
  Warning  ImageGCFailed        26s (x235 over 19h)    kubelet  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 473251840 bytes, but freed 0 bytes

Posted every 5 minutes in /var/log/k3s-service.log:

I0927 14:15:10.134995    2471 image_gc_manager.go:304] [imageGCManager]: Disk usage on image filesystem is at 88% which is over the high threshold (85%). Trying to free 473948160 bytes down to the low threshold (80%).
E0927 14:15:10.140048    2471 kubelet.go:1287] Image garbage collection failed multiple times in a row: failed to garbage collect required amount of images. Wanted to free 473948160 bytes, but freed 0 bytes

So some kind of garbage collection tries to run, but despite what the message claims, there is nothing for it to collect.
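
Note that the kubelet's image GC measures the containerd image filesystem, which on k3s/k3os lives under /var/lib/rancher/k3s/agent/containerd; judging from the df -h output further down, that directory sits on the 6.0G /dev/sda1 root filesystem, not on the roomier /var/lib/rancher/k3s/storage volume. A minimal sketch for confirming where the reported ~88% comes from (it assumes the default k3s data dir and the overlayfs snapshotter):

# Filesystem the kubelet's image GC is actually measuring
df -h /var/lib/rancher/k3s/agent/containerd

# What consumes that space: pulled image content plus overlayfs snapshots
du -sh /var/lib/rancher/k3s/agent/containerd/io.containerd.content.v1.content \
       /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs

# Image filesystem usage as reported over the CRI
crictl imagefsinfo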

Additional context

Using crictl images does not show any unused images, and crictl rmi --prune doesn't remove any images at all.

The only odd observation I have is that crictl images shows various images with a <none> tag. This happens when the image:tag@sha256:.. reference format is used, for example with k8s.gcr.io/ingress-nginx/controller:v1.0.1@sha256:26bbd57f32bac3b30f90373005ef669aae324a4de4c19588a13ddba399c6664e
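
To check whether those <none> entries are just digest-pinned tags rather than genuinely unreferenced images, something like this should work (a sketch; it assumes crictl's --digests flag and the ctr binary bundled with k3s):

# Show digests next to tags; digest-pinned pulls usually show <none> in the TAG column
crictl images --digests

# Cross-check with containerd's own view of the image store
k3s ctr images ls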

Output from df -h | sort:

# df -h | sort
/dev/loop1       50M   50M     0 100% /usr
/dev/loop2      313M  313M     0 100% /usr/src
/dev/sda1       6.0G  4.9G  758M  87% /
/dev/vda        9.8G  1.9G  7.5G  20% /var/lib/rancher/k3s/storage
Filesystem      Size  Used Avail Use% Mounted on
cgroup_root      10M     0   10M   0% /sys/fs/cgroup
dev              10M     0   10M   0% /dev
none            2.0G  1.8M  2.0G   1% /etc
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/019179dd80a31f8637240905bf008dd318acff85ac6b0724f7cd8c3e4526c2e6/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/0dc9c7dd1eedc0ac3d0a7224ea8675ddc310559533f893d3c34f5a05fdabff35/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/11944fb4cdca3843aa69544c4aa5469a81933bbef92f78db148a3b2dee065237/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/17335c08b022ada487f86267c2830be3cba6c6f94284cdacb471dc211eec682f/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/1843155abdf7b9f5c660eb0e71c0bebddecfcfc0adede51cbf7dd5bc786550b6/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/19bfb5f9461d9b1039a0df94df0bd677171f395b2731c69102b758fe1eb28924/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/262ea578d9577695cb64a008d8f7cadb23fdb6d16c4acb26b52003c4c91e2842/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/2bf8c13f638a0767a1964f730147e8c8aa6ec19c65b62fd1b3a53b45d9f72207/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/3290557d4cd8c2a77095587c684d60315aef97f8763ea51567346bc87aa4b554/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/3465a4ea33a1a82ad2e3847c08a5cf64c97ce2edce90c6e540837794e82f1994/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/36708dee6fdda348c547349e29635c94473d03b71dc4aebdbf19482dfbc35b91/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/3bd39a8bd23bcaa34a33ea3d0aa6cf573e424a7ecf9a3479652a22d14031b1b5/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/4af06c32d280010f9670f4cf4f351884bbf59922d4f16e4cfb6ad246e16fa3b0/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/51ab1f2a02e02958c49f94fdc1bf3c17e2a507f0f3ebb3bdf4db445322f33648/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/56986598cb6e20af8e7e687d721e194e6fe7a457a7c9568ff69a2444eeeafa77/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/5723c85ab51bc3c5a4b01d5bd46cf4291659771a96211ee58ec2446d74cb3af0/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/57b5c69848b3db4d0b09e0525dfe8b60b1ef47c3001f07ebb02f52dd8a8d93b7/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/58c1b7e711682f9a2ef7d45346c83d3feb7bef444c2822c7130c739ae70691d4/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/59747e27a2eaf15e1f3bab383179d80abc9a852f4d80d9bc5020a37253c06453/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/5c0766c0c6ecbc4d128b7fb4dd25a6ca8ac91c0f5c45d442fab10b37dda05927/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/5e288cf23b216f1f982d4ad2d8a22fb3a6cb5625dc6809c3ad3f460c2e252bc1/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/64475a461b90fa806dcfaf91c5395aafac96978a5dbd7db9f1a3287ca6c52f4c/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6a8e8c3e165927c85dc1e7ca2692b9ee7000fa13071ec2b3fb749fca0ba0ebe8/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6b2cda0407d0fc12946aa3c8a113a3e0c7ea917bd45b2906d3241927cebd4f1b/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7354f262b940a8783b766a93f2cbac2a932a29e80a81c234cb4fa2f207a269d6/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7aaf677f4729e2730aed0b3bbf00c36e9e6f6db8fdac0070d3ea5f1e09a82e84/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7eca302b5ce485a39e87bc3e5449f77888fb50e0aa051c78d80ee6e15e953421/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/826ec49c295223f835cbd506ecfb68d3e2f94ed37897e940552b03058424b1df/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/829cc5745637826af8a91cea56ad8cc116c9cd1bd4a2d40fcde080f89a8852c7/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/8377b9f012249765c65f363a5b181869c0372fcf1ae8031800ffd640185aeab6/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/87fe7a4b9248e22ab7d8692bf8cee8144acdd0c209e57bfc73f00dbb9b4e71c3/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/8e52ef3f1e8f69b162c0832e5ca187bd97706cad0fc738f39d2795a165b99a60/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/9cfe3a7c274bfc7ee1b1bf68db71348eec076f6c6a83812ac96bd62808710c23/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/a578de0d8230d3ebace536a4fa0f210cb0df141381e8cd12498cf45fc8839834/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/a958423bca31393c98024bcdb436bda8d682c55595a75cbf46e4c6ab62fb7ce2/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/b92dbb0cb90531f08d60d0d18ffedb08e4278920ea748d140cc8feaabaece23e/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/bae8dc2d1072f4bea0f7d9519dcd996031b15963e06c99990fdd28003b4ab002/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/c0288b1795e1158fa3037c6cd32150649048a1d763b1f5e44c84d06d170343e1/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/c0bc187025af3303224f52aacf6c605f4de8c9b194c4a237334e903be867b1e3/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/cdd9aa1e96d526892e2147fad4e48b980434d62eae0943c30c82204ae731e693/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/d83bfbd3a0d30e66cfd8e9e21e2614d49b8f01684bb7fe066a175059a34b7183/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/e6b3ac97f4570e275be6011e4479af163d73079b9bbc0cb1c8bc3ae32f02db9c/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/efbabbe0454a62a496f105361fcb177a3016fa5b85fb36462aafd7371e32ea4b/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/f69bc59479dd8f62a74e0c2ca981b19270cbc5ef2c5600f3fb5ff66e6bb62c99/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/fa11e646dba2f1eb1a06f995f43cdf6ba1d2585522727667f64c546ebb33994d/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ffd939ef0d31a86218b9d65b9086d6d0a63c62689a9e9a5d840bafbe85eb33f0/rootfs
overlay         6.0G  4.9G  758M  87% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ffeb71983922290ad0a81c15c0192a36dfef4ea5a1d3aad8ffa0b537e9001825/rootfs
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/0dc9c7dd1eedc0ac3d0a7224ea8675ddc310559533f893d3c34f5a05fdabff35/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/11944fb4cdca3843aa69544c4aa5469a81933bbef92f78db148a3b2dee065237/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/19bfb5f9461d9b1039a0df94df0bd677171f395b2731c69102b758fe1eb28924/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/262ea578d9577695cb64a008d8f7cadb23fdb6d16c4acb26b52003c4c91e2842/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/2bf8c13f638a0767a1964f730147e8c8aa6ec19c65b62fd1b3a53b45d9f72207/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/3290557d4cd8c2a77095587c684d60315aef97f8763ea51567346bc87aa4b554/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/3465a4ea33a1a82ad2e3847c08a5cf64c97ce2edce90c6e540837794e82f1994/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/3bd39a8bd23bcaa34a33ea3d0aa6cf573e424a7ecf9a3479652a22d14031b1b5/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/56986598cb6e20af8e7e687d721e194e6fe7a457a7c9568ff69a2444eeeafa77/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/57b5c69848b3db4d0b09e0525dfe8b60b1ef47c3001f07ebb02f52dd8a8d93b7/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/5c0766c0c6ecbc4d128b7fb4dd25a6ca8ac91c0f5c45d442fab10b37dda05927/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/64475a461b90fa806dcfaf91c5395aafac96978a5dbd7db9f1a3287ca6c52f4c/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/6a8e8c3e165927c85dc1e7ca2692b9ee7000fa13071ec2b3fb749fca0ba0ebe8/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/6b2cda0407d0fc12946aa3c8a113a3e0c7ea917bd45b2906d3241927cebd4f1b/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/7354f262b940a8783b766a93f2cbac2a932a29e80a81c234cb4fa2f207a269d6/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/7aaf677f4729e2730aed0b3bbf00c36e9e6f6db8fdac0070d3ea5f1e09a82e84/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/7eca302b5ce485a39e87bc3e5449f77888fb50e0aa051c78d80ee6e15e953421/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/829cc5745637826af8a91cea56ad8cc116c9cd1bd4a2d40fcde080f89a8852c7/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/9cfe3a7c274bfc7ee1b1bf68db71348eec076f6c6a83812ac96bd62808710c23/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/c0288b1795e1158fa3037c6cd32150649048a1d763b1f5e44c84d06d170343e1/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/e6b3ac97f4570e275be6011e4479af163d73079b9bbc0cb1c8bc3ae32f02db9c/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/ffd939ef0d31a86218b9d65b9086d6d0a63c62689a9e9a5d840bafbe85eb33f0/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/ffeb71983922290ad0a81c15c0192a36dfef4ea5a1d3aad8ffa0b537e9001825/shm
shm             2.0G     0  2.0G   0% /dev/shm
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/038b7faf-fcee-49ac-bc8e-88de4de12027/volumes/kubernetes.io~secret/nginx-token-xnmrh
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/038b7faf-fcee-49ac-bc8e-88de4de12027/volumes/kubernetes.io~secret/webhook-cert
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/1850269b-a40a-405a-9b3b-5b3b43de819c/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/22fe2de1-972a-445c-a59b-3862965d95e6/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/32731262-8d2f-4fe3-a998-7da9d8d11d35/volumes/kubernetes.io~secret/local-path-provisioner-service-account-token-gjm96
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/367f0c17-cc4b-4ee0-ab1e-aae26e22f3b6/volumes/kubernetes.io~secret/default-token-rw5bc
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/5f5de487-4c34-4acc-a8af-25868c9a482a/volumes/kubernetes.io~secret/default-token-f2g9p
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/76b21b1e-b18f-4468-af77-dad259d19a3c/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/76b21b1e-b18f-4468-af77-dad259d19a3c/volumes/kubernetes.io~secret/tls-cert
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/885a1ee8-6266-4495-86ab-3731d4c6a741/volumes/kubernetes.io~secret/coredns-token-kt7wl
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/ab2dcf30-429f-4e63-9188-36ec58ea4168/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/bad1863a-48da-4381-a2ad-0c98da7d5bf4/volumes/kubernetes.io~secret/nginx-backend-token-nnkmh
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/bee24ba0-395a-4600-a692-f5190a3fc349/volumes/kubernetes.io~secret/default-token-vdrz8
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/c750fe05-4448-4914-8ed6-58b50082a520/volumes/kubernetes.io~secret/redis-token-v2452
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/ce640c03-2c95-4d42-97bd-0c1dca4fe6a6/volumes/kubernetes.io~empty-dir/writable-dirs
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/ce640c03-2c95-4d42-97bd-0c1dca4fe6a6/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/d0875571-d9a5-4321-bbda-0414a4577ac5/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/d3fa0414-2f16-4497-b608-22a68372e269/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/d646e18e-aa3d-4433-b876-690e1b2bf1b0/volumes/kubernetes.io~secret/k3os-upgrade-token-cn4v6
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/dff174a9-d387-478c-8ba1-bfe0f0f1860c/volumes/kubernetes.io~secret/metrics-server-token-wddz8
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/e02af0bd-792d-42d9-9150-0e6fe197889a/volumes/kubernetes.io~secret/default-token-vdrz8
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/e8e3742a-ec03-40cc-b9ac-85f35afae94c/volumes/kubernetes.io~secret/default-token-f2g9p
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/edf4ce8c-11dc-49da-8076-11dcbef3971d/volumes/kubernetes.io~secret/default-token-w4bc5
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/f107252f-ef71-4e6e-af58-b4424357ecf3/volumes/kubernetes.io~secret/default-token-vdrz8
tmpfs           2.0G   12K  2.0G   1% /var/lib/kubelet/pods/f7d768cf-0b95-4531-ab3c-f963370e50e6/volumes/kubernetes.io~secret/default-token-kg4ck
tmpfs           2.0G   20K  2.0G   1% /var/lib/kubelet/pods/a4329ca0-c1ea-4074-bfd7-88c97e9d48ac/volumes/kubernetes.io~empty-dir/dshm
tmpfs           2.0G   20K  2.0G   1% /var/lib/kubelet/pods/edf4ce8c-11dc-49da-8076-11dcbef3971d/volumes/kubernetes.io~secret/configfiles
tmpfs           2.0G  4.0K  2.0G   1% /var/lib/kubelet/pods/76b21b1e-b18f-4468-af77-dad259d19a3c/volumes/kubernetes.io~secret/auth
tmpfs           2.0G  4.0K  2.0G   1% /var/lib/kubelet/pods/ab2dcf30-429f-4e63-9188-36ec58ea4168/volumes/kubernetes.io~secret/creds
tmpfs           2.0G  4.0K  2.0G   1% /var/lib/kubelet/pods/d0875571-d9a5-4321-bbda-0414a4577ac5/volumes/kubernetes.io~secret/creds
tmpfs           394M     0  394M   0% /tmp
tmpfs           394M  4.6M  389M   2% /run

The K3OS machine runs inside libvirt.

dweomer commented 2 years ago

A 6G root disk (where your imagefs lives) is kinda small. Since we're talking libvirt here... can you grow it? Additionally, take a look at https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ for some --kubelet-arg overrides you might consider setting (a k3os config sketch follows the list below):

--image-gc-high-threshold int32

Default: 85. The percent of disk usage after which image garbage collection is always run. Values must be within the range [0, 100]. To disable image garbage collection, set to 100. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

--image-gc-low-threshold int32

Default: 80. The percent of disk usage before which image garbage collection is never run. Lowest disk usage to garbage collect to. Values must be within the range [0, 100] and should not be larger than that of --image-gc-high-threshold. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

--eviction-hard map[String][String]

Default: imagefs.available<15%,memory.available<100Mi,nodefs.available<10%. A set of eviction thresholds (e.g. memory.available<1Gi) that if met would trigger a pod eviction. On a Linux node, the default value also includes nodefs.inodesFree<5%. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

--eviction-max-pod-grace-period int32

Maximum allowed grace period (in seconds) to use when terminating pods in response to a soft eviction threshold being met. If negative, defer to pod specified value. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

--eviction-minimum-reclaim map[String][String]

A set of minimum reclaims (e.g. imagefs.available=2Gi) that describes the minimum amount of resource the kubelet will reclaim when performing a pod eviction if that resource is under pressure. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

--eviction-pressure-transition-period duration

Default: 5m0s. Duration for which the kubelet has to wait before transitioning out of an eviction pressure condition. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

--eviction-soft map[String][String]

A set of eviction thresholds (e.g. memory.available<1.5Gi) that if met over a corresponding grace period would trigger a pod eviction. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

--eviction-soft-grace-period map[String][String]

A set of eviction grace periods (e.g. memory.available=1m30s) that correspond to how long a soft eviction threshold must hold before triggering a pod eviction. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)
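
On k3os, these kubelet overrides would typically be passed as --kubelet-arg entries in the k3s_args list of /var/lib/rancher/k3os/config.yaml. A minimal sketch (the threshold values below are purely illustrative, not recommendations from this thread):

# /var/lib/rancher/k3os/config.yaml -- sketch, adjust values to taste
k3os:
  k3s_args:
    - server
    - "--kubelet-arg=image-gc-high-threshold=70"   # start image GC earlier on a small root disk
    - "--kubelet-arg=image-gc-low-threshold=60"
    - "--kubelet-arg=eviction-hard=imagefs.available<5%,nodefs.available<5%,memory.available<100Mi"

k3os applies config.yaml at boot, so a reboot (or restarting the k3s service) is needed for the new kubelet args to take effect.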

vdboor commented 2 years ago

Thanks, I realized the same and have grown the root disk to 10G.

Today I had the same issue again, and all pods became pending (kubectl describe pod shows DiskPressure).
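
The node condition itself can be confirmed straight from the API (a sketch, using the node name from the shell prompt below):

kubectl get node k3os-4172 -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'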

This is the output of df -h inside k3os:

k3os-4172 [~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       9.9G  3.6G  5.9G  38% /
/dev/loop1       50M   50M     0 100% /usr
none            2.0G  1.8M  2.0G   1% /etc
tmpfs           394M  780K  393M   1% /run
tmpfs           394M     0  394M   0% /tmp
dev              10M     0   10M   0% /dev
shm             2.0G     0  2.0G   0% /dev/shm
cgroup_root      10M     0   10M   0% /sys/fs/cgroup
/dev/loop2      313M  313M     0 100% /usr/src
/dev/vda        9.8G  2.1G  7.2G  23% /var/lib/rancher/k3s/storage
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/bc2e9475610d6c0e1cbcb0112bc6afdd9f2e600de44634e45314dfe48d92274b/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/2cc3f4f92240e5e85a3e5cd2bed39eb056d440cd36901d816e8a7cfbe230489d/shm
shm              64M     0   64M   0% /run/k3s/containerd/io.containerd.grpc.v1.cri/sandboxes/af24de639a5c2177a65d973506b5ff24d4ca749e2327561478512fcd6b67bd4a/shm

Quite clean because all pods were evicted.

All local images were removed by the kubelet's GC except a few:

# crictl images
IMAGE                                         TAG                 IMAGE ID            SIZE
docker.io/rancher/k3os                        v0.19.15-k3s2r0     a98005c06a141       527MB
docker.io/rancher/klipper-lb                  v0.2.0              465db341a9e5b       2.71MB
docker.io/rancher/kubectl                     v1.20.11            0824e79fc2dbe       12MB
docker.io/rancher/pause                       3.1                 da86e6ba6ca19       327kB
docker.io/rancher/system-upgrade-controller   v0.7.7-rc.1         a5bdcc6c48038       9.7MB

Rebooting the node and then uncordoning it fixed my problem.

Likely the system upgrade process is causing this. I noticed my node now runs v1.19.15+k3s2, which is a downgrade from the v0.20.11-k3s1r1 it ran before. Nothing else changed; it's a quiet machine with a few low-traffic blog sites.

I guess the disk pressure happens during the upgrade process?

dweomer commented 2 years ago

I noticed my node now runs v1.19.15+k3s2, which is a downgrade from v0.20.11-k3s1r1 it used before.

Yeah, this was weird: it seems that GitHub does not calculate the "latest" version in accordance with semver. I had released 0.19, 0.20, and 0.21 versions all at about the same time, and for a while 0.19 was showing up as the "latest" until I marked it as a pre-release on GitHub.

I do encourage you not to rely on the built-in k3os-latest plan for "production" systems. I originally designed it to be the "default" way to upgrade your cluster, but only if you were willing to accept the potential flakiness implied by its reliance on the "latest" tag. To prevent something like this from happening in the future, you should relabel your nodes so that they do not match the k3os-latest node selector, and write your own plan with a pinned version.
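
For anyone wanting to follow that advice, a pinned plan could look roughly like the sketch below. Assumptions: standard system-upgrade-controller Plan fields, the k3os-system namespace and the k3os-upgrade service account used by the built-in plan (the latter is visible in the df output above), and a k3os.io/mode node label applied by k3os; copy spec.upgrade (image, command, args) from the existing k3os-latest plan, e.g. via kubectl get plan -n k3os-system k3os-latest -o yaml.

# Sketch of a pinned upgrade Plan; not an official example
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3os-pinned
  namespace: k3os-system
spec:
  concurrency: 1
  version: v0.20.11-k3s1r1        # pin an explicit release instead of following "latest"
  nodeSelector:
    matchExpressions:
      - key: k3os.io/mode         # assumed k3os node label; relabel nodes away from k3os-latest
        operator: Exists
  serviceAccountName: k3os-upgrade
  upgrade:
    image: rancher/k3os
    # command/args: copy these from the built-in k3os-latest plan on your cluster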

max-wittig commented 2 years ago

Same thing is happening with a k3s machine sadly.

ClapicaVSGon commented 1 year ago

Same thing is happening with my k3s cluster sadly.