rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

io.containerd.snapshotter.v1.overlayfs -> takes all space #736

Closed: linuxmail closed this 2 years ago

linuxmail commented 3 years ago

Version (k3OS / kernel): k3os version v0.20.7-k3s1r0

Architecture: x86_64

Describe the bug: Since upgrading from 0.19-dev5 to v0.20.7-k3s1r0, the file /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs grows pretty fast, which causes the node to fail with disk pressure and a lot of evicted pods.

Expected behavior: It should not take up so much disk space that "/" fills up.

Actual behavior: After a few minutes this path grows, the node comes under disk pressure and evicts pods, which then shrinks it back to a few hundred MB.
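For reference, the evictions can be confirmed with standard kubectl queries (the node name below is a placeholder):

```sh
# Evicted pods stay around in the Failed phase until cleaned up
kubectl get pods --all-namespaces --field-selector=status.phase=Failed

# Check whether the node is currently reporting DiskPressure
kubectl describe node <node-name> | grep -i pressure
```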

Additional context: Maybe it first happened after upgrading Rancher to 2.5.9 to support the new k3s (otherwise Rancher isn't able to get the status of the cluster / nodes / ...).

Disk space on every node is 20 GB.
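Something like this should show where the space actually goes (the du/sort flags may differ on a BusyBox userland like k3OS):

```sh
# Per-directory usage under the containerd state directory
du -xh --max-depth=1 /var/lib/rancher/k3s/agent/containerd | sort -h

# Overall root filesystem usage
df -h /
```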

dweomer commented 3 years ago

@linuxmail /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs is where containerd sets up overlayfs working directories. This means that your workload(s) are likely filling this up. If you are running into disk pressure with this location "filling up" with ~20GB of data, it would seem that your root disk is under-allocated.
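A minimal sketch of how to inspect this, assuming the CLI tools embedded in the k3s binary:

```sh
# Filesystem usage as containerd reports it for images
k3s crictl imagefsinfo

# List the overlayfs snapshots containerd is tracking (k8s.io is the
# namespace k3s uses for Kubernetes-managed containers and images)
k3s ctr -n k8s.io snapshots ls
```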

That said, if you are only seeing this behavior post-upgrade, consider manually pruning the rancher/k3os images on your node(s).
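A sketch of the pruning itself, assuming the embedded crictl (`--prune` needs a reasonably recent crictl; on older versions, remove images individually with `crictl rmi <image>`):

```sh
# Remove all images not referenced by a running container
k3s crictl rmi --prune
```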

linuxmail commented 2 years ago

Hi,

it seems you are right. Sorry for the pretty long delay. I will close the ticket, as I purged all projects and recreated them. The problem has disappeared.