Open dMARLAN opened 5 months ago
Unrelated: I realized the doc (https://docs.k3s.io/installation/registry-mirror) incorrectly shows --disable-default-endpoint when the arg is actually --disable-default-registry-endpoint.
sudo k3s ctr images label "${image_name}" "io.cri.containerd.pinned=pinned"
If containerd and the kubelet are pruning pinned images under disk pressure that is a defect in kubelet or containerd, not k3s. They are not supposed to do that. K3s itself does not have any part in this interaction.
I would suggest raising an issue with the upstream projects.
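For anyone applying the label above to more than one image, a minimal sketch follows. It wraps the exact command from this comment in a loop; `pin_images` is a hypothetical helper name, not something k3s or containerd provides, and the image names passed to it are placeholders.

```shell
#!/bin/sh
# pin_images IMAGE...
# Hypothetical helper: applies the pinned label from the comment above to
# each image name given as an argument. Stops at the first failure.
pin_images() {
    for image_name in "$@"; do
        sudo k3s ctr images label "${image_name}" "io.cri.containerd.pinned=pinned" || return 1
    done
}
```

Usage would look like `pin_images docker.io/library/foo`, run after the images have been imported.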
Thanks, I will.
I noticed on the containerd issue that you're using k3s, but also have containerd installed on the host. Are you using the containerd embedded in k3s, or are you pointing k3s at the containerd socket provided by the host-level installation? We generally recommend against having multiple instances of containerd installed on the host.
@brandond
We import with e.g. sudo k3s ctr images import foo.tar, and we don't point k3s at the host containerd. I assume it is on the host as a dependency of something else (I am testing on my dev machine; in prod they would only have the k3s package, although customers could potentially still have their own containerd/k8s/etc.). I only provided the host version since that's what their instructions ask for.
Pruning of unused, unpinned images under disk pressure is a feature, not a bug.
Have you read any of the comments in this thread? Have you tried using the command suggested above to add a pinned label to your images? Have you tried adding storage so that your disk is not 86% full without even running anything?
I also notice:
❯ k3s ctr images list | grep foo
❯ k3s ctr images import --local /tmp/aslkjasdflkj
unpacking docker.io/library/foo (sha256:32a742d6afd561b5905e7c69efa4343a739f15ebc5c7ff5f1b8dc2c05bd189ab)...done
❯ k3s ctr images list | grep foo
docker.io/library/foo application/vnd.oci.image.manifest.v1+json sha256:32a742d6afd561b5905e7c69efa4343a739f15ebc5c7ff5f1b8dc2c05bd189ab 153.3 MiB linux/amd64 io.cri-containerd.image=managed
❯ # water my plants...
❯ k3s ctr images list | grep foo
❯ # gone :(
and have 14% free disk space, so I guess I'm experiencing this?
Is there a workaround? Being able to use k3s with a locally built image seems like essential functionality.
At 85% disk usage, unused/unpinned images (as brandond mentioned) will get pruned; if you make sure your images are in use, they won't get pruned.
Note, however, that at 95% containers start to get evicted, which makes their images unused, and those images then get pruned regardless of pinning. This might be a bug in containerd; I never got a final answer and decided to just accept it and move on, using the embedded registry as a mitigation.
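To see which of those two ranges a node is currently in, a rough sketch like the one below may help. The 85% and 95% figures are simply the ones quoted in this thread, not values read from the kubelet configuration, and both function names are hypothetical.

```shell
#!/bin/sh
# current_disk_usage PATH
# Prints the percent-used figure (without the % sign) for the filesystem
# that contains PATH, using POSIX df output.
current_disk_usage() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# classify_disk_usage PERCENT_USED
# Maps a usage percentage onto the thresholds mentioned in this thread:
# 85% (unused/unpinned images pruned) and 95% (pods evicted).
classify_disk_usage() {
    used="$1"
    if [ "$used" -ge 95 ]; then
        echo "eviction"
    elif [ "$used" -ge 85 ]; then
        echo "image-gc"
    else
        echo "ok"
    fi
}
```

For example, `classify_disk_usage "$(current_disk_usage /var/lib/rancher/k3s)"`, assuming the default k3s data directory; adjust the path if yours differs.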
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Linux DVUbuntuVM 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 1 server, 1 agent
Describe the bug: Air-gapped images are deleted when all pods that are using the image are evicted, in this case due to DiskPressure.
Steps To Reproduce:
Installed K3s (attempted both with and without --embedded-registry and --disable-default-endpoint added to the INSTALL_K3S_EXEC).
Import & label airgap images:
Start cluster/apply yamls
Verified with sudo k3s ctr images ls that all air-gapped images have the expected labels.
Simulate disk pressure (used dd if=/dev/zero of=dummyfile bs=1M count=1024 and looped until I exceeded the eviction-hard threshold at >95% disk usage).
Monitor k3s kubectl events
DiskPressure taint is applied to the node.
Pods start being evicted.
Once all pods of any type of image are evicted, that image is garbage collected.
Observe in sudo k3s ctr images ls that the air-gapped images are gone.
Resolve the disk pressure by deleting the dummy files.
DiskPressure taint is removed after 5 minutes.
Pods are rescheduled.
Pods are marked with ErrImageNeverPull (Still had the internet on / pull policy not configured, so postgres/redis/qdrant pulled remotely, all other air-gapped images were lost.)
When describing the pods, it says the image isn't available.
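The disk-pressure and cleanup steps above could be scripted roughly as follows. This is only a sketch of the dd loop described in the steps, not the exact script used: `fill_disk` and `cleanup_disk` are hypothetical names, the dummyfile.N naming is an assumption, and `bs=1M` assumes GNU dd as on the Ubuntu host above.

```shell
#!/bin/sh
# fill_disk DIR TARGET_PERCENT
# Writes 1 GiB dummy files into DIR until the filesystem holding DIR reports
# at least TARGET_PERCENT used, then prints how many files were written.
# Returns 1 if dd fails (for example when the disk fills up completely).
fill_disk() {
    dir="$1"
    target="$2"
    i=0
    while [ "$(df -P "$dir" | awk 'NR==2 { gsub("%", "", $5); print $5 }')" -lt "$target" ]; do
        dd if=/dev/zero of="$dir/dummyfile.$i" bs=1M count=1024 2>/dev/null || return 1
        i=$((i + 1))
    done
    echo "$i"
}

# cleanup_disk DIR
# Removes the dummy files again to resolve the disk pressure.
cleanup_disk() {
    rm -f "$1"/dummyfile.*
}
```

Something like `fill_disk /some/scratch/dir 96` would push usage past the >95% threshold mentioned above, and `cleanup_disk /some/scratch/dir` undoes it. Only run this on a test machine.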
Expected behavior:
Actual behavior:
Additional context / logs: