benp20 closed this issue 3 years ago.
The easiest thing to do would probably be to bind-mount your data partition at /run/k3s.
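Below is a minimal sketch of that suggestion. It assumes the data partition is already mounted at /data and that a directory /data/k3s-run serves as the bind source. Both paths, and the helper name fstab_bind_entry, are hypothetical. /run is usually a tmpfs that is recreated at boot, so a one-off mount --bind does not persist. For that reason the sketch also prints a matching /etc/fstab line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an /etc/fstab line that bind-mounts SRC onto DST.
fstab_bind_entry() {
  local src="$1" dst="$2"
  printf '%s %s none bind 0 0\n' "$src" "$dst"
}

# Example usage (requires root; paths are assumptions):
#   sudo mkdir -p /data/k3s-run /run/k3s
#   sudo mount --bind /data/k3s-run /run/k3s
#   fstab_bind_entry /data/k3s-run /run/k3s | sudo tee -a /etc/fstab
```

The bind mount has to be in place before K3s starts. Otherwise containerd writes its state to the underlying tmpfs.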
Thanks for the suggestion. I mounted the data partition at /run/k3s and now I see overlays for containers getting created at that path (and using the data partition).
Earlier, I was seeing an issue where the node is tainted due to disk pressure and hence the pods are left in a pending state:
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
I had to lower the thresholds to get around this issue (see my k3s launch command below), and I don't see the issue anymore. I am guessing it is because k3s (somehow) recognizes that more space is available on the new partition?
sudo /usr/local/bin/k3s server --kubelet-arg='eviction-soft=nodefs.available<15%' --kubelet-arg='eviction-soft-grace-period=nodefs.available=60m' --kubelet-arg='eviction-hard=nodefs.available<5%' --kubelet-arg='eviction-soft=nodefs.inodesFree<5%' --kubelet-arg='eviction-soft-grace-period=nodefs.inodesFree=120m' --kubelet-arg='eviction-hard=nodefs.inodesFree<5%'
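The command above repeats the same --kubelet-arg key (eviction-soft, eviction-hard). At least in older K3s releases, this may keep only the last value instead of merging them. The kubelet's own flag syntax accepts several signals in one comma-separated value. The sketch below uses that syntax to express each flag once.

It writes a K3s config file. Note that config-file support (/etc/rancher/k3s/config.yaml) arrived in K3s releases newer than the v1.18 used here, so treat the file location and the helper name as assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write kubelet eviction settings to a K3s config file.
# Each eviction flag appears once; multiple signals are joined with commas.
write_eviction_config() {
  local file="$1"
  cat > "$file" <<'EOF'
kubelet-arg:
  - "eviction-soft=nodefs.available<15%,nodefs.inodesFree<5%"
  - "eviction-soft-grace-period=nodefs.available=60m,nodefs.inodesFree=120m"
  - "eviction-hard=nodefs.available<5%,nodefs.inodesFree<5%"
EOF
}

# Example usage (review the file, then install it as root):
#   write_eviction_config /tmp/k3s-config.yaml
#   sudo install -m 600 /tmp/k3s-config.yaml /etc/rancher/k3s/config.yaml
#   sudo systemctl restart k3s
```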
Update: I still see the node reporting disk pressure when I run pods with larger containers, even though /run/k3s is already mapped to the data partition. Any advice on how to address this problem by using the data partition? (It seems it is still using my root partition somewhere, which is quite full.)
Based on the Kubernetes documentation (https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/), it seems the issue is with nodefs:
kubelet supports only two filesystem partitions:
- The nodefs filesystem that the kubelet uses for volumes, daemon logs, etc.
- The imagefs filesystem that container runtimes use for storing images and container writable layers.
Does this have to be mounted to use the partition, and if so how?
Thanks!
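To see which filesystem actually backs each directory that the kubelet and containerd write to, running df -Pk on each path is enough. The sketch below assumes the default K3s and kubelet locations mentioned in this thread. The helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each path, print "<path> -> <mount point> (<free> KiB free)".
# Paths that do not exist are reported instead of causing an error.
show_backing_fs() {
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "$p -> (missing)"
      continue
    fi
    # -P: one line per filesystem; -k: 1024-byte blocks.
    # Column 6 is the mount point, column 4 the available blocks.
    df -Pk "$p" | awk -v path="$p" 'NR == 2 { printf "%s -> %s (%s KiB free)\n", path, $6, $4 }'
  done
}

# Example usage on a K3s node (default locations):
#   show_backing_fs /var/lib/kubelet /var/lib/rancher/k3s/agent/containerd \
#     /run/k3s/containerd /var/log/pods
```

Any path that still resolves to / is a candidate for the root-partition usage described above.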
Have you considered just throwing a larger SD card or alternative partition layout at the problem? I run k3s on a couple Pi4b nodes with 32GB SD cards without any special configuration.
I'm not sure this solves anything, but on my CentOS server, k3s container images are stored in /var/run/k3s/containerd/, not /run/k3s/containerd/.
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
Is there any solution for this as of today?
K3s uses containerd in a close to vanilla state:
- containerd root: /var/lib/rancher/k3s/agent/containerd (all your images, container files, etc.)
- containerd state: /run/k3s/containerd (scratch space that is blown away when containerd restarts)
Practically, what we want is for everything that can grow to be backed by some large drive.
So say you have a large drive mounted at /mnt. The script below does the trick: the Kubernetes nodefs and imagefs will both be backed by /mnt, and so will the kubelet and the local-path persistent volumes, so your machine's OS directory stays unbloated:
# =======================================
# Storage prep to "/mnt" drive (~500 GB+)
# =======================================
MNT_DIR="/mnt"
K3S_VERSION="v1.25.4+k3s1"
# nodefs
#
KUBELET_DIR="${MNT_DIR}/kubelet"
sudo mkdir -p "${KUBELET_DIR}"
# imagefs: containerd has a root and state directory
#
# - https://github.com/containerd/containerd/blob/main/docs/ops.md#base-configuration
#
# containerd root -> /var/lib/rancher/k3s/agent/containerd
#
CONTAINERD_ROOT_DIR_OLD="/var/lib/rancher/k3s/agent"
CONTAINERD_ROOT_DIR_NEW="${MNT_DIR}/containerd-root/containerd"
sudo mkdir -p "${CONTAINERD_ROOT_DIR_OLD}"
sudo mkdir -p "${CONTAINERD_ROOT_DIR_NEW}"
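# ${CONTAINERD_ROOT_DIR_OLD} already exists, so ln -s creates the link inside it, named after
# the target's basename: /var/lib/rancher/k3s/agent/containerd -> ${CONTAINERD_ROOT_DIR_NEW}
sudo ln -s "${CONTAINERD_ROOT_DIR_NEW}" "${CONTAINERD_ROOT_DIR_OLD}"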
# containerd state -> /run/k3s/containerd
#
CONTAINERD_STATE_DIR_OLD="/run/k3s"
CONTAINERD_STATE_DIR_NEW="${MNT_DIR}/containerd-state/containerd"
sudo mkdir -p "${CONTAINERD_STATE_DIR_OLD}"
sudo mkdir -p "${CONTAINERD_STATE_DIR_NEW}"
sudo ln -s "${CONTAINERD_STATE_DIR_NEW}" "${CONTAINERD_STATE_DIR_OLD}"
# pvs -> /var/lib/rancher/k3s/storage
#
PV_DIR_OLD="/var/lib/rancher/k3s"
PV_DIR_NEW="${MNT_DIR}/local-path-provisioner/storage"
sudo mkdir -p "${PV_DIR_OLD}"
sudo mkdir -p "${PV_DIR_NEW}"
sudo ln -s "${PV_DIR_NEW}" "${PV_DIR_OLD}"
# =======
# Install
# =======
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="${K3S_VERSION}" INSTALL_K3S_EXEC="--kubelet-arg root-dir=${KUBELET_DIR}" sh -
sudo chmod 644 /etc/rancher/k3s/k3s.yaml
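The ln -s calls in the script rely on the old parent directory already existing. The link is then created inside that directory and named after the target's basename. However, if the link path already exists as a real directory (for example, because K3s ran on this machine before), ln -s fails with "File exists".

A more defensive sketch, using a hypothetical helper name, handles that case: it moves any existing content over and names the link explicitly. It assumes it is run as root while K3s is stopped:

```shell
#!/usr/bin/env bash
# Hypothetical helper: make OLD a symlink to NEW, preserving any existing data.
# Run as root while K3s is stopped.
relocate_dir() {
  local old="$1" new="$2"
  mkdir -p "$new" "$(dirname "$old")" || return 1
  if [ -L "$old" ]; then
    # Already a symlink: only repoint it (-n replaces the link itself
    # instead of descending into the directory it points to).
    ln -sfn "$new" "$old"
    return
  fi
  if [ -d "$old" ]; then
    # Real directory: copy its contents (including dotfiles) to NEW, then remove it.
    cp -a "$old"/. "$new"/ && rm -rf "$old" || return 1
  fi
  ln -s "$new" "$old"
}

# Example usage (paths follow the script above):
#   relocate_dir /var/lib/rancher/k3s/agent/containerd /mnt/containerd-root/containerd
#   relocate_dir /var/lib/rancher/k3s/storage /mnt/local-path-provisioner/storage
```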
When Kubernetes comes up, you can see that everything is backed by plenty of space (in my case, /mnt has 600 GB), so nodefs and imagefs both report the full capacity of the drive.
Both nodefs and imagefs are at 99% available, meaning the Eviction Manager will not fire under normal circumstances.
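To check the numbers the Eviction Manager works from, you can query the kubelet's Summary API through the API server's node proxy. It reports availableBytes and capacityBytes for node.fs (nodefs) and node.runtime.imageFs (imagefs).

The small sketch below turns those values into a percentage. The helper name is hypothetical, and the usage lines assume kubectl access and jq:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of a filesystem that is still available.
# Fails (exit status 1) when the capacity is zero or missing.
pct_available() {
  local avail="$1" cap="$2"
  awk -v a="$avail" -v c="$cap" 'BEGIN { if (c > 0) printf "%.1f\n", 100 * a / c; else exit 1 }'
}

# Example usage (NODE is a name from `kubectl get nodes`):
#   summary=$(kubectl get --raw "/api/v1/nodes/${NODE}/proxy/stats/summary")
#   pct_available "$(jq '.node.fs.availableBytes' <<<"$summary")" \
#                 "$(jq '.node.fs.capacityBytes' <<<"$summary")"                # nodefs
#   pct_available "$(jq '.node.runtime.imageFs.availableBytes' <<<"$summary")" \
#                 "$(jq '.node.runtime.imageFs.capacityBytes' <<<"$summary")"   # imagefs
```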
@mdrakiburrahman your way of doing things looks quite cool. However, why not use the approach documented by the K3s project: https://docs.k3s.io/advanced#configuring-containerd - if nothing else, now you have it for reference.
Have a great day
It seemed easier to use ln; it's simple and effective tech 😄
Besides K3s, my team manages a bunch of other Kubernetes flavors that don't have Rancher's TOML file. ln works everywhere.
@mdrakiburrahman totally fair! I agree with your points. I'm reaching out on Rancher Slack, on the K3s channel, to see if someone from the K3s project can elaborate more on the somewhat documented approach.
Thanks
I personally would probably just set up another mount point and symlink things into place (as @mdrakiburrahman has done) instead of modifying the containerd config template. If you provide your own config template, then you're responsible for keeping it up to date with any changes we make to the default template. We don't do that often, but I feel like it's more fragile than a couple symlinks.
@brandond and @mdrakiburrahman with your input I'm going with the symlink approach. Thank you very much. Low-key Linux conf. FTW once again 👍🏿 .... have a great day.
Does the --data-dir flag on installation not set the storage path for all k3s resources, including container images?
@hauks96 I would be eager to know whether or not this is the case as well. @brandond, are we going off the beaten path here with the more elaborate symlink approach, when we could (maybe) just use the --data-dir argument on the K3s worker/agent process?
Thank you to you both.
Tried it and I'm getting:
Feb 01 16:12:31 test-test-worker-29 k3s[1936]: E0201 16:12:31.551646 1936 cri_stats_provider.go:452] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir \"/k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs\": stat failed on /k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs with error: no such file or directory" mountpoint="/k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs"
The above might be caused by a timing issue: the LVM2 volume on which /k3s-worker-data is mounted is being created while K3s is being installed.
I'm verifying whether or not that's the case.
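If the timing theory holds, one way to make the agent wait for the volume is a systemd drop-in with RequiresMountsFor=. That directive adds both a Requires= and an After= dependency on the mount units needed to reach the given path.

The sketch below makes a few assumptions:
- The unit is named k3s-agent.service, as in the systemctl status output later in this thread.
- The volume is mounted through a regular mount unit, for example from /etc/fstab.

The helper name and drop-in file name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in so the unit only starts
# once MOUNT_PATH (and the mounts it depends on) are mounted.
write_mount_dropin() {
  local dropin_dir="$1" mount_path="$2"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/10-wait-for-data.conf" <<EOF
[Unit]
# Adds Requires= and After= on the mount unit(s) needed to reach this path
RequiresMountsFor=${mount_path}
EOF
}

# Example usage (as root):
#   write_mount_dropin /etc/systemd/system/k3s-agent.service.d /k3s-worker-data
#   systemctl daemon-reload && systemctl restart k3s-agent
```

This only helps when the volume exists at boot. If the volume is still being created by provisioning tooling while K3s installs, the installer has to be ordered after that step instead.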
--data-dir just relocates /var/lib/rancher/k3s. Other things, like the runtime directories for the kubelet, containerd, CNI, pod logs, and so on, are essentially hardcoded and would break other things in the ecosystem if they were changed, so we do not change them.
@brandond thank you. So the symlink approach is clearly the way to go. Thank you.
@mdrakiburrahman do you use Longhorn, or perhaps see the issue below with whatever CSI you use?
When we use the --kubelet-arg root-dir option on the K3s binary, kubelet data goes into the defined path. However, /var/lib/kubelet/plugins/ still contains the driver.longhorn.io folder, which then causes the following error on StatefulSet workloads: AttachVolume.Attach failed for volume "pvc-ID" : CSINode NODE_NAME does not contain driver driver.longhorn.io.
Any idea? Thank you.
It was the Longhorn CSI that needed csi.kubeletRootDir set in the Helm values.yaml file, and the longhorn-csi-plugin DaemonSet had to be redeployed for that to take full effect. Then Longhorn was able to register the Longhorn driver in the correct kubelet folder.
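For reference, the Longhorn change described above could look like the following sketch. The Helm release name longhorn, the longhorn-system namespace, and the kubelet root dir /mnt/kubelet are all assumptions; only csi.kubeletRootDir and the longhorn-csi-plugin DaemonSet come from the comment. The helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Helm values override that points Longhorn's
# CSI components at a non-default kubelet root directory.
write_longhorn_values() {
  local file="$1" kubelet_root="$2"
  cat > "$file" <<EOF
csi:
  kubeletRootDir: ${kubelet_root}
EOF
}

# Example usage (release name and namespace are assumptions):
#   write_longhorn_values longhorn-values.yaml /mnt/kubelet
#   helm upgrade longhorn longhorn/longhorn -n longhorn-system \
#     --reuse-values -f longhorn-values.yaml
#   kubectl -n longhorn-system rollout restart daemonset/longhorn-csi-plugin
```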
Is there a proper solution to this without using ln?
I can see from systemctl status k3s-agent that it runs containerd with the --state and --root parameters:
Group: /system.slice/k3s-agent.service
├─19541 /usr/local/bin/k3s agent
├─19571 containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd
└─19721 /sbin/iptables -t nat -S FLANNEL-POSTRTG 1 --wait
But where is this defined? It's not in /etc/systemd/system/k3s-agent.service.
It's hardcoded, sorry.
The root dir will be relocated as part of setting --data-dir but the state dir cannot be changed.
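Because the state dir stays at /run/k3s/containerd, and /run is normally a tmpfs, a symlink created there (as in the storage-prep script earlier in this thread) disappears on reboot. One way to recreate it at boot is a systemd-tmpfiles rule of type L (create a symlink), preceded by a d rule for the parent directory.

The sketch below assumes the target directory from that script. The file name k3s-state.conf and the helper name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a tmpfiles.d rule set that recreates the
# /run/k3s/containerd -> TARGET symlink on every boot.
write_state_tmpfiles() {
  local file="$1" target="$2"
  cat > "$file" <<EOF
# Type Path                 Mode User Group Age Argument
d      /run/k3s             0755 root root  -   -
L      /run/k3s/containerd  -    -    -     -   ${target}
EOF
}

# Example usage (as root):
#   write_state_tmpfiles /etc/tmpfiles.d/k3s-state.conf /mnt/containerd-state/containerd
#   systemd-tmpfiles --create /etc/tmpfiles.d/k3s-state.conf
```

systemd-tmpfiles runs early during boot, so the link should exist before the K3s unit starts.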
The documentation on k3s.io specifically states that --data-dir changes the location where the state is kept. It might be worth changing that to describe in more detail what is actually kept there.
So what exactly is being kept there, if not the containers, state, CNI, or logs? I have my worker nodes configured with the --data-dir flag. The nodes have a 16GB root volume, so I was really banking on the volume mount approach, mainly for the ease of resizing it and adding a new one.
# --data-dir /data/docker/rancher/k3s
$ ls /data/docker/rancher/k3s
agent data
$ sudo du -sh /data/docker/rancher/k3s/agent
9.5G /data/docker/rancher/k3s/agent
$ sudo du -sh /data/docker/rancher/k3s/data
208M /data/docker/rancher/k3s/data
What I really want is to prevent data from being stored on the root volume, because I do not want a volume that cannot be reconfigured to fill up, forcing me to create a new VM. What are the options here?
So what you need, @hauks96, is to use the approach outlined by @mdrakiburrahman:
- --kubelet-arg root-dir to move the kubelet dir to the dedicated LVM2/ZFS disk, so the kubelet is no longer at the default /var/lib/kubelet.
You can reach out to me on Slack, either in the Kubernetes community or the Rancher one, where I'm: Lars Bingchong / Lead DevOps Engineer.
N.B. --data-dir seems to hold K3s-specific data, i.e. the metadata it needs for generated certs, the certs themselves, and the like.
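Putting --data-dir and --kubelet-arg root-dir together, a K3s config file could look like the sketch below. The base path /data and the helper name are assumptions. data-dir and kubelet-arg are the config-file spellings of the --data-dir and --kubelet-arg flags discussed in this thread.

Per the maintainer's comment about hardcoded runtime directories, this still leaves paths such as /run/k3s/containerd and the pod log directory on the root volume:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a K3s config that keeps K3s data and kubelet data under BASE.
write_k3s_data_config() {
  local file="$1" base="$2"
  cat > "$file" <<EOF
data-dir: ${base}/rancher/k3s
kubelet-arg:
  - "root-dir=${base}/kubelet"
EOF
}

# Example usage (as root, before k3s / k3s-agent starts for the first time):
#   mkdir -p /etc/rancher/k3s
#   write_k3s_data_config /etc/rancher/k3s/config.yaml /data
```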
Although it seems to be working in general after this change, I am getting permission errors for resources within the cluster trying to create or access certain files now.
Should I just chmod 777 my mount directory?
I don't think so, @hauks96 ... that should indeed not be necessary. Are you bumping into some inherited permissions causing this? Is it stateful workloads seeing this, or emptyDir-consuming ones, or just in general?
Yeah, I figured. The problem was existing PVCs in the cluster that had to be deleted. Thanks again
Adding root = and state = in /var/lib/rancher/k3s/agent/etc/containerd/config.toml should let you set custom paths for the images and state of K3s's embedded containerd.
This won't work as the file is overwritten when the k3s service starts.
Be warned that modifying containerd storage location as suggested by @mdrakiburrahman (and others) can break Kubevirt - see https://github.com/kubevirt/kubevirt/issues/10703#issuecomment-1843863836
EDIT: I also tried using bind mounts rather than symlinks, but still had issues.
Thank you for pointing that out @ianb-mp. We don't use KubeVirt so we have no issues. Isn't it also possible to configure KubeVirt so that it "knows" where containerd data and conf. is? At least it should be.
How was your problem solved?
Hello, I would like to understand how the Eviction Thresholds in the second image were derived. I would like to verify if my configuration is taking effect. Could you please provide more details on this? Thanks a lot!
@codeReaper2001 - full writeup and script here: https://www.rakirahman.me/conquering-eviction-manager-k8s/
Thanks a lot, it worked!
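To confirm which eviction thresholds a running kubelet actually applied (for example, after changing --kubelet-arg values), you can read the kubelet's /configz endpoint through the API server's node proxy.

The sketch below only builds the request path. The helper name is hypothetical, and the usage lines assume kubectl access and jq:

```shell
#!/usr/bin/env bash
# Hypothetical helper: API-server proxy path for a node's kubelet /configz endpoint.
kubelet_configz_path() {
  printf '/api/v1/nodes/%s/proxy/configz\n' "$1"
}

# Example usage (NODE is a name from `kubectl get nodes`):
#   kubectl get --raw "$(kubelet_configz_path "$NODE")" |
#     jq '.kubeletconfig | {evictionHard, evictionSoft, evictionSoftGracePeriod}'
```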
Environmental Info: K3s Version: k3s version v1.18.4+k3s1 (97b7a0e9)
Node(s) CPU architecture, OS, and Version: Linux nvidia-desktop 4.9.140-tegra #1 SMP PREEMPT Wed Apr 8 18:15:20 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux
Cluster Configuration: 1 master
Describe the bug: I'd like to change the default path where containerd under k3s stores container-related images, state, etc. (by default /run/k3s/containerd/), since my root partition does not have enough spare space. I'd like to use my data partition instead. What is the recommended procedure for doing so with k3s?
I tried referring to https://rancher.com/docs/k3s/latest/en/advanced/#configuring-containerd, but did not find the information there.