k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

using custom path for storing container images and state #2068

Closed benp20 closed 3 years ago

benp20 commented 4 years ago

Environmental Info: K3s Version: k3s version v1.18.4+k3s1 (97b7a0e9)

Node(s) CPU architecture, OS, and Version: Linux nvidia-desktop 4.9.140-tegra #1 SMP PREEMPT Wed Apr 8 18:15:20 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux

Cluster Configuration: 1 master

Describe the bug: I'd like to change the default path where containerd under k3s stores container-related images, state, etc. (by default /run/k3s/containerd/), since my root partition does not have enough spare space. I'd like to use my data partition instead. What is the recommended procedure for doing so with k3s?

I tried referring to https://rancher.com/docs/k3s/latest/en/advanced/#configuring-containerd, but did not find the information there.

Steps To Reproduce:

Expected behavior:

Actual behavior:

Additional context / logs:

brandond commented 4 years ago

The easiest thing to do would probably be to bind-mount your data partition as /run/k3s
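For example, a minimal sketch of that, assuming a hypothetical data partition /dev/sda2 (or one already mounted at /data):

# mount the spare data partition directly over /run/k3s before starting k3s
sudo mkdir -p /run/k3s
sudo mount /dev/sda2 /run/k3s

# or, if the partition is already mounted elsewhere (e.g. /data), bind-mount a directory from it
sudo mkdir -p /data/k3s-run /run/k3s
sudo mount --bind /data/k3s-run /run/k3s

# note: /run is a tmpfs, so this mount has to be repeated (or automated, e.g. via a systemd mount unit) after every reboot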

benp20 commented 4 years ago

Thanks for the suggestion. I mounted the data partition at /run/k3s and now I see overlays for containers getting created at that path (and using the data partition).

Earlier, I was seeing an issue where the node was tainted due to disk pressure and the pods were therefore left in a pending state: Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.

I had to reduce the eviction thresholds to get around this issue (see my k3s launch command below). I don't see the issue anymore. I am guessing it is because it (somehow) recognizes that more space is available on the new partition?

sudo /usr/local/bin/k3s server \
  --kubelet-arg='eviction-soft=nodefs.available<15%' \
  --kubelet-arg='eviction-soft-grace-period=nodefs.available=60m' \
  --kubelet-arg='eviction-hard=nodefs.available<5%' \
  --kubelet-arg='eviction-soft=nodefs.inodesFree<5%' \
  --kubelet-arg='eviction-soft-grace-period=nodefs.inodesFree=120m' \
  --kubelet-arg='eviction-hard=nodefs.inodesFree<5%'
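If it helps, the same settings can also be kept in k3s's config file at /etc/rancher/k3s/config.yaml instead of being passed on the command line. A minimal sketch with the same thresholds (here the nodefs signals are combined into single comma-separated values, which is how these kubelet flags are usually written):

sudo mkdir -p /etc/rancher/k3s
sudo tee /etc/rancher/k3s/config.yaml >/dev/null <<'EOF'
kubelet-arg:
  - "eviction-soft=nodefs.available<15%,nodefs.inodesFree<5%"
  - "eviction-soft-grace-period=nodefs.available=60m,nodefs.inodesFree=120m"
  - "eviction-hard=nodefs.available<5%,nodefs.inodesFree<5%"
EOF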

benp20 commented 4 years ago

Update: I still see the node reporting disk pressure when I run pods with larger containers, even though /run/k3s is mapped to the data partition already. Any advice on how to address this problem through use of the data partition? (Seemingly it is still using my root partition somewhere, which is quite full.)

Based on the Kubernetes documentation (https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/), it seems like the issue is with nodefs.

The kubelet supports only two filesystem partitions:

The nodefs filesystem that the kubelet uses for volumes, daemon logs, etc.
The imagefs filesystem that container runtimes use for storing images and container writable layers.
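A quick way to check which partition currently backs each of these is to look at the default k3s paths (assuming a stock install; adjust to your layout):

# nodefs follows the kubelet root dir, imagefs follows the container runtime's image store
df -h /var/lib/kubelet /var/lib/rancher/k3s/agent/containerd /run/k3s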

Does this have to be mounted to use the partition, and if so how?

Thanks!

brandond commented 4 years ago

Have you considered just throwing a larger SD card or alternative partition layout at the problem? I run k3s on a couple Pi4b nodes with 32GB SD cards without any special configuration.

jeroenjacobs79 commented 4 years ago

I'm not sure this solves anything, but on my CentOS server, k3s container images are stored in /var/run/k3s/containerd/, not /run/k3s/containerd/.

stale[bot] commented 3 years ago

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

sourcehawk commented 2 years ago

Is there any solution for this as of today?

mdrakiburrahman commented 1 year ago

K3s uses containerd in a close to vanilla state.

• containerd root: /var/lib/rancher/k3s/agent/containerd (all your images, container files etc)
• containerd state: /run/k3s/containerd (scratch space blown up during containerd reboots)

Practically, what we want is for everything that can grow to be backed by some large drive.

So say you have a large drive mounted at /mnt; the script below does the trick (the Kubernetes nodefs and imagefs will both be backed by /mnt, and so will the kubelet and the local-path persistent volumes), so your machine's OS partition stays unbloated:

# =======================================
# Storage prep to "/mnt" drive (~500 GB+)
# =======================================
MNT_DIR="/mnt"
K3S_VERSION="v1.25.4+k3s1"

# nodefs
#
KUBELET_DIR="${MNT_DIR}/kubelet"
sudo mkdir -p "${KUBELET_DIR}"

# imagefs: containerd has a root and state directory
#
# - https://github.com/containerd/containerd/blob/main/docs/ops.md#base-configuration
#
# containerd root -> /var/lib/rancher/k3s/agent/containerd
#
CONTAINERD_ROOT_DIR_OLD="/var/lib/rancher/k3s/agent"
CONTAINERD_ROOT_DIR_NEW="${MNT_DIR}/containerd-root/containerd"
sudo mkdir -p "${CONTAINERD_ROOT_DIR_OLD}"
sudo mkdir -p "${CONTAINERD_ROOT_DIR_NEW}"
# ln -s into an existing directory creates the link inside it, i.e.
# /var/lib/rancher/k3s/agent/containerd -> ${CONTAINERD_ROOT_DIR_NEW}
# (the same trick is used for the state and PV directories below)
sudo ln -s "${CONTAINERD_ROOT_DIR_NEW}" "${CONTAINERD_ROOT_DIR_OLD}"

# containerd state -> /run/k3s/containerd
#
CONTAINERD_STATE_DIR_OLD="/run/k3s"
CONTAINERD_STATE_DIR_NEW="${MNT_DIR}/containerd-state/containerd"
sudo mkdir -p "${CONTAINERD_STATE_DIR_OLD}"
sudo mkdir -p "${CONTAINERD_STATE_DIR_NEW}"
sudo ln -s "${CONTAINERD_STATE_DIR_NEW}" "${CONTAINERD_STATE_DIR_OLD}"

# pvs -> /var/lib/rancher/k3s/storage
#
PV_DIR_OLD="/var/lib/rancher/k3s"
PV_DIR_NEW="${MNT_DIR}/local-path-provisioner/storage"
sudo mkdir -p "${PV_DIR_OLD}"
sudo mkdir -p "${PV_DIR_NEW}"
sudo ln -s "${PV_DIR_NEW}" "${PV_DIR_OLD}"

# =======
# Install
# =======
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="$K3S_VERSION" INSTALL_K3S_EXEC="--kubelet-arg root-dir=${KUBELET_DIR}" sh -
sudo chmod 644 /etc/rancher/k3s/k3s.yaml

When Kubernetes comes up, you can see everything is backed with a lot of space (in my case, /mnt has 600 GB), so nodefs and imagefs have plenty of room:

[screenshot: node filesystem capacity, with nodefs and imagefs backed by /mnt]

And nodefs and imagefs are at 99% - meaning Eviction Manager will not fire under normal circumstances: [screenshot: eviction thresholds]
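If you want to sanity-check the same thing on your own node, one option (my-node is a placeholder for your node name; paths are the ones from the script above):

# what the kubelet reports as nodefs capacity/allocatable (ephemeral-storage)
kubectl get node my-node -o jsonpath='{.status.capacity.ephemeral-storage}{"\n"}{.status.allocatable.ephemeral-storage}{"\n"}'

# which filesystems actually back the relocated directories
df -h /mnt/kubelet /mnt/containerd-root/containerd /mnt/containerd-state/containerd

# whether the node is currently reporting disk pressure
kubectl describe node my-node | grep -i diskpressure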

LarsBingBong commented 1 year ago

@mdrakiburrahman your way of doing things looks quite cool. However, why not use the documented approach from the K3s project: https://docs.k3s.io/advanced#configuring-containerd? If nothing else, now you have it for reference.
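As far as I understand it, that route boils down to providing a config.toml.tmpl next to the generated config.toml and setting containerd's root/state there. A rough sketch, reusing the directories from earlier in this thread:

# k3s renders containerd's config from config.toml.tmpl if that file exists,
# so copy the generated config and edit the copy (it will no longer be auto-updated by k3s)
cd /var/lib/rancher/k3s/agent/etc/containerd
sudo cp config.toml config.toml.tmpl
# then set containerd's top-level root/state keys in config.toml.tmpl, e.g.:
#   root  = "/mnt/containerd-root/containerd"
#   state = "/mnt/containerd-state/containerd"
sudo systemctl restart k3s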

Have a great day

mdrakiburrahman commented 1 year ago

@mdrakiburrahman your way of doing things looks quite cool. However, why not use the documented approach from the K3s project: https://docs.k3s.io/advanced#configuring-containerd? If nothing else, now you have it for reference.

Have a great day

Seemed easier to use ln, it's simple and effective tech 😄

Besides K3s, my team manages a bunch of other Kubernetes flavors that don't have Rancher's toml file. ln works everywhere.

LarsBingBong commented 1 year ago

@mdrakiburrahman totally fair! I agree with your points. I'm reaching out on Rancher Slack, on the K3s channel, to see if someone from the K3s project can elaborate more on the somewhat documented approach.

Thanks

brandond commented 1 year ago

I personally would probably just set up another mount point and symlink things into place (as @mdrakiburrahman has done) instead of modifying the containerd config template. If you provide your own config template, then you're responsible for keeping it up to date with any changes we make to the default template. We don't do that often, but I feel like it's more fragile than a couple symlinks.

LarsBingBong commented 1 year ago

@brandond and @mdrakiburrahman with your input I'm going with the symlink approach. Thank you very much. Low-key Linux conf. FTW once again 👍🏿 .... have a great day.

sourcehawk commented 1 year ago

Does the --data-dir flag on installation not set the storage path for all k3s resources, including container images?

LarsBingBong commented 1 year ago

@hauks96 I would be eager to know this as well, whether or not this is the case. @brandond, are we going off the beaten path here with the prolonged symlink approach, when we could (maybe) just use the --data-dir argument on the K3s worker/agent process?

Thank you to you both.

LarsBingBong commented 1 year ago

Tried it and I'm getting

Feb 01 16:12:31 test-test-worker-29 k3s[1936]: E0201 16:12:31.551646    1936 cri_stats_provider.go:452] "Failed to get the info of the filesystem with mountpoint" err="failed to get device for dir \"/k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs\": stat failed on /k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs with error: no such file or directory" mountpoint="/k3s-worker-data/agent/containerd/io.containerd.snapshotter.v1.overlayfs"

The above might be because of a timing issue: the LVM2 volume on which /k3s-worker-data is mounted is being created while K3s is being installed.

Verifying whether or not that's the case.

brandond commented 1 year ago

data-dir just relocates /var/lib/rancher/k3s. Other things like the runtime directories for the kubelet, containerd, CNI, pod logs and so on are essentially hardcoded, and changing them would break other things in the ecosystem, so we do not change them.

LarsBingBong commented 1 year ago

@brandond thank you. So the symlink approach is clearly the way to go. Thank you.

LarsBingBong commented 1 year ago

@mdrakiburrahman do you use Longhorn or perhaps see the below issue with whatever CSI you use?

When we use the --kubelet-arg root-dir option on the K3s binary, kubelet data goes into the defined path. However, /var/lib/kubelet/plugins/ still contains the driver.longhorn.io folder, which then causes the following error on StatefulSet workloads: AttachVolume.Attach failed for volume "pvc-ID" : CSINode NODE_NAME does not contain driver driver.longhorn.io.

Any idea? Thank you.

larssb commented 1 year ago

It was the Longhorn CSI that needed to have csi.kubeletRootDir set in the values.yaml Helm file, and the longhorn-csi-plugin DaemonSet had to be re-deployed for that to take full effect. Then Longhorn was able to register the Longhorn driver in the correct kubelet folder ...
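For anyone hitting the same thing, a sketch of what that looks like, assuming Longhorn was installed from its Helm chart as a release named longhorn in the longhorn-system namespace, with the kubelet root-dir used earlier in this thread:

# point Longhorn's CSI components at the relocated kubelet root dir
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --reuse-values \
  --set csi.kubeletRootDir=/mnt/kubelet

# restart the CSI plugin pods so the node registrar picks up the new path
kubectl -n longhorn-system rollout restart daemonset/longhorn-csi-plugin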

VladoPortos commented 1 year ago

Is there a proper solution to this without using ln? I can see from systemctl status k3s-agent that it runs containerd with the --state and --root parameters:

Group: /system.slice/k3s-agent.service
           ├─19541 /usr/local/bin/k3s agent
           ├─19571 containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd
           └─19721 /sbin/iptables -t nat -S FLANNEL-POSTRTG 1 --wait

But where is this defined? It's not in /etc/systemd/system/k3s-agent.service.

brandond commented 1 year ago

It's hardcoded, sorry.

The root dir will be relocated as part of setting --data-dir but the state dir cannot be changed.

sourcehawk commented 1 year ago

The root dir will be relocated as part of setting --data-dir but the state dir cannot be changed.

The documentation on K3S.io specifically states that --data-dir changes the location where the state is kept. Might want to have that changed to reflect in more detail what is actually kept there.

So what exactly is being kept there, if not the containers, state, CNI, or logs? I have my worker nodes configured with a --data-dir flag. The nodes have a 16GB root volume, so I was really banking on the volume-mount approach, mainly for the ease of resizing it or adding a new one.

# --data-dir /data/docker/rancher/k3s

$ ls /data/docker/rancher/k3s
agent  data

$ sudo du -sh /data/docker/rancher/k3s/agent
9.5G    /data/docker/rancher/k3s/agent

$ sudo du -sh /data/docker/rancher/k3s/data
208M    /data/docker/rancher/k3s/data

What I really want is to prevent data from being stored on the root volume, because I do not want a volume that cannot be reconfigured to fill up, forcing me to create a new VM. What are the options here?

LarsBingBong commented 1 year ago

@hauks96 what you need is to use the approach outlined by @mdrakiburrahman.

You can reach out to me on Slack, either the Kubernetes community or the Rancher one, where I'm: Lars Bingchong / Lead DevOps Engineer.

N.B. --data-dir seems to hold K3s-specific data: the metadata it needs, the generated certs themselves, and the like.

sourcehawk commented 1 year ago

Although it seems to be working in general after this change, I am getting permission errors for resources within the cluster trying to create or access certain files now.

Should I just chmod 777 my mount directory?

LarsBingBong commented 1 year ago

I don't think so @hauks96 ... that should indeed not be necessary. Are you bumping into some inherited permissions causing this? Is it stateful workloads seeing this, or emptyDir-consuming ones - or just in general?

sourcehawk commented 1 year ago

Yeah I figured. The problem was existing PVCs in the cluster that had to be deleted. Thanks again

vanniszsu commented 1 year ago

Adding root = and state = in /var/lib/rancher/k3s/agent/etc/containerd/config.toml should set a custom path for the k3s-integrated containerd's images and state.

predictablemiracle commented 1 year ago

Adding root = and state = in /var/lib/rancher/k3s/agent/etc/containerd/config.toml should set a custom path for the k3s-integrated containerd's images and state.

This won't work as the file is overwritten when the k3s service starts.

ianb-mp commented 10 months ago

Be warned that modifying the containerd storage location as suggested by @mdrakiburrahman (and others) can break KubeVirt - see https://github.com/kubevirt/kubevirt/issues/10703#issuecomment-1843863836

EDIT: I also tried using bind mounts rather than symlinks, but still had issues.

larssb commented 10 months ago

Thank you for pointing that out @ianb-mp. We don't use KubeVirt so we have no issues. Isn't it also possible to configure KubeVirt so that it "knows" where containerd data and conf. is? At least it should be.

mansoncui commented 7 months ago

How was your problem solved?

codeReaper2001 commented 7 months ago

(quoting @mdrakiburrahman's earlier comment in full, including the storage-prep script and the two screenshots)
Hello, I would like to understand how the Eviction Thresholds in the second image were derived. I would like to verify if my configuration is taking effect. Could you please provide more details on this? Thanks a lot!

mdrakiburrahman commented 7 months ago

@codeReaper2001 - full writeup and script here: https://www.rakirahman.me/conquering-eviction-manager-k8s/

codeReaper2001 commented 7 months ago

@codeReaper2001 - full writeup and script here: https://www.rakirahman.me/conquering-eviction-manager-k8s/

Thanks a lot, it worked!