siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Structured `/var` #8016

Open smira opened 9 months ago

smira commented 9 months ago
### Tasks
- [ ] #1542

Problem Statement

At the moment /var (also known as the EPHEMERAL partition) doesn’t have any specific structure: users are allowed to create mount points and put user files at arbitrary locations under /var.

For a pod hostPath mount to work properly with all the features supported by the kubelet, the mount path should be available in the kubelet mount namespace in the same way as in the host namespace. This requires manual and non-obvious configuration.

For external mounts (e.g. NFS) to work properly when the mounting is done from the kubelet, the mount path should be set up in the kubelet.

Talos doesn’t offer a way to store user files in a truly ephemeral location (i.e. on tmpfs), so that a reboot is enough to clean up the state.

Talos doesn’t support full reconciliation for the machine.files key, as the contents of /var are not known, and the effect of removing a value from machine.files is not clear.

There’s no way to remove parts of /var (e.g. if some directory was created by mistake).

Some critical or system-important parts of /var are not protected from simple mistakes (e.g. creating a wrong hostPath mount under the etcd data directory).

What’s in /var?

Proposal

etcd

Make sure the etcd data directory is only accessible by etcd itself (and by Talos itself for the purposes of backup/restore). No other workload should ever be able to access the etcd data.

E.g. we could use SELinux, which will protect etcd from other workloads while it can also protect workloads from accessing etcd.

kubelet

Mostly the same as etcd: we should look into protecting the data directory from other workloads. As the kubelet accesses many different paths, it’s hard to restrict the kubelet itself from accessing other directories.

Logs

We can look into making sure other workloads have read-only access to the logs, while kubelet (?) can write the logs.

run

Should we make this a tmpfs (if it isn’t already)?

containerd

Not much we can offer, as workloads write to the container scratch space.

overlays

This is a Talos-specific location, and we shouldn’t allow random writes there (overlayfs upperdir, workdir). We should look into minimizing the overlays on /var (we could replace them with overlays on tmpfs when it makes sense).

/var/mnt

Introduce a new directory (naming TBD) which serves as a root mount point for:

This path is mounted as rshared into the kubelet container, so that mounts both ways (from the kubelet to the host, from the host to the kubelet) are visible.

Users are supposed to use only this hostPath for such mounts.
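
One way to verify that a path really has shared propagation is to look at the optional fields in `/proc/self/mountinfo`: a mount that is part of a shared peer group carries a `shared:N` tag there. The Go sketch below parses a single mountinfo line. The function name `propagation` and the sample lines are made up for illustration, `/var/mnt` is only the placeholder name used in this proposal, and this is not how Talos itself checks mounts.

```go
package main

import (
	"fmt"
	"strings"
)

// propagation parses one line of /proc/self/mountinfo and reports the mount
// point and whether the mount belongs to a shared peer group.
// ok is false when the line does not look like a mountinfo entry.
func propagation(line string) (mountPoint string, shared bool, ok bool) {
	fields := strings.Fields(line)
	// Fixed fields: mount ID, parent ID, major:minor, root, mount point,
	// mount options. Optional fields follow until a lone "-" separator.
	if len(fields) < 7 {
		return "", false, false
	}
	mountPoint = fields[4]
	for _, f := range fields[6:] {
		if f == "-" {
			return mountPoint, shared, true
		}
		if strings.HasPrefix(f, "shared:") {
			shared = true
		}
	}
	// No "-" separator found: treat the line as malformed.
	return "", false, false
}

func main() {
	// Hypothetical sample lines, not taken from a real Talos node.
	lines := []string{
		"36 35 98:0 / /var/mnt rw,noatime shared:12 - ext4 /dev/sda6 rw",
		"37 35 98:0 / /var/lib rw,noatime - ext4 /dev/sda6 rw",
	}
	for _, l := range lines {
		mp, shared, ok := propagation(l)
		fmt.Printf("mount=%s shared=%v ok=%v\n", mp, shared, ok)
	}
}
```

On a real node the lines would come from reading `/proc/self/mountinfo` in both the host and the kubelet mount namespaces and comparing the results.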

Questions:

machine.files

We need to split it into the use cases for this feature:

In general, machine.files should work on top of the controller.
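
To make the reconciliation problem concrete, here is a minimal Go sketch of what a controller could compute if it kept a record of the files it had written earlier. Everything here is an assumption for illustration: the `plan` type, the `reconcile` function and the idea of a recorded "managed" set are hypothetical and are not taken from the Talos codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// plan lists the actions needed to move from the recorded state to the
// desired state.
type plan struct {
	Write  []string // paths to create or overwrite
	Remove []string // paths that were managed before but are no longer desired
}

// reconcile compares the files recorded as written by an earlier config
// (managed) with the files in the current config (desired). Both maps go
// from path to content.
func reconcile(managed, desired map[string]string) plan {
	var p plan
	for path, content := range desired {
		if old, ok := managed[path]; !ok || old != content {
			p.Write = append(p.Write, path)
		}
	}
	for path := range managed {
		if _, ok := desired[path]; !ok {
			p.Remove = append(p.Remove, path)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(p.Write)
	sort.Strings(p.Remove)
	return p
}

func main() {
	// Hypothetical paths under the proposed user directory.
	managed := map[string]string{
		"/var/mnt/a.conf": "v1",
		"/var/mnt/b.conf": "v1",
	}
	desired := map[string]string{
		"/var/mnt/a.conf": "v2",
		"/var/mnt/c.conf": "v1",
	}
	p := reconcile(managed, desired)
	fmt.Println("write: ", p.Write)
	fmt.Println("remove:", p.Remove)
}
```

The point of the sketch is that removal only becomes well defined once the set of previously managed files is known, which is exactly what is missing while the contents of /var are unstructured.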

API to Remove File(s)

Should be restricted to work under /var/mnt only.
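
As a rough illustration of that restriction, the Go sketch below checks whether a requested path lies strictly inside the user directory before any removal would be allowed. The names `userRoot` and `removable` are hypothetical, `/var/mnt` is the placeholder name from this proposal, and the check is purely lexical: a real implementation would also have to deal with symlinks and mount points.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// userRoot is the only tree the hypothetical remove API may touch.
const userRoot = "/var/mnt"

// removable reports whether path, after lexical cleaning, lies strictly
// inside userRoot. The root itself is not removable.
func removable(path string) bool {
	if !filepath.IsAbs(path) {
		return false
	}
	rel, err := filepath.Rel(userRoot, filepath.Clean(path))
	if err != nil {
		return false
	}
	// "." is the root itself; a leading ".." means the path escapes the root.
	if rel == "." || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false
	}
	return true
}

func main() {
	for _, p := range []string{
		"/var/mnt/data",
		"/var/mnt/../lib/etcd",
		"/var/lib/etcd",
		"/var/mnt",
	} {
		fmt.Printf("%-24s removable=%v\n", p, removable(p))
	}
}
```

Using `filepath.Rel` instead of a plain prefix comparison avoids accepting sibling directories such as `/var/mntx`.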

Benefits

runningman84 commented 6 months ago

Protecting stuff is one thing; another problem right now is that all of these share the same filesystem. That means if we use the local path provisioner and consume all the space, etcd will crash due to out-of-disk-space errors. In my old k3s setup I used to have LVM volumes for each of the consumers, like etcd, longhorn, local-path and so on.