remoe opened 4 years ago
To be clear, I'm not opposed to local volumes. But I find they offer limited value beyond hostPath.
I'd be remiss not to begin by saying that we all (know we) ought to avoid storing data on specific Kubernetes nodes (regardless of mechanism). Nevertheless, there are plenty of cases where node storage (by which I'll refer to both) is an unfortunate necessity / temporary tradeoff while we aspire to better / etc.[1]
In an example situation, you might either mount a hostPath directly in a Deployment (pinned to a node with a nodeSelector), or define a local PersistentVolume with node affinity and have a Deployment claim it.
In both cases, the result is the same: a Pod requiring a specific mount on a specific host. Some would say the first Deployment looks uglier. I'd say it more clearly exposes the real situation, while the local volume tends to mask it (e.g. debugging why a pod isn't scheduling means inferring which of its volumes carries its own selector logic). Part of the motivation for local volumes was making hostPath "feel" like any other volume. Another factor is that local volumes involve more moving parts (e.g. they must pass through the scheduler, whereas hostPath is decoupled and works with static pods too). Local volumes might show their merit if you could entirely eliminate and forbid hostPath, since that might limit host access, but that's rather unlikely. I prefer hostPath, but I'm not opposed to local volumes as more-or-less the same.
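To make the comparison concrete, here's a sketch of the two approaches (names, image, capacity, and hostname are all hypothetical placeholders; the local-PV variant assumes a `local-storage` StorageClass and a PV like the one shown later in this thread):

```yaml
# Approach 1: hostPath + explicit node pinning in the Deployment itself
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      nodeSelector:
        kubernetes.io/hostname: your.host.name
      containers:
        - name: app
          image: example/app  # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          hostPath:
            path: /var/mnt/kubernetes-local/drive0
---
# Approach 2: node selection moves into a local PersistentVolume;
# the workload just references a claim like this one:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

The first form carries the node dependency visibly in the Deployment; the second hides it behind the PV's nodeAffinity.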
There are a few local volume matters that require consideration.
For the mounts, the Kubelet should not mount /mnt. /mnt is quite a common location for data volumes, and it's unexpected that mounting there would expose your data to the Kubelet (especially since the Kubelet can modify it; see below). A better approach would be to carve out mount subdirectories where an admin might mount disks or other storage components that should be exposed as local volumes.
A possible option might be:
```
/mnt
└── kubernetes-local
```
The `kubernetes-` prefix makes the opt-in explicit and leaves room for future node volume types. (Note: on Fedora CoreOS, /mnt is a symlink into /var/mnt, hence the /var/mnt/... paths in the samples below.)
With Fedora CoreOS in the mix, SELinux is increasingly a first-class citizen and concern. I don't think it's appropriate for the Kubelet to relabel mounts of an end user's data, so Kubelets run via podman should not use the relabel options (`:z`/`:Z`). I suspect this will require some guidance, as users will need to prepare local volumes that align with the SELinux labels of the existing node-local data. I've not tested the various pitfalls around this area.
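To make the relabeling point concrete, here's a sketch of the difference in the podman invocation (paths taken from the samples in this thread; exactly when to drop the suffix is part of the guidance mentioned above):

```ini
# Hypothetical kubelet.service fragment. Omitting podman's relabel
# suffix (:z shared / :Z private) leaves the admin's existing SELinux
# labels on the local-volume data untouched:
ExecStart=/usr/bin/podman run --name kubelet \
  --volume /var/mnt/kubernetes-local/drive0:/var/mnt/kubernetes-local/drive0 \
  ...

# versus the relabeling form:
#   --volume /var/mnt/kubernetes-local/drive0:/var/mnt/kubernetes-local/drive0:z
```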
Finally, I'd scope this to bare-metal only. Workers on cloud platforms are homogeneous and really ought to be treated as entirely fungible. I'm not keen to provide additional features for workloads to rely on node storage (by necessity you do still have hostPath). I think bare-metal has a more legitimate claim. There, nodes may be very unique (fancy storage arrays of various kinds on particular nodes) and it can be reasonable to think an admin would invest effort into repairing a faulty storage component (i.e. use of hostPath and local volumes is more justified when machines are pets that get groomed and loved).
[1]: Some node storage cases are entirely appropriate and justified (e.g. control plane DaemonSets). For these, hostPath is used.
Sample update for FCOS (tested with the latest (1.18.2) fcos-typhoon):
```yaml
# ...
- name: kubelet.service
  # ...
  # inside the unit's [Service] section:
  ExecStartPre=/bin/mkdir -p /var/mnt/kubernetes-local/drive0
  ExecStart=/usr/bin/podman run --name kubelet \
    # ...
    --volume /var/mnt/kubernetes-local/drive0:/var/mnt/kubernetes-local/drive0:z \
```
FCOS config (FCC, transpiled to Ignition) sample:
```yaml
variant: fcos
version: 1.0.0
storage:
  filesystems:
    - path: /var/mnt/kubernetes-local/drive0
      device: /dev/vdb
      format: ext4
systemd:
  units:
    - name: var-mnt-kubernetes\x2dlocal-drive0.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount persistent storage to /var/mnt/kubernetes-local/drive0
        Before=local-fs.target

        [Mount]
        Where=/var/mnt/kubernetes-local/drive0
        What=/dev/vdb
        Type=ext4

        [Install]
        WantedBy=local-fs.target
```
and the corresponding PV:
```yaml
# Completed into a full manifest; name, capacity, and hostname are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: drive0
spec:
  capacity:
    storage: 100Gi  # adjust to the actual disk size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/mnt/kubernetes-local/drive0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "kubernetes.io/hostname"
              operator: "In"
              values:
                - "your.host.name"
```
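For completeness: local PVs are usually paired with a StorageClass that has no provisioner and delays binding until a consuming Pod is scheduled, so the scheduler can take the PV's node affinity into account. A minimal sketch matching the `local-storage` name used above:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```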
I'd like to +1 the bare-metal local volume case, and expand the target use case beyond regular "bare metal" considerations to include software-defined storage composed of bare-metal block devices (e.g. Rook). I am building a NUC cluster running Typhoon right now and will be extending the Typhoon modules to support my specific use case - that is, NVMe boot drives with SATA block devices, so a slight modification of the above example.
I think some more use cases and input would be healthy to build a better solution, though. :)
Since Kubernetes v1.14 it's possible to use "local" volumes:
https://kubernetes.io/docs/concepts/storage/volumes/#local
This is currently not possible in Typhoon, because of this:
https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/docs/faqs.md#volume-does-not-exist-with-containerized-kubelet
It would work if one added the needed lines (for example, extra Kubelet volume mounts) here:
https://github.com/poseidon/typhoon/blob/master/bare-metal/container-linux/kubernetes/cl/worker.yaml#L78
And then it would be possible to mount disks under "/mnt/*".
Thoughts?