poseidon / typhoon

Minimal and free Kubernetes distribution with Terraform
https://typhoon.psdn.io/
MIT License

Support to use "local" volumes? #609

Open remoe opened 4 years ago

remoe commented 4 years ago

Since Kubernetes v1.14 it's possible to use "local" volumes:

https://kubernetes.io/docs/concepts/storage/volumes/#local

This is currently not possible in typhoon, because of this:

https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/blob/master/docs/faqs.md#volume-does-not-exist-with-containerized-kubelet

It would work if one added the following lines (as an example) here:

https://github.com/poseidon/typhoon/blob/master/bare-metal/container-linux/kubernetes/cl/worker.yaml#L78

          --volume mntdisks,kind=host,source=/mnt \
          --mount volume=mntdisks,target=/mnt \

And then it would be possible to mount disks at "/mnt/*"

Thoughts?

dghubble commented 4 years ago

Perhaps; I'm not opposed to local volumes, but I find they offer limited value beyond hostPath.

I'd be remiss not to begin by saying that we all (know we) ought to avoid storing data on specific Kubernetes nodes (regardless of mechanism). Nevertheless, there are plenty of cases where node storage (by which I'll refer to both hostPath and local volumes) is an unfortunate necessity or temporary tradeoff while we aspire for better, etc.[1]

In an example situation, you might either add a hostPath volume and a nodeSelector to a Deployment, or define a local PersistentVolume with nodeAffinity and have the Pod claim it.

In both cases, the result is the same: a Pod requiring a specific mount on a specific host. Some would say the first Deployment looks uglier. I'd say it more clearly exposes the real situation, while the local volume tends to mask it (e.g. to debug why the Pod isn't scheduling, you have to infer which of its volumes expresses its own selector logic). Some of the local volume motives were around making hostPath "feel" like any other volume.

Another factor is that local volumes require more moving parts (e.g. they must pass through the scheduler, whereas hostPath is decoupled and works with static pods too). I think local volumes might show their merit if you could entirely eliminate and forbid hostPath, since that might limit host access, but that's rather unlikely. I prefer hostPath, but I'm not opposed to local volumes as more-or-less the same thing.
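As a sketch of the two approaches discussed above (names, image, capacity, and hostname are illustrative, not from this thread):

```yaml
# Case 1: hostPath + nodeSelector. The node pinning is explicit in the Pod spec.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels: {app: example}
  template:
    metadata:
      labels: {app: example}
    spec:
      nodeSelector:
        kubernetes.io/hostname: node1   # pin to the node holding the data
      containers:
        - name: app
          image: example/image          # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          hostPath:
            path: /mnt/kubernetes-local/drive0
---
# Case 2: local PersistentVolume. The node constraint lives in the PV's
# nodeAffinity; the Pod only references a claim.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity: {storage: 100Gi}
  accessModes: [ReadWriteOnce]
  storageClassName: local-storage
  local:
    path: /mnt/kubernetes-local/drive0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [node1]
```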

There are a few local volume matters that require consideration.

Mounts

For the mounts, Kubelet should not mount /mnt itself. /mnt is quite a common location for data volumes, and it's unexpected that mounting a disk there would expose its data to the Kubelet (especially since the Kubelet can modify it; see below). A better approach would be to carve out a mount subdirectory where an admin might mount disks or other storage components that should be exposed as local volumes.

A possible option might be:

/mnt
└── kubernetes-local

The kubernetes- prefix makes the opt-in explicit and leaves room for future node volume types.
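Scoped to that subdirectory, the Container Linux worker flags quoted earlier would become something like (a sketch mirroring the rkt flags above; the volume name is illustrative):

          --volume kubernetes-local,kind=host,source=/mnt/kubernetes-local \
          --mount volume=kubernetes-local,target=/mnt/kubernetes-local \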

SELinux

With Fedora CoreOS in the mix, SELinux is increasingly a first-class citizen and concern. I don't think it's appropriate for Kubelet to relabel mounts of an end user's data, so podman Kubelets should not use relabel options. I suspect this will require some guidance, as users will need to prepare local volumes whose SELinux labels align with the existing node-local data. I've not tested the various pitfalls around this area.
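One way an admin might pre-label a local-volume tree for container access, instead of relying on podman's :z relabel option (a sketch, untested as noted above; the path is illustrative):

```shell
# Persistently associate the container_file_t label with the tree,
# then apply it, so no relabeling by the Kubelet container is needed.
semanage fcontext -a -t container_file_t '/var/mnt/kubernetes-local(/.*)?'
restorecon -Rv /var/mnt/kubernetes-local
```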

Bare-Metal Only

Finally, I'd scope this to bare-metal only. Workers on cloud platforms are homogeneous and really ought to be treated as entirely fungible. I'm not keen to provide additional features for workloads to rely on node storage (by necessity you do have hostPath). I think bare-metal has a more legitimate claim. There, nodes may be very unique (fancy storage arrays of various kinds on particular nodes) and it can be reasonable to think an admin would invest effort into repairing a faulty storage component (i.e. use of hostPath and local volumes is more justified when machines are pets that get groomed and loved).

[1]: Some node storage cases are entirely appropriate and justified (e.g. control plane DaemonSets). For these, hostPath is used.

remoe commented 4 years ago

Sample update for FCOS (tested with the latest fcos-typhoon, 1.18.2):

    # ...
    - name: kubelet.service
    # ...
        ExecStartPre=/bin/mkdir -p /var/mnt/kubernetes-local/drive0
        ExecStart=/usr/bin/podman run --name kubelet \
        # ...
          --volume /var/mnt/kubernetes-local/drive0:/var/mnt/kubernetes-local/drive0:z \
FCOS Ignition (Butane) sample:

variant: fcos
version: 1.0.0
storage:
  filesystems:
    - path: /var/mnt/kubernetes-local/drive0
      device: /dev/vdb
      format: ext4
systemd:
  units:
    - name: var-mnt-kubernetes\x2dlocal-drive0.mount
      enabled: true
      contents: |
        [Unit]
        Description=Mount persistent storage to /var/mnt/kubernetes-local/drive0
        Before=local-fs.target
        [Mount]
        Where=/var/mnt/kubernetes-local/drive0
        What=/dev/vdb
        Type=ext4
        [Install]
        WantedBy=local-fs.target 

and the corresponding PV:

spec:
  storageClassName: local-storage 
  local:
    path: /var/mnt/kubernetes-local/drive0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "kubernetes.io/hostname"
          operator: "In"
          values: 
            - "your.host.name"    
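
The PV above references a local-storage class; per the Kubernetes local-volume docs, the matching StorageClass uses no provisioner and delayed binding (so the scheduler accounts for the PV's node affinity):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```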
jharmison-redhat commented 3 years ago

I'd like to +1 the bare-metal local volume case, and expand the target use case beyond regular "bare metal" considerations to include software-defined storage composed of bare-metal block devices (e.g. Rook). I am building a NUC cluster running Typhoon right now and will be extending the Typhoon modules to support my specific use case, that is, NVMe boot drives with SATA block devices, so a slight modification of the above example.

I think some more use cases and input would be healthy to build a better solution, though. :)