Closed by mhutter 1 year ago
Digging into this, it turns out that while patching the DaemonSet allows the pods to come up, it is not the correct fix.
The reason is that K0s starts the kubelet with non-standard directories:
```
/var/lib/k0s/bin/kubelet \
  --cert-dir=/var/lib/k0s/kubelet/pki \
  --container-runtime-endpoint=unix:///run/k0s/containerd.sock \
  --config=/var/lib/k0s/kubelet-config.yaml \
  --kubeconfig=/var/lib/k0s/kubelet.conf \
  --v=1 \
  --containerd=/run/k0s/containerd.sock \
  --node-ip=10.42.0.2 \
  --runtime-cgroups=/system.slice/containerd.service \
  --root-dir=/var/lib/k0s/kubelet \
  --bootstrap-kubeconfig=/var/lib/k0s/kubelet-bootstrap.conf
```
So the actual fix for K0s would be to change all mounts from `/var/lib/kubelet` to `/var/lib/k0s/kubelet`.
However, I have no clue how this could be detected automatically...
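One hypothetical way to detect it (a sketch, not anything the driver does today) would be to read the running kubelet's command line and pull out `--root-dir`, falling back to the kubelet default of `/var/lib/kubelet`. On a live node the arguments could come from `ps -o args= -C kubelet`; here the flags quoted above are parsed as sample input:

```shell
# Sketch: extract the kubelet root dir from its command line.
# On a real node, the args could come from: ps -o args= -C kubelet
kubelet_root_dir() {
  args=$1
  dir=/var/lib/kubelet   # kubelet default when --root-dir is absent
  case "$args" in
    *--root-dir=*)
      dir=${args##*--root-dir=}   # keep everything after the flag
      dir=${dir%% *}              # drop any flags that follow
      ;;
  esac
  printf '%s\n' "$dir"
}

kubelet_root_dir '/var/lib/k0s/bin/kubelet --root-dir=/var/lib/k0s/kubelet --v=1'
# prints /var/lib/k0s/kubelet
```

This is only a heuristic; it would miss setups where the root dir is configured some other way.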
My current workaround is to apply the following patch via kustomization:
```yaml
resources:
  - ./token.json # SealedSecret with the hcloud API token
  - https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.1.0/deploy/kubernetes/hcloud-csi.yml
patchesStrategicMerge:
  - |-
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: hcloud-csi-node
      namespace: kube-system
    spec:
      template:
        spec:
          volumes:
            - hostPath:
                path: /var/lib/k0s/kubelet
              name: kubelet-dir
            - hostPath:
                path: /var/lib/k0s/kubelet/plugins/csi.hetzner.cloud/
              name: plugin-dir
            - hostPath:
                path: /var/lib/k0s/kubelet/plugins_registry/
              name: registration-dir
```
AFAICT we cannot just change it to `DirectoryOrCreate`, because the kubelet is actively monitoring the directory, and on K0s it monitors a different path. So even though the plugin would start up with `DirectoryOrCreate`, it would never get registered with the kubelet and you could not mount volumes.
I think this is a great solution. Maybe we should publish a Helm Chart to make configuring such things easier.
> Maybe we should publish a Helm Chart to make configuring such things easier
While researching the issue I found out that this seems to be the way some CSI providers go.
I have also opened https://github.com/k0sproject/k0s/issues/2599 to at least get some documentation on what else is special about K0s setups...
So apparently it's a bit more involved to even get this running.
I had to adjust the patch as follows just to get the csi-node pods running:
```yaml
patchesStrategicMerge:
  - |-
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: hcloud-csi-node
      namespace: kube-system
    spec:
      template:
        spec:
          containers:
            - name: csi-node-driver-registrar
              args:
                - --kubelet-registration-path=/var/lib/k0s/kubelet/plugins/csi.hetzner.cloud/socket
          volumes:
            - name: kubelet-dir
              hostPath:
                path: /var/lib/k0s/kubelet
            - name: plugin-dir
              hostPath:
                path: /var/lib/k0s/kubelet/plugins/csi.hetzner.cloud/
            - name: registration-dir
              hostPath:
                path: /var/lib/k0s/kubelet/plugins_registry/
```
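For completeness, this patch drops into a `kustomization.yaml` the same way as the earlier workaround; a sketch, with the patch saved to a separate file whose name is hypothetical:

```yaml
# kustomization.yaml (sketch) -- the patch file name is hypothetical
resources:
  - https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.1.0/deploy/kubernetes/hcloud-csi.yml
patchesStrategicMerge:
  - hcloud-csi-node-k0s.yaml  # the DaemonSet patch shown above
```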
However, it still does not work.
Given the following test manifest:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: hcloud-volumes
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
        - name: busybox
          image: docker.io/library/busybox
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: "/data"
              name: my-csi-volume
          command: [ "sleep", "1000000" ]
      volumes:
        - name: my-csi-volume
          persistentVolumeClaim:
            claimName: csi-pvc
```
The PVC gets properly bound, the Pod comes up. The volume is attached to the correct server, and the CSI driver reports successful mounting of the volume:
```
level=info ts=2023-01-16T15:38:34.234862716Z component=linux-mount-service msg="formatting disk" disk=/dev/disk/by-id/scsi-0HC_Volume_26750481 fstype=ext4
level=info ts=2023-01-16T15:38:34.792342371Z component=linux-mount-service msg="publishing volume" target-path=/var/lib/k0s/kubelet/pods/1427828c-0c03-419a-bdd2-8cd1d31c86af/volumes/kubernetes.io~csi/pvc-01560db4-164f-4e1d-b24d-361890a5ff84/mount device-path=/dev/disk/by-id/scsi-0HC_Volume_26750481 fs-type=ext4 block-volume=false readonly=false mount-options= encrypted=false
```
Even the syslog mentions that the thing was mounted:
```
[root@worker-i1ht ~]# journalctl -xe | grep sdb
Jan 16 15:38:29 worker-i1ht kernel: sd 0:0:0:1: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
Jan 16 15:38:29 worker-i1ht kernel: sd 0:0:0:1: [sdb] Write Protect is off
Jan 16 15:38:29 worker-i1ht kernel: sd 0:0:0:1: [sdb] Mode Sense: 63 00 00 08
Jan 16 15:38:29 worker-i1ht kernel: sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 16 15:38:29 worker-i1ht kernel: sd 0:0:0:1: [sdb] Attached SCSI disk
Jan 16 15:38:34 worker-i1ht kernel: EXT4-fs (sdb): mounted filesystem with ordered data mode. Quota mode: none.
```
...but it is not:
```
[root@worker-i1ht ~]# mount | grep ^/
/dev/sda1 on / type ext4 (rw,relatime,seclabel)
/dev/sda14 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
```
Maybe related: #343
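Since #343 points at mount propagation, one way to narrow this down would be to compare the host's mount table with the one the kubelet sees in its own mount namespace. On a node, those listings could be captured with `mount` and `nsenter -t "$(pidof kubelet)" -m mount` (requires root); the helper below is only a sketch that diffs two such listings, fed here with sample lines modeled on this issue:

```shell
# Sketch: report mounts visible in the kubelet's mount namespace but not on
# the host. Listings are passed in as strings; on a real node they would
# come from `mount` and `nsenter -t "$(pidof kubelet)" -m mount`.
missing_on_host() {
  ns_list=$1; host_list=$2
  printf '%s\n' "$ns_list"   | sort > /tmp/ns_mounts.$$
  printf '%s\n' "$host_list" | sort > /tmp/host_mounts.$$
  comm -23 /tmp/ns_mounts.$$ /tmp/host_mounts.$$   # lines only in the namespace
  rm -f /tmp/ns_mounts.$$ /tmp/host_mounts.$$
}

ns='/dev/sda1 on / type ext4 (rw,relatime,seclabel)
/dev/sdb on /var/lib/k0s/kubelet/pods/.../mount type ext4 (rw,relatime)'
host='/dev/sda1 on / type ext4 (rw,relatime,seclabel)'
missing_on_host "$ns" "$host"   # prints only the /dev/sdb line
```

If the volume shows up only in the kubelet's namespace, the mount happened but never propagated to the host (or to other containers).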
When `exec`ing into the pod, `df` reports that the server's root disk is mounted at `/data`. Writing to `/data` works, but will prevent the pod from being terminated properly: the cleanup process tries to remove `/var/lib/k0s/kubelet/pods/1427828c-0c03-419a-bdd2-8cd1d31c86af/volumes/kubernetes.io~csi/pvc-01560db4-164f-4e1d-b24d-361890a5ff84/mount` but fails because it is not empty (it contains the written data).
I'm a bit at a loss about how to troubleshoot this further, or what needs to be fixed to get this working with K0s.
Turns out I messed up some mount paths & containers. Now that all is fixed, it works! I added the required patches to the issue description.
Opened #369 for the helm chart.
Thanks @mhutter for providing the required patches for current users of k0s!
This was already reported in #260 but was never fixed.
Should I prepare a PR for this?
**Steps to reproduce**

```shell
kubectl apply -f https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.1.0/deploy/kubernetes/hcloud-csi.yml
```

**Expected outcome**

The driver starts up.

**Actual outcome**

All `hcloud-csi-node` pods are stuck in `ContainerCreating` with the following event:

**Fix**

In the `hcloud-csi-node` DaemonSet, change `hostPath.type` of the `registration-dir` volume to `DirectoryOrCreate`.
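A sketch of what the originally proposed change would look like in the DaemonSet (default path shown; later comments in this thread note that on K0s the path itself also has to change, so this alone is not sufficient there):

```yaml
# Excerpt of the hcloud-csi-node DaemonSet volumes: hostPath.type added so
# the pod does not fail to start when the directory is missing on the host
volumes:
  - name: registration-dir
    hostPath:
      path: /var/lib/kubelet/plugins_registry/
      type: DirectoryOrCreate
```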