Closed: bheisig closed this issue 1 month ago
I could not reproduce your issue; could you maybe provide us the debug logs?
See https://github.com/hetznercloud/csi-driver/blob/main/docs/debug-logs.md
While
$ kubectl -n kube-system logs -f deploy/hcloud-csi-controller hcloud-csi-driver
outputs nothing in debug mode
$ kubectl -n kube-system logs -f daemonsets/hcloud-csi-node hcloud-csi-driver
repeats these lines:
level=debug ts=2024-07-24T14:23:37.575476001Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetCapabilities req=
level=debug ts=2024-07-24T14:23:37.575542186Z component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetCapabilities err=null
level=debug ts=2024-07-24T14:23:37.591087021Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetCapabilities req=
level=debug ts=2024-07-24T14:23:37.591156703Z component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetCapabilities err=null
level=debug ts=2024-07-24T14:23:37.592175166Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetCapabilities req=
level=debug ts=2024-07-24T14:23:37.592239558Z component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetCapabilities err=null
level=debug ts=2024-07-24T14:23:37.593027176Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodePublishVolume req="volume_id:\"101066085\" publish_context:<key:\"devicePath\" value:\"/dev/disk/by-id/scsi-0HC_Volume_101066085\" > target_path:\"/var/lib/kubelet/pods/860d1611-4be6-4fee-a7e2-960f337ddb75/volumes/kubernetes.io~csi/pvc-ef6bff65-56ea-4e6d-9e61-3050355f628b/mount\" volume_capability:<mount:<fs_type:\"xfs\" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:\"encryption-passphrase\" value:\"<redacted>\" > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"events-5767b68dc4-7nz5c\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"<redacted>\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"860d1611-4be6-4fee-a7e2-960f337ddb75\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"default\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1721830718569-8081-csi.hetzner.cloud\" > "
level=info ts=2024-07-24T14:23:37.664616774Z component=linux-mount-service msg="publishing volume" target-path=/var/lib/kubelet/pods/860d1611-4be6-4fee-a7e2-960f337ddb75/volumes/kubernetes.io~csi/pvc-ef6bff65-56ea-4e6d-9e61-3050355f628b/mount device-path=/dev/mapper/scsi-0HC_Volume_101066085 fs-type=xfs block-volume=false readonly=false mount-options= encrypted=true
E0724 14:23:37.667750 1 mount_linux.go:236] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t xfs /dev/mapper/scsi-0HC_Volume_101066085 /var/lib/kubelet/pods/860d1611-4be6-4fee-a7e2-960f337ddb75/volumes/kubernetes.io~csi/pvc-ef6bff65-56ea-4e6d-9e61-3050355f628b/mount
Output: mount: mounting /dev/mapper/scsi-0HC_Volume_101066085 on /var/lib/kubelet/pods/860d1611-4be6-4fee-a7e2-960f337ddb75/volumes/kubernetes.io~csi/pvc-ef6bff65-56ea-4e6d-9e61-3050355f628b/mount failed: Invalid argument
level=error ts=2024-07-24T14:23:37.667817821Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: mount failed: exit status 255\nMounting command: mount\nMounting arguments: -t xfs /dev/mapper/scsi-0HC_Volume_101066085 /var/lib/kubelet/pods/860d1611-4be6-4fee-a7e2-960f337ddb75/volumes/kubernetes.io~csi/pvc-ef6bff65-56ea-4e6d-9e61-3050355f628b/mount\nOutput: mount: mounting /dev/mapper/scsi-0HC_Volume_101066085 on /var/lib/kubelet/pods/860d1611-4be6-4fee-a7e2-960f337ddb75/volumes/kubernetes.io~csi/pvc-ef6bff65-56ea-4e6d-9e61-3050355f628b/mount failed: Invalid argument\n"
We still cannot reproduce the error; we would need you to run a few commands to gather some information.
In the hcloud-csi-node-* pod (on the relevant node) that is trying to mount the xfs filesystem, could you please shell into the hcloud-csi-driver container and run the following commands?
Please make sure to update the paths:
cat /etc/os-release
cat /etc/mtab
apk add lsblk
lsblk --fs
cryptsetup luksDump /dev/disk/by-id/scsi-0HC_Volume_101069871
cryptsetup -v status scsi-0HC_Volume_101069871
umount /dev/mapper/scsi-0HC_Volume_101069871
mount -v -t xfs /dev/mapper/scsi-0HC_Volume_101069871 /var/lib/kubelet/pods/860d1611-4be6-4fee-a7e2-960f337ddb75/volumes/kubernetes.io~csi/pvc-ef6bff65-56ea-4e6d-9e61-3050355f628b/mount
I tested it again with a fresh deployment:
pvc-5accc317-0126-40fc-bbb4-607de98b1e0a
scsi-0HC_Volume_101070019
pod/events-5767b68dc4-h6rdl
89a645f0-d795-4675-a5b1-7af373588fa5
The result is reproducible:
# kubectl -n kube-system logs -f hcloud-csi-node-s8j8g hcloud-csi-driver
level=debug ts=2024-07-25T09:36:11.386346816Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetCapabilities req=
level=debug ts=2024-07-25T09:36:11.38642836Z component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetCapabilities err=null
level=debug ts=2024-07-25T09:36:11.399080253Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetCapabilities req=
level=debug ts=2024-07-25T09:36:11.399165244Z component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetCapabilities err=null
level=debug ts=2024-07-25T09:36:11.400243245Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodeGetCapabilities req=
level=debug ts=2024-07-25T09:36:11.400359835Z component=grpc-server msg="finished handling request" method=/csi.v1.Node/NodeGetCapabilities err=null
level=debug ts=2024-07-25T09:36:11.401307591Z component=grpc-server msg="handling request" method=/csi.v1.Node/NodePublishVolume req="volume_id:\"101070019\" publish_context:<key:\"devicePath\" value:\"/dev/disk/by-id/scsi-0HC_Volume_101070019\" > target_path:\"/var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount\" volume_capability:<mount:<fs_type:\"xfs\" > access_mode:<mode:SINGLE_NODE_WRITER > > secrets:<key:\"encryption-passphrase\" value:\"<redacted>\" > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"events-5767b68dc4-h6rdl\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"<redacted>\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"89a645f0-d795-4675-a5b1-7af373588fa5\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"default\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1721899788712-8081-csi.hetzner.cloud\" > "
level=info ts=2024-07-25T09:36:11.521576407Z component=linux-mount-service msg="publishing volume" target-path=/var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount device-path=/dev/mapper/scsi-0HC_Volume_101070019 fs-type=xfs block-volume=false readonly=false mount-options= encrypted=true
E0725 09:36:11.586868 1 mount_linux.go:236] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t xfs /dev/mapper/scsi-0HC_Volume_101070019 /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount
Output: mount: mounting /dev/mapper/scsi-0HC_Volume_101070019 on /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount failed: Invalid argument
level=error ts=2024-07-25T09:36:11.586943335Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: mount failed: exit status 255\nMounting command: mount\nMounting arguments: -t xfs /dev/mapper/scsi-0HC_Volume_101070019 /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount\nOutput: mount: mounting /dev/mapper/scsi-0HC_Volume_101070019 on /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount failed: Invalid argument\n"
As you asked, I opened a shell:
$ kubectl -n kube-system exec -it hcloud-csi-node-s8j8g -c hcloud-csi-driver -- ash
# cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.20.1
PRETTY_NAME="Alpine Linux v3.20"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
Neither the pod nor the PVC is mentioned in /etc/mtab. I removed all unrelated pods and PVCs from the output:
# cat /etc/mtab
overlay / overlay rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5819/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5818/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5817/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/573/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/7772/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/7772/work 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=16024084k,nr_inodes=4006021,mode=755,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda1 /run/csi ext4 rw,relatime 0 0
/dev/sda1 /etc/hosts ext4 rw,relatime 0 0
/dev/sda1 /dev/termination-log ext4 rw,relatime 0 0
/dev/sda1 /etc/hostname ext4 rw,relatime 0 0
/dev/sda1 /etc/resolv.conf ext4 rw,relatime 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k,inode64 0 0
/dev/sda1 /var/lib/kubelet ext4 rw,relatime 0 0
tmpfs /run/secrets/kubernetes.io/serviceaccount tmpfs ro,relatime,size=163840k,inode64 0 0
I couldn't do these steps because the cluster is behind a proxy:
apk add lsblk
lsblk --fs
# cryptsetup luksDump /dev/disk/by-id/scsi-0HC_Volume_101070019
LUKS header information for /dev/disk/by-id/scsi-0HC_Volume_101070019
Version: 1
Cipher name: aes
Cipher mode: xts-plain64
Hash spec: sha256
Payload offset: 4096
MK bits: 512
MK digest: 5e c6 d0 61 1a 25 f1 c3 09 f3 44 57 34 81 07 bb f6 29 77 0c
MK salt: 8e ff 99 c9 4b 90 f3 bf b2 14 8f ee 7e 70 25 58
7d 42 b7 64 1d c7 8a e4 1f 4b 82 86 12 f1 37 b2
MK iterations: 85892
UUID: f03e2070-e239-40aa-bc3f-1e7d31cc5c83
Key Slot 0: ENABLED
Iterations: 1466538
Salt: 34 a5 69 1a 38 d8 4c 18 74 14 39 0a d1 57 72 51
28 08 d7 2f c5 38 d5 84 05 6d 72 0c 11 f9 fa 2c
Key material offset: 8
AF stripes: 4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED
# cryptsetup -v status scsi-0HC_Volume_101070019
/dev/mapper/scsi-0HC_Volume_101070019 is active.
type: LUKS1
cipher: aes-xts-plain64
keysize: 512 bits
key location: dm-crypt
device: /dev/sdo
sector size: 512
offset: 4096 sectors
size: 20967424 sectors
mode: read/write
flags: discards
Command successful.
# umount /dev/mapper/scsi-0HC_Volume_101070019
umount: can't unmount /dev/mapper/scsi-0HC_Volume_101070019: Invalid argument
Finally, I tried to mount it manually:
# mount -v -t xfs /dev/mapper/scsi-0HC_Volume_101070019 /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/mount
mount: mounting /dev/mapper/scsi-0HC_Volume_101070019 on /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/mount failed: No such file or directory
# mount -v -t xfs /dev/mapper/scsi-0HC_Volume_101070019 /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io\~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount
mount: mounting /dev/mapper/scsi-0HC_Volume_101070019 on /var/lib/kubelet/pods/89a645f0-d795-4675-a5b1-7af373588fa5/volumes/kubernetes.io~csi/pvc-5accc317-0126-40fc-bbb4-607de98b1e0a/mount failed: No such file or directory
# mount -v -t xfs /dev/mapper/scsi-0HC_Volume_101070019 /tmp/
mount: mounting /dev/mapper/scsi-0HC_Volume_101070019 on /tmp/ failed: Invalid argument
Sadly the mount command does not give us enough details; could you please share the kernel logs after a mount failure? Same as above, after shelling into the hcloud-csi-node container, you could run:
dmesg
The kernel logs from the node might be useful, if the logs from within the hcloud-csi-node container are not helpful enough.
These lines are repeated dozens of times in dmesg:
[627973.852544] XFS (dm-12): Superblock has unknown incompatible features (0x20) enabled.
[627973.852549] XFS (dm-12): Filesystem cannot be safely mounted by this kernel.
This seems related to https://gitlab.alpinelinux.org/alpine/aports/-/issues/15744; could you maybe try the suggestions from that ticket to see if they solve the problem for you?
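For what it's worth, the 0x20 value in the dmesg lines above can be decoded with plain shell. This is my own reading, not something from the thread: assuming the flag layout in the kernel's XFS superblock headers, bit 5 of the incompat mask is XFS_SB_FEAT_INCOMPAT_NREXT64, the large extent counter feature.

```shell
# Decode the "unknown incompatible features (0x20)" value from dmesg.
# Assumption: bit 5 of the incompat mask is XFS_SB_FEAT_INCOMPAT_NREXT64
# (large extent counters), per the kernel's XFS on-disk format headers.
nrext64_bit=$(( 1 << 5 ))
printf 'NREXT64 bit: 0x%x\n' "$nrext64_bit"

# Extract the reported mask from a dmesg line and test the bit:
line='[627973.852544] XFS (dm-12): Superblock has unknown incompatible features (0x20) enabled.'
mask=$(printf '%s\n' "$line" | sed -n 's/.*(\(0x[0-9a-f]*\)).*/\1/p')
if [ $(( mask & nrext64_bit )) -ne 0 ]; then
  echo "filesystem uses nrext64; this kernel lacks support for it"
fi
```

If the bit test fires, the filesystem was formatted with nrext64=1 on a kernel that predates support for it.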
Also, could you provide us information about the xfs partition, e.g. using xfs_info:
/ # xfs_info /dev/mapper/scsi-0HC_Volume_101083717
meta-data=/dev/mapper/scsi-0HC_Volume_101083717 isize=512 agcount=4, agsize=655232 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=0 inobtcount=0
data = bsize=4096 blocks=2620928, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Maybe running xfs_repair in dry-run mode could also provide us some information:
/ # xfs_repair -n /dev/mapper/scsi-0HC_Volume_101083717
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
Sorry for all this back and forth, I sadly still cannot reproduce this. Thanks a lot for continuing to provide us debugging information!
I suspect that the node's kernel must be updated to fix this, either by upgrading the kernel itself or by upgrading the worker to e.g. ubuntu-24.04.
On our dev/test nodes, we currently have: Linux hcloud-csi-node-hjzfd 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 Linux
After a recent upgrade, we use the same kernel version on our worker nodes (still on Ubuntu 22.04):
# uname -a
Linux hcloud-csi-node-hk8cf 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 Linux
I've tried to format the filesystem manually but failed every time:
# mkfs.xfs -d size=1g -n ftype=1 -i rnext64=0 /dev/mapper/scsi-0HC_Volume_101070019
unknown option -i (null)
# mkfs.xfs -d size=1g -n ftype=1 -m rnext64=0 /dev/mapper/scsi-0HC_Volume_101070019
unknown option -m (null)
A bit unexpected: xfs_info returned some info:
# xfs_info /dev/mapper/scsi-0HC_Volume_101070019
meta-data=/dev/mapper/scsi-0HC_Volume_101070019 isize=512 agcount=4, agsize=655232 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1
data = bsize=4096 blocks=2620928, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=25650, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
But xfs_repair failed:
# xfs_repair -n /dev/mapper/scsi-0HC_Volume_101070019
Phase 1 - find and verify superblock...
Superblock has unknown compat/rocompat/incompat features (0x0/0x0/0x20).
Using a more recent xfs_repair is recommended.
Found unsupported filesystem features. Exiting now.
> Sorry for all this back and forth, I sadly still cannot reproduce this. Thanks a lot for continuing to provide us debugging information!
No worries. Keep up with your good work :)
Did you run the commands above using the new csi-driver containers? If not, could you try that?
Between csi-driver 2.7.1 and 2.8.0, the alpine image was bumped from alpine 3.15 to alpine 3.20 https://github.com/hetznercloud/csi-driver/commit/1e18db30dcc4395847208ebcaba8232c9aba0cc2
I noticed some differences between our two filesystems (note that the xfs_info was executed with csi-driver 2.7.1); in the diff notation below, < is ours and > is yours.
Did you by any chance format the volume using custom flags?
The same command using the csi-driver 2.8.0:
/ # xfs_info /dev/mapper/scsi-0HC_Volume_101083717
meta-data=/dev/mapper/scsi-0HC_Volume_101083717 isize=512 agcount=4, agsize=655232 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=0 inobtcount=0 nrext64=0
data = bsize=4096 blocks=2620928, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
The mkfs.xfs command might have failed because the flags were probably not available in csi-driver 2.7.1 (alpine 3.15).
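The differing flags can also be isolated mechanically. A minimal sketch, using the feature values quoted in the two xfs_info outputs above (the variable names ours/yours are just labels for this comparison, matching the diff legend):

```shell
# Feature values from the two xfs_info outputs in this thread:
# "ours" = the maintainer's volume, "yours" = the reporter's broken volume.
ours='bigtime=0 inobtcount=0 nrext64=0'
yours='bigtime=1 inobtcount=1 nrext64=1'

# Print only the flags whose values differ between the two filesystems.
for pair in $ours; do
  key=${pair%%=*}
  theirs=$(printf '%s\n' $yours | grep "^$key=")
  if [ "$pair" != "$theirs" ]; then
    echo "differs: ours=$pair yours=$theirs"
  fi
done
```

All three differing flags (bigtime, inobtcount, nrext64) point at a newer mkfs.xfs on the formatting side.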
The container definitely runs v2.8.0:
$ kubectl -n kube-system describe pod hcloud-csi-node-45fzw
Name: hcloud-csi-node-45fzw
Namespace: kube-system
Priority: 0
Service Account: default
Node: k8s-worker-3a771cd32e16a6cf/10.96.1.11
Start Time: Tue, 30 Jul 2024 11:15:53 +0000
Labels: app.kubernetes.io/component=node
app.kubernetes.io/instance=hcloud-csi
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=hcloud-csi
controller-revision-hash=796c69c9cd
helm.sh/chart=hcloud-csi-2.8.0
pod-template-generation=14
Annotations: <none>
Status: Running
IP: 10.98.5.57
IPs:
IP: 10.98.5.57
Controlled By: DaemonSet/hcloud-csi-node
Containers:
csi-node-driver-registrar:
Container ID: containerd://148dbe7b15fc540cb1d9094737c6aa879f76211a7875046e82c9b0a3f0983a2f
Image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
Image ID: registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:4a4cae5118c4404e35d66059346b7fa0835d7e6319ff45ed73f4bba335cf5183
Port: <none>
Host Port: <none>
Args:
--kubelet-registration-path=/var/lib/kubelet/plugins/csi.hetzner.cloud/socket
State: Running
Started: Tue, 30 Jul 2024 11:15:54 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 40Mi
Requests:
cpu: 10m
memory: 20Mi
Environment: <none>
Mounts:
/registration from registration-dir (rw)
/run/csi from plugin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4fj5b (ro)
liveness-probe:
Container ID: containerd://0b4e880ef5821fccc5bcbfc27de10501788693d913e6d1b4a6b94573c9627050
Image: registry.k8s.io/sig-storage/livenessprobe:v2.9.0
Image ID: registry.k8s.io/sig-storage/livenessprobe@sha256:2b10b24dafdc3ba94a03fc94d9df9941ca9d6a9207b927f5dfd21d59fbe05ba0
Port: <none>
Host Port: <none>
State: Running
Started: Tue, 30 Jul 2024 11:15:54 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 50m
memory: 40Mi
Requests:
cpu: 10m
memory: 20Mi
Environment: <none>
Mounts:
/run/csi from plugin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4fj5b (ro)
hcloud-csi-driver:
Container ID: containerd://1839db61d35613e4c9793aeef23f1841824de8083b4ce39103ad8905c17d1b72
Image: docker.io/hetznercloud/hcloud-csi-driver:v2.8.0
Image ID: docker.io/hetznercloud/hcloud-csi-driver@sha256:ca7745faf7ef9c478204382ef98cb1339176a64592331de1668c297a920564c7
Ports: 9189/TCP, 9808/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/bin/hcloud-csi-driver-node
State: Running
Started: Tue, 30 Jul 2024 11:15:54 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 80Mi
Requests:
cpu: 10m
memory: 40Mi
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=2s #success=1 #failure=5
Environment:
CSI_ENDPOINT: unix:///run/csi/socket
METRICS_ENDPOINT: 0.0.0.0:9189
ENABLE_METRICS: true
LOG_LEVEL: debug
HCLOUD_DEBUG: true
Mounts:
/dev from device-dir (rw)
/run/csi from plugin-dir (rw)
/var/lib/kubelet from kubelet-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4fj5b (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kubelet-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet
HostPathType: Directory
plugin-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins/csi.hetzner.cloud/
HostPathType: DirectoryOrCreate
registration-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins_registry/
HostPathType: Directory
device-dir:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType: Directory
kube-api-access-4fj5b:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute op=Exists
:NoSchedule op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 46m default-scheduler Successfully assigned kube-system/hcloud-csi-node-45fzw to k8s-worker-3a771cd32e16a6cf
Normal Pulled 46m kubelet Container image "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0" already present on machine
Normal Created 46m kubelet Created container csi-node-driver-registrar
Normal Started 46m kubelet Started container csi-node-driver-registrar
Normal Pulled 46m kubelet Container image "registry.k8s.io/sig-storage/livenessprobe:v2.9.0" already present on machine
Normal Created 46m kubelet Created container liveness-probe
Normal Started 46m kubelet Started container liveness-probe
Normal Pulled 46m kubelet Container image "docker.io/hetznercloud/hcloud-csi-driver:v2.8.0" already present on machine
Normal Created 46m kubelet Created container hcloud-csi-driver
Normal Started 46m kubelet Started container hcloud-csi-driver
# xfs_info /dev/mapper/scsi-0HC_Volume_101070019
meta-data=/dev/mapper/scsi-0HC_Volume_101070019 isize=512 agcount=4, agsize=655232 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1 nrext64=1
data = bsize=4096 blocks=2620928, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=25650, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Custom flags are not used.
Hey,
I am a colleague of @bheisig. We did a full cleanup of all affected PVCs and redeployed csi-driver v2.8.0. I then tried a deployment with encrypted XFS and got the same behavior as above.
Then I formatted the volume directly via the hcloud-csi-driver container:
kubectl exec -it -n kube-system hcloud-csi-node-v25r4 -c hcloud-csi-driver ash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # mkfs.xfs -d size=1g -n ftype=1 -i nrext64=0 /dev/mapper/scsi-0HC_Volume_101086364 -f
meta-data=/dev/mapper/scsi-0HC_Volume_101086364 isize=512 agcount=4, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1 nrext64=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=22482, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Discarding blocks...Done.
/ #
After that, the volume mounted and was fully functional.
Logs from the broken state:
level=error ts=2024-07-30T12:54:07.312859845Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: mount failed: exit status 255\nMounting command: mount\nMounting arguments: -t xfs /dev/mapper/scsi-0HC_Volume_101086364 /var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount\nOutput: mount: mounting /dev/mapper/scsi-0HC_Volume_101086364 on /var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount failed: Invalid argument\n"
level=info ts=2024-07-30T12:55:11.411331077Z component=linux-mount-service msg="publishing volume" target-path=/var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount device-path=/dev/mapper/scsi-0HC_Volume_101086364 fs-type=xfs block-volume=false readonly=false mount-options= encrypted=true
level=error ts=2024-07-30T12:55:11.414147489Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to publish volume: mount failed: exit status 255\nMounting command: mount\nMounting arguments: -t xfs /dev/mapper/scsi-0HC_Volume_101086364 /var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount\nOutput: mount: mounting /dev/mapper/scsi-0HC_Volume_101086364 on /var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount failed: Invalid argument\n"
E0730 12:55:11.414095 1 mount_linux.go:236] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t xfs /dev/mapper/scsi-0HC_Volume_101086364 /var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount
Output: mount: mounting /dev/mapper/scsi-0HC_Volume_101086364 on /var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount failed: Invalid argument
Logs after the fix:
level=info ts=2024-07-30T12:57:13.496861452Z component=linux-mount-service msg="publishing volume" target-path=/var/lib/kubelet/pods/494bd080-4f4a-4efb-a237-10e68a972799/volumes/kubernetes.io~csi/pvc-dc2ab109-b9a2-4e35-ad60-77df1914f811/mount device-path=/dev/mapper/scsi-0HC_Volume_101086364 fs-type=xfs block-volume=false readonly=false mount-options= encrypted=true
We finally managed to reproduce the bug. Setting nrext64=0 does indeed solve the problem.
The mismatch between the XFS options supported by the csi-driver container (alpine) and those supported by the node kernel is the source of this bug: the csi-driver formats the file system with the nrext64=1 flag, but the mount fails because the kernel does not understand this option.
We will release a minimal bug fix that forces the use of nrext64=0.
We are also considering only supporting a minimal kernel version (e.g. >=5.15) by default, for example using mkfs.xfs -f -c options=/usr/share/xfsprogs/mkfs/lts_5.15.conf /dev/mapper/scsi-0HC_Volume_101135663:
$ cat /usr/share/xfsprogs/mkfs/lts_5.15.conf
# V5 features that were the mkfs defaults when the upstream Linux 5.15 LTS
# kernel was released at the end of 2021.
[metadata]
bigtime=1
crc=1
finobt=1
inobtcount=1
reflink=1
rmapbt=0
[inode]
sparse=1
nrext64=0
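As a sketch of the kernel gate this implies. Assumptions on my part: NREXT64 support landed in Linux 5.19, so only kernels at or above 5.19 can mount nrext64=1 filesystems, and kernel_at_least is a hypothetical helper (not part of the csi-driver) that compares versions with sort -V:

```shell
# Hypothetical helper: succeed if kernel version $1 is >= required version $2.
# Assumption: nrext64=1 filesystems need kernel >= 5.19 to mount.
kernel_at_least() {
  kernel="${1%%-*}"   # strip "-116-generic" style suffixes from uname -r
  required="$2"
  # sort -V orders versions; if the required version sorts first (or equal),
  # the kernel satisfies the minimum.
  [ "$(printf '%s\n%s\n' "$required" "$kernel" | sort -V | head -n1)" = "$required" ]
}

# The node kernel from this thread is 5.15, so nrext64=1 is not safe there:
if kernel_at_least "5.15.0-116-generic" "5.19"; then
  echo "nrext64=1 is safe"
else
  echo "format with nrext64=0"
fi
```

This prints "format with nrext64=0" for the 5.15 kernel reported in this thread.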
TL;DR
Since version 2.8.0, k8s has been unable to mount encrypted PVCs with file system type XFS.
Expected behavior
The hcloud volume will be attached to the server, encrypted, and mounted with file system type XFS.
Observed behavior
kubectl -n <redacted> get events -w:
Minimal working example
Log output
Journal on k8s worker node:
kubectl -n kube-system logs -f -l app.kubernetes.io/instance=hcloud-csi during this time: nothing
Additional information
kubectl version:
k8s worker node:
lsb_release -a:
uname -a: