That disk could be partially broken. Please take a snapshot first and then run fsck -y /dev/sdx to repair the original disk.
I don't think it's a broken disk. After a node reboot the same PVC mounts fine, and when the pod is evicted to a different node the PVC also mounts fine. It looks more like a parsing error to me (or unexpected output from blkid), as seen in:
I0430 14:08:20.160870 1 mount_linux.go:579] Attempting to determine if disk "/dev/disk/azure/scsi1/lun1" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/disk/azure/scsi1/lun1])
I0430 14:08:20.248512 1 mount_linux.go:582] Output: ""
I'll try to track down some more details, although after a reboot the PVCs mount fine. I would also note that we did not have this issue on 1.27; it seems to be present only on 1.28.
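For context on why empty blkid output matters here: the format detection in k8s.io/mount-utils (the mount_linux.go that produced the log lines above) shells out to blkid and, as far as I can tell, treats an exit status of 2 with empty output as "no filesystem found", which is what later sends the driver down the mkfs path. A simplified, self-contained sketch of that logic (illustrative only, not the exact upstream code) looks roughly like this:

```go
// Simplified sketch of the blkid-based detection done by k8s.io/mount-utils
// (mount_linux.go); illustrative only, not the upstream implementation.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// getDiskFormat runs blkid against the device and returns the detected
// filesystem type. Empty output with exit status 2 ("nothing identified") is
// reported as an empty type, which the caller interprets as "safe to format".
func getDiskFormat(disk string) (string, error) {
	args := []string{"-p", "-s", "TYPE", "-s", "PTTYPE", "-o", "export", disk}
	out, err := exec.Command("blkid", args...).CombinedOutput()
	fmt.Printf("blkid output: %q\n", string(out))

	if err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 2 {
			// blkid could not identify anything on the device. If the probe
			// failed because of an I/O error, this still looks like an
			// unformatted disk to the caller.
			return "", nil
		}
		return "", err
	}

	// Parse the KEY=VALUE lines produced by "-o export" (e.g. TYPE=ext4).
	fsType := ""
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if v, ok := strings.CutPrefix(line, "TYPE="); ok {
			fsType = v
		}
	}
	return fsType, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: getdiskformat <device>")
		os.Exit(1)
	}
	fsType, err := getDiskFormat(os.Args[1])
	fmt.Println("fstype:", fsType, "err:", err)
}
```

With that mapping, any probe failure that makes blkid exit with status 2 and print nothing will look exactly like an unformatted disk to the driver.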
Hello, this issue keeps recurring for us, and I have no idea what could be wrong. It occurs on already existing PVCs that had no problems in the past.
This part is confirmed: blkid returns an empty string:
I0517 06:56:08.200257 1 mount_linux.go:579] Attempting to determine if disk "/dev/disk/azure/scsi1/lun5" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/disk/azure/scsi1/lun5])
I0517 06:56:08.289216 1 mount_linux.go:582] Output: ""
blkid is returning exit code 2. Adding some debug output:
[root@azure-madme-prod-k8s-node-2 ~]# LIBBLKID_DEBUG=all blkid -p -s TYPE -s PTTYPE -o export /dev/disk/azure/scsi1/lun5
1422445: libblkid: INIT: library debug mask: 0xffff
1422445: libblkid: INIT: library version: 2.32.1 [16-Jul-2018]
Available "LIBBLKID_DEBUG=<name>[,...]|<mask>" debug masks:
all [0xffff] : info about all subsystems
cache [0x0004] : blkid tags cache
config [0x0008] : config file utils
dev [0x0010] : device utils
devname [0x0020] : /proc/partitions evaluation
devno [0x0040] : conversions to device name
evaluate [0x0080] : tags resolving
help [0x0001] : this help
lowprobe [0x0100] : superblock/raids/partitions probing
buffer [0x2000] : low-probing buffers
probe [0x0200] : devices verification
read [0x0400] : cache parsing
save [0x0800] : cache writing
tag [0x1000] : tags utils
1422445: libblkid: LOWPROBE: allocate a new probe
1422445: libblkid: LOWPROBE: zeroize wiper
1422445: libblkid: LOWPROBE: ready for low-probing, offset=0, size=34359738368
1422445: libblkid: LOWPROBE: whole-disk: YES, regfile: NO
1422445: libblkid: LOWPROBE: start probe
1422445: libblkid: LOWPROBE: zeroize wiper
1422445: libblkid: LOWPROBE: chain safeprobe superblocks ENABLED
1422445: libblkid: LOWPROBE: --> starting probing loop [SUBLKS idx=-1]
1422445: libblkid: LOWPROBE: [0] linux_raid_member:
1422445: libblkid: LOWPROBE: call probefunc()
1422445: libblkid: LOWPROBE: read: off=34359672832 len=64
1422445: libblkid: LOWPROBE: read failed: Input/output error
1422445: libblkid: LOWPROBE: <-- leaving probing loop (failed=-5) [SUBLKS idx=0]
1422445: libblkid: LOWPROBE: freeing values list
1422445: libblkid: LOWPROBE: end probe
1422445: libblkid: LOWPROBE: zeroize wiper
1422445: libblkid: LOWPROBE: free probe
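Reading the trace: the very first probe (linux_raid_member) tries to read 64 bytes at offset 34359672832, which is 64 KiB before the end of the 32 GiB device (size=34359738368), and the read itself fails with EIO (failed=-5). That points at the block device or its Azure attachment rather than at blkid's output parsing. A small, hypothetical check that repeats the same read directly, bypassing blkid, could look like this:

```go
// Repeat the read that libblkid failed on, to see whether the kernel itself
// returns an I/O error for that region of the device.
// Hypothetical diagnostic only; device path and offset come from the trace above.
package main

import (
	"fmt"
	"os"
)

func main() {
	const (
		offset = 34359672832 // failing read offset from the libblkid trace
		length = 64          // same length as the failed probe
	)

	dev := "/dev/disk/azure/scsi1/lun5"
	if len(os.Args) > 1 {
		dev = os.Args[1]
	}

	f, err := os.Open(dev)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		os.Exit(1)
	}
	defer f.Close()

	buf := make([]byte, length)
	n, err := f.ReadAt(buf, offset)
	fmt.Printf("read %d bytes at offset %d: err=%v\n", n, offset, err)
}
```

If this also returns an input/output error, the problem sits below blkid (the disk or the host-side attachment), which would match the observation that detaching and reattaching the disk makes the same PVC mount again.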
I was able to fix the issue by detaching the disk in the Azure console and deleting the VolumeAttachment in Kubernetes. After this, the very same PVC mounts on the node again with no issues. Does anyone have any idea what could be causing this? It looks like an Azure issue to me, although it only started appearing once we upgraded Kubernetes to 1.28, including the azuredisk driver.
Thanks
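If anyone wants to script the Kubernetes side of that cleanup instead of doing it by hand, a minimal client-go sketch might look like the following (hypothetical helper; the kubeconfig path and PV name are placeholders, and the Azure-side detach still has to happen separately):

```go
// Hypothetical helper that deletes the VolumeAttachment objects referencing a
// given PersistentVolume, mirroring the manual cleanup described above.
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: delete-va <pv-name>")
		os.Exit(1)
	}
	pvName := os.Args[1] // the PV bound to the affected PVC (placeholder)

	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// VolumeAttachments are cluster-scoped storage.k8s.io/v1 objects tying a
	// PV to the node it is attached to.
	vas, err := cs.StorageV1().VolumeAttachments().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, va := range vas.Items {
		src := va.Spec.Source.PersistentVolumeName
		if src != nil && *src == pvName {
			fmt.Printf("deleting VolumeAttachment %s (node %s)\n", va.Name, va.Spec.NodeName)
			if err := cs.StorageV1().VolumeAttachments().Delete(context.TODO(), va.Name, metav1.DeleteOptions{}); err != nil {
				panic(err)
			}
		}
	}
}
```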
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity once lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity once lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity once lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
The Azure Disk CSI driver occasionally fails to mount an EXT4 filesystem on an existing PVC. The PVC is already formatted and mounts with no issues on a different node. The issue occurs at random, and rebooting the node fixes it.
Logs from csi-azuredisk-node:
OS: AlmaLinux
Kubernetes version: 1.28.6
CSI driver version:
CSI driver and StorageClass manifests:
PVC manifest:
Has anyone ever seen this kind of issue? It seems very strange and very random. It looks like the CSI driver can't detect the ext4 filesystem that is present and tries to format the disk.