kundeng opened this issue 1 month ago
Hi @kundeng, could you please run this command and share the output with us?
$ lsblk --paths --json -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE
LVMS uses this command and applies some filters to determine the available devices that can be used. Also, please specify which OCP and LVMS versions you are using.
sh-5.1# lsblk --paths --json -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE
{
"blockdevices": [
{
"name": "/dev/sda",
"rota": false,
"type": "disk",
"size": "200G",
"model": "Virtual disk",
"vendor": "VMware ",
"ro": false,
"state": "running",
"kname": "/dev/sda",
"serial": "6000c29f5c2137678321ce187f1f9eca",
"partlabel": null,
"fstype": "LVM2_member"
},{
"name": "/dev/sdb",
"rota": false,
"type": "disk",
"size": "100G",
"model": "Virtual disk",
"vendor": "VMware ",
"ro": false,
"state": "running",
"kname": "/dev/sdb",
"serial": "6000c29c706df277e8eb89db0720dc4b",
"partlabel": null,
"fstype": null,
"children": [
{
"name": "/dev/sdb1",
"rota": false,
"type": "part",
"size": "1M",
"model": null,
"vendor": null,
"ro": false,
"state": null,
"kname": "/dev/sdb1",
"serial": null,
"partlabel": "BIOS-BOOT",
"fstype": null
},{
"name": "/dev/sdb2",
"rota": false,
"type": "part",
"size": "127M",
"model": null,
"vendor": null,
"ro": false,
"state": null,
"kname": "/dev/sdb2",
"serial": null,
"partlabel": "EFI-SYSTEM",
"fstype": "vfat"
},{
"name": "/dev/sdb3",
"rota": false,
"type": "part",
"size": "384M",
"model": null,
"vendor": null,
"ro": false,
"state": null,
"kname": "/dev/sdb3",
"serial": null,
"partlabel": "boot",
"fstype": "ext4"
},{
"name": "/dev/sdb4",
"rota": false,
"type": "part",
"size": "99.5G",
"model": null,
"vendor": null,
"ro": false,
"state": null,
"kname": "/dev/sdb4",
"serial": null,
"partlabel": "root",
"fstype": "xfs"
}
]
},{
"name": "/dev/sr0",
"rota": true,
"type": "rom",
"size": "1024M",
"model": "VMware Virtual SATA CDRW Drive",
"vendor": "NECVMWar",
"ro": false,
"state": "running",
"kname": "/dev/sr0",
"serial": "00000000000000000001",
"partlabel": null,
"fstype": null
}
]
}
sh-5.1# chroot /host
sh-5.1# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 200G 0 disk
sdb 8:16 0 100G 0 disk
|-sdb1   8:17   0    1M  0 part
|-sdb2   8:18   0  127M  0 part
|-sdb3   8:19   0  384M  0 part /boot
`-sdb4   8:20   0 99.5G  0 part /var/lib/kubelet/pods/021e4599-8340-4ebd-a8f2-21d481bd3b12/volume-subpaths/nginx-conf/monitoring-plugin/1
/var/lib/kubelet/pods/2e319ce4-9e50-49bc-a7b9-b0c0372f7780/volume-subpaths/nginx-conf/networking-console-plugin/1
/var
/sysroot/ostree/deploy/rhcos/var
/sysroot
/usr
/etc
/
sr0 11:0 1 1024M 0 rom
The boot disk has changed to /dev/sdb. Repeating the recovery confirms that the issue is reproducible. Fully shutting down the VM and starting it again occasionally restores the original disk order.
But even after that, deployments still fail to recover from the issue:
[image-registry-7d55768458-q4cj6](https://console-openshift-console.apps.devs.bayeslearner.org/k8s/ns/openshift-image-registry/pods/image-registry-7d55768458-q4cj6)
Namespace [openshift-image-registry](https://console-openshift-console.apps.devs.bayeslearner.org/k8s/cluster/namespaces/openshift-image-registry)
Oct 30, 2024, 11:16 PM
Generated from kubelet on [00-50-56-b3-97-55](https://console-openshift-console.apps.devs.bayeslearner.org/k8s/cluster/nodes/00-50-56-b3-97-55)
13 times in the last 23 minutes
MountVolume.SetUp failed for volume "pvc-3f16ebc2-9420-4169-9aba-5c05856d0f43" : rpc error: code = Internal desc = failed to list LV: rpc error: code = NotFound desc = not found exit status 5: Volume group "vg1" not found Cannot process volume group vg1: vg1
sh-5.1# chroot /host
sh-5.1# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 100G 0 disk
|-sda1   8:1    0    1M  0 part
|-sda2   8:2    0  127M  0 part
|-sda3   8:3    0  384M  0 part /boot
`-sda4   8:4    0 99.5G  0 part /var/lib/kubelet/pods/021e4599-8340-4ebd-a8f2-21d481bd3b12/volume-subpaths/nginx-conf/monitoring-plugin/1
/var/lib/kubelet/pods/2e319ce4-9e50-49bc-a7b9-b0c0372f7780/volume-subpaths/nginx-conf/networking-console-plugin/1
/var
/sysroot/ostree/deploy/rhcos/var
/sysroot
/usr
/etc
/
sdb 8:16 0 200G 0 disk
sr0 11:0 1 1024M 0 rom
The volume group seems to have vanished:
sh-5.1# vgs
sh-5.1# vgdisplay
sh-5.1# lvdisplay
sh-5.1# vgscan
sh-5.1# vgs
sh-5.1#
sh-5.1# lvmdiskscan
0 disks
0 partitions
0 LVM physical volume whole disks
0 LVM physical volumes
sh-5.1# fdisk -l
Disk /dev/sdb: 200 GiB, 214748364800 bytes, 419430400 sectors
Disk model: Virtual disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sda: 100 GiB, 107374182400 bytes, 209715200 sectors
Disk model: Virtual disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 03A710CC-531D-4305-8D82-90D7B15AA5A6
Device Start End Sectors Size Type
/dev/sda1 2048 4095 2048 1M BIOS boot
/dev/sda2 4096 264191 260096 127M EFI System
/dev/sda3 264192 1050623 786432 384M Linux filesystem
/dev/sda4 1050624 209715166 208664543 99.5G Linux filesystem
This is a RHEL9 issue that we have no control over. The recommended workaround for LVMS is to specify devices as either paths or optionalPaths in the deviceSelector, as explained in the documentation. It is important to use /dev/disk/by-path/... references instead of /dev/sdX paths so that device selection is not affected by this issue.
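For reference, a minimal sketch of such a deviceSelector, assuming the LVMS default openshift-storage namespace, a device class named vg1 (matching the volume group in the errors above), and a placeholder by-path value that must be replaced with the actual entry for the 200G data disk (list them with ls -l /dev/disk/by-path/ on the node); the thin-pool settings are just the common sample values:

```yaml
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: vg1
        default: true
        deviceSelector:
          # Placeholder by-path reference for the data disk; a stable name like
          # this keeps LVMS pointed at the same physical disk even when the
          # /dev/sdX ordering changes after a restore or reboot.
          paths:
            - /dev/disk/by-path/pci-0000:0b:00.0-scsi-0:0:1:0
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90
          overprovisionRatio: 10
```

optionalPaths takes the same form and can be used instead of (or alongside) paths when a listed device may legitimately be absent on some nodes.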
Title: LVM Operator Attempts to Use Boot Disk for Volume Group After vSphere Restoration of OpenShift Cluster
Description:
After restoring an OpenShift cluster from a vSphere backup, the LVM Operator incorrectly targets the boot disk (/dev/sda) as part of the LVM volume group (VG). This causes the operator to fail to initialize the VG, blocking the deployment of storage-backed workloads. Additionally, the intended storage disk (/dev/sdb) is excluded due to partition signature and label conflicts.
The disk ordering seems to have changed after the vSphere restore, causing the boot disk to be recognized as a valid candidate for the VG. This behavior indicates that the LVM operator does not correctly exclude system disks and partitions from its selection process.
Steps to Reproduce:
- Restore the OpenShift cluster from a vSphere backup; the disk ordering changes.
- /dev/sda: Boot disk (should not be part of VG).
- /dev/sdb: Intended storage disk (excluded by operator).
- The LVM operator attempts to use /dev/sda for VG creation.
Expected Behavior:
- The LVM operator should exclude the boot disk (/dev/sda) from the VG.
- The intended storage disk (/dev/sdb) should be automatically selected and initialized for use in the VG.
Actual Behavior:
- The LVM operator incorrectly tries to use /dev/sda for the VG:
- /dev/sdb is excluded by the operator with errors:
Relevant Logs:
From the LVM operator:
Impact:
This issue blocks dynamic storage provisioning for workloads that rely on LVM-based storage. The incorrect usage of the boot disk also risks data loss or node instability, potentially impacting the entire cluster’s availability.
Environment:
lvms-vg1
Workaround Attempts:
- Attempted to exclude /dev/sda by patching the LVMCluster configuration.
- Tried to initialize /dev/sdb as a physical volume, but it remained excluded due to partition signature errors.
Suggested Fix:
- Ensure system and boot disks are excluded from device selection in the LVMCluster configuration.