openshift / lvm-operator

The LVM Operator deploys and manages LVM storage on OpenShift clusters
Apache License 2.0

LVM Operator Attempts to Use Boot Disk for Volume Group in OpenShift Cluster #776

Open kundeng opened 1 month ago

kundeng commented 1 month ago

Title: LVM Operator Attempts to Use Boot Disk for Volume Group After vSphere Restoration of OpenShift Cluster

Description:

After restoring an OpenShift cluster from a vSphere backup, the LVM Operator incorrectly targets the boot disk (/dev/sda) as part of the LVM volume group (VG). This causes the operator to fail to initialize the VG, blocking the deployment of storage-backed workloads. Additionally, the intended storage disk (/dev/sdb) is excluded due to partition signature and label conflicts.

The disk ordering seems to have changed after the vSphere restore, causing the boot disk to be recognized as a valid candidate for the VG. This behavior indicates that the LVM operator does not correctly exclude system disks and partitions from its selection process.
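
A quick way to confirm from a node shell which disk currently carries the root filesystem (and therefore must never be handed to the volume group) is sketched below; this assumes the shell has been chroot'ed into /host, as in the outputs later in this thread:

sh-5.1# findmnt -no SOURCE /sysroot                        # partition backing the root filesystem, e.g. /dev/sdb4
sh-5.1# lsblk -no PKNAME $(findmnt -no SOURCE /sysroot)    # parent disk of that partition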

Steps to Reproduce:

  1. Perform a vSphere restore of an OpenShift cluster.
  2. Ensure the cluster node contains:
    • /dev/sda: Boot disk (should not be part of VG).
    • /dev/sdb: Intended storage disk (excluded by operator).
  3. Observe the LVM operator’s behavior and its attempt to use /dev/sda for VG creation.

Expected Behavior:

The LVM Operator ignores the boot/system disk and builds the volume group only from the dedicated storage disk, regardless of how the disks are enumerated after a restore.

Actual Behavior:

After the vSphere restore, the operator targets the boot disk for vg1, fails to initialize the volume group, and excludes the intended storage disk due to partition signature and label conflicts.

Relevant Logs:

From the LVM operator:

failed to create/extend volume group vg1: failed to create volume group vg1:
failed to create volume group "vg1". exit status 5: Physical volume '/dev/sda' 
is already in volume group 'vg1'. Unable to add physical volume '/dev/sda' to 
volume group 'vg1'. /dev/sda: physical volume not initialized.

Impact:

This issue blocks dynamic storage provisioning for workloads that rely on LVM-based storage. The incorrect usage of the boot disk also risks data loss or node instability, potentially impacting the entire cluster’s availability.

Environment:

Workaround Attempts:

Suggested Fix:

  1. Improve device detection logic to exclude boot/system disks automatically (e.g., based on partition labels or mounted root partitions).
  2. Provide an option to explicitly exclude devices via the LVMCluster configuration (an illustrative deviceSelector example follows this list).
  3. Implement more robust logging and error handling to help diagnose disk conflicts.
  4. Ensure the LVM operator handles disk reordering gracefully after VM restores or node migrations, such as those caused by vSphere snapshots.
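
For item 2, LVMS already exposes a deviceSelector in the LVMCluster resource that pins the volume group to explicit devices; the sketch below is only illustrative, and the resource name, thin pool settings, and the by-path value are placeholders rather than values taken from this cluster:

$ oc apply -f - <<'EOF'
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: my-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: vg1
        default: true
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90
          overprovisionRatio: 10
        deviceSelector:
          paths:
            # stable reference to the dedicated storage disk (placeholder value)
            - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0
EOF

Using a /dev/disk/by-path/... reference here (rather than /dev/sdX) keeps the selection stable across the kind of device reordering described above.
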
kundeng commented 1 month ago

see https://access.redhat.com/solutions/7015011

suleymanakbas91 commented 1 month ago

Hi @kundeng, could you please run this command and share the output with us?

$ lsblk --paths --json -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE

LVMS runs this command and applies a set of filters to determine which devices can be used. Also, please specify which OCP and LVMS versions you are using.
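
The same command can also be run without a console session through a debug pod; a minimal sketch, with <node-name> as a placeholder for a node from oc get nodes:

$ oc debug node/<node-name> -- chroot /host \
    lsblk --paths --json -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE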

kundeng commented 4 weeks ago
sh-5.1# lsblk --paths --json -o NAME,ROTA,TYPE,SIZE,MODEL,VENDOR,RO,STATE,KNAME,SERIAL,PARTLABEL,FSTYPE
{
   "blockdevices": [
      {
         "name": "/dev/sda",
         "rota": false,
         "type": "disk",
         "size": "200G",
         "model": "Virtual disk",
         "vendor": "VMware  ",
         "ro": false,
         "state": "running",
         "kname": "/dev/sda",
         "serial": "6000c29f5c2137678321ce187f1f9eca",
         "partlabel": null,
         "fstype": "LVM2_member"
      },{
         "name": "/dev/sdb",
         "rota": false,
         "type": "disk",
         "size": "100G",
         "model": "Virtual disk",
         "vendor": "VMware  ",
         "ro": false,
         "state": "running",
         "kname": "/dev/sdb",
         "serial": "6000c29c706df277e8eb89db0720dc4b",
         "partlabel": null,
         "fstype": null,
         "children": [
            {
               "name": "/dev/sdb1",
               "rota": false,
               "type": "part",
               "size": "1M",
               "model": null,
               "vendor": null,
               "ro": false,
               "state": null,
               "kname": "/dev/sdb1",
               "serial": null,
               "partlabel": "BIOS-BOOT",
               "fstype": null
            },{
               "name": "/dev/sdb2",
               "rota": false,
               "type": "part",
               "size": "127M",
               "model": null,
               "vendor": null,
               "ro": false,
               "state": null,
               "kname": "/dev/sdb2",
               "serial": null,
               "partlabel": "EFI-SYSTEM",
               "fstype": "vfat"
            },{
               "name": "/dev/sdb3",
               "rota": false,
               "type": "part",
               "size": "384M",
               "model": null,
               "vendor": null,
               "ro": false,
               "state": null,
               "kname": "/dev/sdb3",
               "serial": null,
               "partlabel": "boot",
               "fstype": "ext4"
            },{
               "name": "/dev/sdb4",
               "rota": false,
               "type": "part",
               "size": "99.5G",
               "model": null,
               "vendor": null,
               "ro": false,
               "state": null,
               "kname": "/dev/sdb4",
               "serial": null,
               "partlabel": "root",
               "fstype": "xfs"
            }
         ]
      },{
         "name": "/dev/sr0",
         "rota": true,
         "type": "rom",
         "size": "1024M",
         "model": "VMware Virtual SATA CDRW Drive",
         "vendor": "NECVMWar",
         "ro": false,
         "state": "running",
         "kname": "/dev/sr0",
         "serial": "00000000000000000001",
         "partlabel": null,
         "fstype": null
      }
   ]
}
kundeng commented 4 weeks ago
sh-5.1# chroot /host
sh-5.1# lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda        8:0    0  200G  0 disk
sdb        8:16   0  100G  0 disk
|-sdb1     8:17   0    1M  0 part
|-sdb2     8:18   0  127M  0 part
|-sdb3     8:19   0  384M  0 part /boot
`-sdb4     8:20   0 99.5G  0 part /var/lib/kubelet/pods/021e4599-8340-4ebd-a8f2-21d481bd3b12/volume-subpaths/nginx-conf/monitoring-plugin/1
                                  /var/lib/kubelet/pods/2e319ce4-9e50-49bc-a7b9-b0c0372f7780/volume-subpaths/nginx-conf/networking-console-plugin/1
                                  /var
                                  /sysroot/ostree/deploy/rhcos/var
                                  /sysroot
                                  /usr
                                  /etc
                                  /
sr0       11:0    1 1024M  0 rom
kundeng commented 4 weeks ago

The boot disk has changed to /dev/sdb. Repeating the recovery confirms that the issue is reproducible. Fully shutting down the VM and then starting it again occasionally restores the original disk ordering and device names.
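
Because the /dev/sdX names move around, the disk serial numbers from the lsblk output above are the reliable way to tell the two disks apart (the 200G data disk ends in ...f1f9eca, the 100G system disk in ...0720dc4b); a minimal check from the node shell:

sh-5.1# lsblk -o KNAME,SIZE,SERIAL,FSTYPE
sh-5.1# ls -l /dev/disk/by-id/ /dev/disk/by-path/    # stable names that survive reordering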

But even after that, deployments are unable to recover from the issue:

Event on pod image-registry-7d55768458-q4cj6 in namespace openshift-image-registry (Oct 30, 2024, 11:16 PM; generated from kubelet on node 00-50-56-b3-97-55; 13 times in the last 23 minutes):

MountVolume.SetUp failed for volume "pvc-3f16ebc2-9420-4169-9aba-5c05856d0f43" : rpc error: code = Internal desc = failed to list LV: rpc error: code = NotFound desc = not found exit status 5: Volume group "vg1" not found Cannot process volume group vg1: vg1
sh-5.1# chroot /host
sh-5.1# lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda        8:0    0  100G  0 disk
|-sda1     8:1    0    1M  0 part
|-sda2     8:2    0  127M  0 part
|-sda3     8:3    0  384M  0 part /boot
`-sda4     8:4    0 99.5G  0 part /var/lib/kubelet/pods/021e4599-8340-4ebd-a8f2-21d481bd3b12/volume-subpaths/nginx-conf/monitoring-plugin/1
                                  /var/lib/kubelet/pods/2e319ce4-9e50-49bc-a7b9-b0c0372f7780/volume-subpaths/nginx-conf/networking-console-plugin/1
                                  /var
                                  /sysroot/ostree/deploy/rhcos/var
                                  /sysroot
                                  /usr
                                  /etc
                                  /
sdb        8:16   0  200G  0 disk
sr0       11:0    1 1024M  0 rom
kundeng commented 4 weeks ago

The volume group seems to have vanished:

sh-5.1# vgs
sh-5.1# vgdisplay 
sh-5.1# lvdisplay 
sh-5.1# vgscan
sh-5.1# vgs
sh-5.1# 

sh-5.1# lvmdiskscan
  0 disks
  0 partitions
  0 LVM physical volume whole disks
  0 LVM physical volumes
sh-5.1# fdisk -l
Disk /dev/sdb: 200 GiB, 214748364800 bytes, 419430400 sectors
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/sda: 100 GiB, 107374182400 bytes, 209715200 sectors
Disk model: Virtual disk    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 03A710CC-531D-4305-8D82-90D7B15AA5A6

Device       Start       End   Sectors  Size Type
/dev/sda1     2048      4095      2048    1M BIOS boot
/dev/sda2     4096    264191    260096  127M EFI System
/dev/sda3   264192   1050623    786432  384M Linux filesystem
/dev/sda4  1050624 209715166 208664543 99.5G Linux filesystem
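
For completeness, the generic LVM rescan/reactivation steps are sketched below; this is not an LVMS-specific procedure, and it can only help if the PV metadata on the 200G disk actually survived the restore. On RHEL 9, LVM also consults /etc/lvm/devices/system.devices, so stale entries there are worth checking:

sh-5.1# lvmdevices --check     # verify the entries in /etc/lvm/devices/system.devices still match the disks
sh-5.1# pvscan --cache         # rebuild the LVM device cache
sh-5.1# vgscan                 # search all visible devices for volume groups
sh-5.1# vgchange -ay vg1       # activate vg1 if its metadata was found
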
suleymanakbas91 commented 3 weeks ago

This is a RHEL9 issue that we have no control over. The recommended workaround for LVMS is to specify devices either as paths or optionalPaths in the deviceSelector, as explained here. It is important to use /dev/disk/by-path/... references instead of /dev/sdX paths so that you are not affected by this issue.
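
A minimal way to find such a stable reference for the intended storage disk before putting it into the deviceSelector (here /dev/sdX stands for whichever name currently points at the 200G data disk):

sh-5.1# ls -l /dev/disk/by-path/
sh-5.1# udevadm info --query=symlink --name=/dev/sdX    # all stable aliases (by-path, by-id) for that disk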