storaged-project / blivet

A python module for configuration of block devices
GNU Lesser General Public License v2.1
107 stars 86 forks source link

LVM resize fails with calculation error - pv metadata unaccounted for #1320

Open plimptm opened 3 days ago

plimptm commented 3 days ago

I have been troubleshooting the use of this library in the ansible linux-system-roles collection and have traced an issue with the resizing capabilities to what I believe to be a size calculation bug in blivet. I searched the issues and didn't see anything related, so I'm opening this here.

Note: I first encountered this bug while trying to use blivet to resize a logical volume (after resizing the underlying disk) but have been able to reproduce the issue by simply calling the LVMLogicalVolumeDevice.resize() method even when there is no extra space available.

Expected behaviors:

Actual behavior:

>>> b = blivet.Blivet()
>>> lv = b.lvs[9]
>>> lv
LVMLogicalVolumeDevice instance (0x7f630064f880) --
  name = vg01-lv_app01  status = True  id = 354
  children = [] 
  parents = ['existing 10 GiB lvmvg vg01 (350)']
  uuid = Kgmu9m-dPFg-PdEJ-6KmH-5FPs-CRBx-Q2lVxa  size = 9.99 GiB
  format = existing ext4 filesystem
  major = 0  minor = 0  exists = True  protected = False
  sysfs path = /sys/devices/virtual/block/dm-9
  target size = 9.99 GiB  path = /dev/mapper/vg01-lv_app01
  format args = []  original_format = ext4  target = None  dm_uuid = None  VG device = LVMVolumeGroupDevice instance (0x7f62ff5018b0) --
  name = vg01  status = True  id = 350
  children = ['existing 9.99 GiB lvmlv vg01-lv_app01 (354) with existing ext4 filesystem']
  parents = ['existing 10 GiB disk sdb (341) with existing lvmpv']
  uuid = 9pDOvI-CXzn-0Xce-XRka-0TtE-Ut7E-1qJNFE  size = 10 GiB  
  format = existing None
  major = 0  minor = 0  exists = True  protected = False
  sysfs path =
  target size = 9.99 GiB  path = /dev/vg01
  format args = []  original_format = None  free = 0 B  PE Size = 4 MiB  PE Count = 2558
  PE Free = 0  PV Count = 1
  modified = False  extents = 2559  free space = 4 MiB
  free extents = 1  reserved percent = 0  reserved space = 0 B
  PVs = ['existing 10 GiB disk sdb (341) with existing lvmpv']
  LVs = ['existing 9.99 GiB lvmlv vg01-lv_app01 (354) with existing ext4 filesystem']
  segment type = linear percent = 0
  VG space used = 9.99 GiB
>>> lv.vg.size
Size (9.99609375 GiB)
>>> lv.size
Size (9.9921875 GiB)
>>> lv.size == lv.vg.size 
False
>>> lv.size = lv.vg.size
>>> lv.resize()
Traceback (most recent call last):
  File "/usr/lib64/python3.9/site-packages/gi/overrides/BlockDev.py", line 1092, in wrapped
    ret = orig_obj(*args, **kwargs)
  File "/usr/lib64/python3.9/site-packages/gi/overrides/BlockDev.py", line 605, in lvm_lvresize
    return _lvm_lvresize(vg_name, lv_name, size, extra)
gi.repository.GLib.GError: g-bd-utils-exec-error-quark: Process reported exit code 5:   Insufficient free space: 1 extents needed, but only 0 available
 (0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/site-packages/blivet/threads.py", line 53, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/blivet/devices/lvm.py", line 2513, in decorated
    return meth(self, *args, **kwargs)  # pylint: disable=not-callable
  File "/usr/lib/python3.9/site-packages/blivet/devices/lvm.py", line 2773, in resize
    blockdev.lvm.lvresize(self.vg.name, self._name, self.size)
  File "/usr/lib64/python3.9/site-packages/gi/overrides/BlockDev.py", line 1114, in wrapped
    raise transform[1](msg)
gi.overrides.BlockDev.LVMError: Process reported exit code 5:   Insufficient free space: 1 extents needed, but only 0 available

LVM specs:

# pvdisplay /dev/sdb
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               vg01
  PV Size               <10.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              2558
  Free PE               0
  Allocated PE          2558
  PV UUID               m3yU1i-MHFE-yAkj-B7hu-gTfG-bhGq-IFv8rl

# vgdisplay vg01
  --- Volume group ---
  VG Name               vg01
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               9.99 GiB
  PE Size               4.00 MiB
  Total PE              2558
  Alloc PE / Size       2558 / 9.99 GiB
  Free  PE / Size       0 / 0
  VG UUID               9pDOvI-CXzn-0Xce-XRka-0TtE-Ut7E-1qJNFE

Server OS: RHEL 9.2

plimptm commented 3 days ago

I've noticed an inconsistency in how blivet reports the size of the underlying pv.

The blivet vg size property appears to reach straight back to read the size of the underlying block device read_current_size and subtracts only the volume group metadata, completely ignoring the pv metadata (and any unusable blocks)

If I subtract a total of 8 MiB (4 MiB for pv metadata + 4 MiB for vg metadata) from 10 GiB, the math adds up:

>>> pv.size - vg.lvm_metadata_space*2
Size (9.9921875 GiB)
>>> pv.size - vg.lvm_metadata_space*2 == lv.size
True

I would expect pvs to be handled by blivet with similar properties to other lvm objects, with an accounting for metadata, but it appears they are handled as DiskDevice type objects.

>>> vg.lvm_metadata_space
Size (4 MiB)
>>> pv.lvm_metadata_space
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'DiskDevice' object has no attribute 'lvm_metadata_space'
vojtechtrefny commented 5 hours ago

Hi, thank you for the report. First of all I am not saying there is no bug the code or that our calculation of metadata sizes in LVM is perfect. I know it's not and we had more than a few issues with it in the past.

There is one problem with your example -- you are not supposed to call the resize function directly, but either use the Blivet.resize_device helper function or register the ActionResizeDevice action directly (unfortunately our API is not very user friendly and our documentation is not the best in the world). Using the actions might prevent some of the problems you see (there are some additional fail safe mechanisms in the actions that sometimes fix/hide problems from the lower layer).

Expected behaviors:

* calling the `resize` method with unallocated PEs in the parent volume group will expand the logical volume to fill those unallocated PEs

* calling the `resize` method with no unallocated PEs in the parent volume group will see that the logical volume and volume groups sizes already match and return something indicating there are no available extents

That's not how the resize function works. resize will simply try to resize the device to its target_size. Which for existing devices is just their size.

The reason for this is that resize is a generic function that works the same for all devices -- so resize for partition and LV is the same function that (in general) is called when processing the ActionResizeDevice action.

The question is why resize actually tried to resize the LV, it shouldn't do that by itself, I suspect the reason is some rounding and alignment for PE size we do and that we actually rounded the size up somewhere resulting in trying to resize the LV.

The blivet vg size property appears to reach straight back to read the size of the underlying block device read_current_size and subtracts only the volume group metadata, completely ignoring the pv metadata (and any unusable blocks)

In our code there is a difference between size and free space. For VG the size is the sum of the PV sizes minus the PV metadata calculated in lvm_metadata_space. The VG metadata (including things like pmspare) and other unusable space is accounted for in free_space (for us VG metadata is still part of VG so part of its size) so free_space should be used when resizing (or to be more precise LV max_size should be used when checking if the LV can be resized).

I would expect pvs to be handled by blivet with similar properties to other lvm objects, with an accounting for metadata, but it appears they are handled as DiskDevice type objects.

Because the device itself is disk, the format is LVM PV so for LVM PV specific values you need to look at pv.format.

The low level API in blivet (especially for LVM) is horribly complicated. The upper level API (helper functions in the Blivet object) hide most of these things and I definitely do not recommend using the low level API to anyone.


But all the complicated explanations above aside -- we definitely do have a bug somewhere, because we think there is one free extent in your VG (free space = 4 MiB free extents = 1) when there isn't one (Free PE / Size 0 / 0). So that's definitely not right.

Can you please share the entire blivet log from your system? If you run the code below it will save the log to /tmp/blivet.log:

from blivet import util
util.set_up_logging()
import blivet; b=blivet.Blivet(); b.reset()

if you don't want to share the entire log publicly you can send it to me to vtrefny AT redhat.com, thank you.