lvmteam / lvm2

Mirror of upstream LVM2 repository
https://gitlab.com/lvmteam/lvm2
GNU General Public License v2.0

vgchange -ay <vg> doesn't activate VG #152

Closed UserAlexUser closed 3 months ago

UserAlexUser commented 3 months ago

Attachment: vgchange.log (image)

Hello! I ran into the following problem: after importing the pool, I run vgchange -ay, and it fails to activate the volumes with this error:

2024-08-01 23:14:03.489 vgchange -ay lpol
2024-08-01 23:14:05.111 0 logical volume(s) in volume group "lpol" now active
  /dev/mapper/lpol-thin_vg_tmeta: open failed: No such file or directory
  /dev/mapper/lpol-thin_vg_tmeta: open failed: No such file or directory
zkabelac commented 3 months ago

Hi, from the log it looks like activating the _tmeta device fails, so thin_check cannot run successfully.

However, there is no 'dmesg' log from the same moment, so the cause of the failure cannot be seen.

It is also possible that thin_check fails on its own.

Try activating _tmeta as a 'component' LV, then run thin_check on it and check the output.
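The component check suggested above can be scripted. The sketch below is a hypothetical helper, not part of lvm2: it assumes the metadata component is named "<pool>_tmeta" (as in this thread), assumes lvchange's generic --yes option answers the "activate component LV in read-only mode?" prompt, and always deactivates the component again, even if thin_check fails.

```python
import subprocess
from typing import Callable, List


def component_check_commands(vg: str, pool: str) -> List[List[str]]:
    """Build argv lists for checking a thin pool's metadata component LV.

    Assumption: the metadata component is named "<pool>_tmeta".
    """
    tmeta = f"{vg}/{pool}_tmeta"
    return [
        # --yes is assumed to answer the read-only component activation prompt
        ["lvchange", "-ay", "--yes", tmeta],
        ["thin_check", f"/dev/{tmeta}"],
        ["lvchange", "-an", tmeta],
    ]


def run_component_check(vg: str, pool: str,
                        runner: Callable = subprocess.run) -> int:
    """Activate _tmeta, run thin_check on it, then always deactivate it.

    Returns thin_check's exit status (0 means no problems were reported).
    """
    activate, check, deactivate = component_check_commands(vg, pool)
    runner(activate, check=True)
    try:
        # Do not raise on a thin_check failure; report its status instead.
        return runner(check, check=False).returncode
    finally:
        runner(deactivate, check=True)
```

For example, run_component_check("r5", "thin_vg") (as root) would run the same steps shown later in this thread. The runner parameter exists only so the command sequence can be inspected without touching real devices.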

Xerrial commented 3 months ago

Hello @zkabelac ! I've been working on the same issue.

Here's a log of 'lvchange -ay r5/thin_vg -vvvv' with 'dmesg -w' running in the background. There's also some additional information, such as lvs and vgs output, which might be helpful. Attachment: lvchange_thin_pool_with_dmesg.log

Some additional info about the issue:

I also activated _tmeta and ran thin_check as you advised; hopefully this clarifies something:

[Tue Aug 06 17:38:37 @ ~]:> lvchange -ay /dev/r5/thin_vg_tmeta
Do you want to activate component LV in read-only mode? [y/n]: y
  Allowing activation of component LV.
[15142.485320] md/raid:mdX: device dm-1 operational as raid disk 0
[15142.485327] md/raid:mdX: device dm-3 operational as raid disk 1
[15142.485329] md/raid:mdX: device dm-5 operational as raid disk 2
[15142.485331] md/raid:mdX: device dm-7 operational as raid disk 3
[15142.485332] md/raid:mdX: device dm-9 operational as raid disk 4
[15142.485333] md/raid:mdX: device dm-11 operational as raid disk 5
[15142.485335] md/raid:mdX: device dm-13 operational as raid disk 6
[15142.485336] md/raid:mdX: device dm-15 operational as raid disk 7
[15142.485338] md/raid:mdX: device dm-17 operational as raid disk 8
[15142.485339] md/raid:mdX: device dm-19 operational as raid disk 9
[15142.493468] md/raid:mdX: raid level 5 active with 10 out of 10 devices, algorithm 2
[Tue Aug 06 17:42:11 @ ~]:> thin_check /dev/r5/thin_vg_tmeta
examining superblock
TRANSACTION_ID=2
METADATA_FREE_BLOCKS=3935231
examining devices tree
examining mapping tree
checking space map counts
[Tue Aug 06 17:42:28 @ ~]:> 

As I said, I can reproduce the issue and gather additional information if needed.

zkabelac commented 3 months ago

Since you are also active in another issue, I have a feeling that your udev configuration or your lvm2 build may be invalid.

The logged error suggests that the /dev/mapper/r5-thin_vg_tmeta symlink is missing at the moment thin_check is supposed to check this device.
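To see which /dev/mapper entry is expected for a given LV, the device-mapper name can be derived from the VG and LV names: LVM doubles every '-' inside each name and joins the two with a single '-'. The helpers below are a small illustrative sketch (the function names are hypothetical) that compute that path and report whether it currently exists.

```python
import os


def dm_name(vg: str, lv: str) -> str:
    """Device-mapper name for vg/lv: hyphens inside each name are doubled."""
    return f"{vg.replace('-', '--')}-{lv.replace('-', '--')}"


def mapper_path(vg: str, lv: str, dev_dir: str = "/dev") -> str:
    """Path of the /dev/mapper entry udev is expected to create for vg/lv."""
    return os.path.join(dev_dir, "mapper", dm_name(vg, lv))


def mapper_state(vg: str, lv: str, dev_dir: str = "/dev") -> str:
    """Return "symlink", "present" (non-symlink node) or "missing"."""
    path = mapper_path(vg, lv, dev_dir)
    if os.path.islink(path):
        return "symlink"
    if os.path.exists(path):
        return "present"
    return "missing"
```

For instance, mapper_path("r5", "thin_vg_tmeta") gives "/dev/mapper/r5-thin_vg_tmeta", the path from the error above. Checking mapper_state() right after a failed activation only shows the state at that later moment, not at the moment thin_check ran.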

So let's first double-check that you build lvm2 with the 'configure --enable-udev_sync' option (which is mandatory for working udev synchronization).
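The configure options of an installed lvm2 are printed on the "Configuration:" line of 'lvs --version' (as shown later in this thread), so the check can be done on that text. A minimal sketch, assuming the output format seen here; the helper names are hypothetical.

```python
import re
import subprocess
from typing import List


def configure_flags(version_output: str) -> List[str]:
    """Extract the configure options from 'lvs --version' output."""
    match = re.search(r"^\s*Configuration:\s*(.*)$", version_output, re.MULTILINE)
    return match.group(1).split() if match else []


def has_udev_sync(version_output: str) -> bool:
    """True if lvm2 was configured with --enable-udev_sync."""
    return "--enable-udev_sync" in configure_flags(version_output)


def local_version_output() -> str:
    """Run 'lvs --version' on this machine (requires lvm2 to be installed)."""
    return subprocess.run(["lvs", "--version"], capture_output=True,
                          text=True, check=True).stdout
```

On a real system, has_udev_sync(local_version_output()) answers the question directly. Note that this only confirms the build option; it says nothing about whether udev itself is running correctly.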

Also make sure your udev rules.d directory contains the udev rules from the lvm2 project, properly installed.

If you run your tools with a badly working udev, feel free to use 'verify_udev_operations=1'; you likely cannot mess up your system any further with this setting....
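A quick way to spot missing rules is to look for the expected files in the rules directory. The sketch below assumes the rules live in /lib/udev/rules.d (the build above uses --with-udev-prefix=/) and assumes the file names listed in EXPECTED_RULES; the exact set of lvm2/device-mapper rule files varies between versions and distributions, so adjust the list to your packages. It also ignores overrides in /etc/udev/rules.d.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: typical device-mapper/lvm2 rule file names; verify against
# the files actually shipped by your lvm2 package.
EXPECTED_RULES = (
    "10-dm.rules",
    "11-dm-lvm.rules",
    "13-dm-disk.rules",
    "95-dm-notify.rules",
)


def missing_rules(rules_dir: str = "/lib/udev/rules.d",
                  expected: Iterable[str] = EXPECTED_RULES) -> List[str]:
    """Return the expected rule files that are not present in rules_dir."""
    directory = Path(rules_dir)
    return [name for name in expected if not (directory / name).is_file()]
```

An empty list from missing_rules() only means the files exist; it does not prove their contents match the installed lvm2 version.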
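The setting lives in the activation section of lvm.conf. Instead of enabling it permanently, it can be applied to a single command through lvm's --config option, which keeps the workaround scoped to the run that needs it. A minimal sketch; the helper name is hypothetical.

```python
from typing import List

# One-off config-tree override for the activation/verify_udev_operations setting.
VERIFY_UDEV_OVERRIDE = "activation { verify_udev_operations = 1 }"


def with_verify_udev(argv: List[str]) -> List[str]:
    """Insert a --config override right after the command name."""
    return [argv[0], "--config", VERIFY_UDEV_OVERRIDE, *argv[1:]]
```

For example, subprocess.run(with_verify_udev(["vgchange", "-ay", "lpol"]), check=True) activates the VG from the original report with the override applied to that one command only.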

Xerrial commented 3 months ago

Our build configuration includes '--enable-udev_sync':

[Wed Aug 07 14:52:14 @ ~]:> lvs --version 
  LVM version:     2.03.11(2) (2021-01-08)
  Library version: 1.02.175 (2021-01-08)
  Driver version:  4.47.0
  Configuration:   ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/ --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline --with-vdo=internal --with-writecache=internal

Anyway, it looks like verify_udev_operations=1 helps. However, judging by its comment in lvmconfig, this option appears to be intended for debugging. Is it OK to use it in production, at least as a workaround? Are there any drawbacks or anything to be aware of?

zkabelac commented 3 months ago


Well, you should figure out why your udev is malfunctioning: your system is not working correctly, and lvm2 cannot synchronize with udev (in this case, it cannot wait for udev to create the symlink to the device used for accessing the _tmeta content).
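To illustrate the race being described: a tool that opens /dev/mapper/<name> immediately after the device is created fails with "No such file or directory" if udev has not created the symlink yet. The hypothetical polling helper below only demonstrates "waiting for the node to appear"; it is not how lvm2 works. With udev_sync, lvm2 waits for the udev rules to signal that processing is complete rather than polling, and polling cannot fix a udev that never creates the symlink.

```python
import os
import time


def wait_for_node(path: str, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the path appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```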

I'd say that running a production system with a misbehaving udev would be considered a major bug, but what do I know....

Keeping this workaround enabled basically means lvm2 will interfere with the running udev (if there is one). It will also slightly slow down command execution due to the symlink handling and validation, but that is a minor issue compared to the one mentioned above....

It's a debug feature because, in general, it can 'very well mask' a misconfigured udev, and that's a bad idea overall: tools on such a system will see a different set of devices....

zkabelac commented 3 months ago

Assuming the issue was related to some udev problems (possibly udev was not even running??), I'm closing the issue since there appears to be no problem within lvm2.