Closed: kevinwd closed this issue 3 months ago.
FWIW, I hit the same thing today. I split my original root into separate root, home and var LVs (with different raid options), and var never comes up automatically, while home is fine. A vgchange fixes it, but for now - until I have time for a deep dive - I can only boot via a rescue boot plus this manual step.
Ubuntu 20.04, lvm versions:
```
$ dpkg -l | grep -i lvm2
ii  liblvm2cmd2.03:amd64  2.03.07-1ubuntu1  amd64  LVM2 command library
ii  lvm2                  2.03.07-1ubuntu1  amd64  Linux Logical Volume Manager
```
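For reference, the manual recovery from the rescue shell is roughly this (a sketch; the VG name is a placeholder, use whatever `vgs` reports):
```
# activate everything in the affected VG, then mount the missing filesystem
vgchange -ay <vg-name>
mount /var
```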
Affected here as well on Arch on version 2.03.11.
I cannot boot without manual intervention.
I run a single PV (an md device, which is assembled fine during boot), a single VG, and four LVs:
# pvdisplay
--- Physical volume ---
PV Name /dev/md126
VG Name vg_md_data
PV Size <8.19 TiB / not usable 6.00 MiB
Allocatable yes (but full)
PE Size 4.00 MiB
Total PE 2146137
Free PE 0
Allocated PE 2146137
PV UUID 39P9ip-784q-cbhx-x4Bd-jAUn-2aVS-OY8nkA
# vgdisplay
--- Volume group ---
VG Name vg_md_data
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 7
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 4
Open LV 4
Max PV 0
Cur PV 1
Act PV 1
VG Size <8.19 TiB
PE Size 4.00 MiB
Total PE 2146137
Alloc PE / Size 2146137 / <8.19 TiB
Free PE / Size 0 / 0
VG UUID njDiqo-D6qA-GxWl-rkVR-DzLd-5M8M-l21cvm
# lvdisplay
--- Logical volume ---
LV Path /dev/vg_md_data/data_lv_root
LV Name data_lv_root
VG Name vg_md_data
LV UUID goMGj1-Fxvi-0ub2-KOT2-mHe6-UnYj-fkKsQ1
LV Write Access read/write
LV Creation host, time [REDACTED.FQDN.TLD], 2021-01-05 23:07:04 -0500
LV Status available
# open 1
LV Size 10.00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 254:0
--- Logical volume ---
LV Path /dev/vg_md_data/data_lv_home
LV Name data_lv_home
VG Name vg_md_data
LV UUID xpY3Jh-oDlC-toOQ-qI3B-u7r3-9Yfp-wrZCXn
LV Write Access read/write
LV Creation host, time [REDACTED.FQDN.TLD], 2021-01-05 23:07:29 -0500
LV Status available
# open 1
LV Size 3.01 TiB
Current LE 789504
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 254:1
--- Logical volume ---
LV Path /dev/vg_md_data/data_lv_var
LV Name data_lv_var
VG Name vg_md_data
LV UUID AT2OuK-HtFm-rB1n-ZrFF-afok-0GSw-nobY22
LV Write Access read/write
LV Creation host, time [REDACTED.FQDN.TLD], 2021-01-05 23:07:45 -0500
LV Status available
# open 1
LV Size 515.00 GiB
Current LE 131840
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 254:2
--- Logical volume ---
LV Path /dev/vg_md_data/data_lv_opt
LV Name data_lv_opt
VG Name vg_md_data
LV UUID 5stzDn-fRuK-04ib-71wN-EH5E-Hgdr-TmDeaS
LV Write Access read/write
LV Creation host, time [REDACTED.FQDN.TLD], 2021-01-05 23:08:30 -0500
LV Status available
# open 1
LV Size 4.66 TiB
Current LE 1222233
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 254:3
I always have a failed unit on boot, lvm2-pvscan@9:126.service, for seemingly no reason:
# systemctl status lvm2-pvscan@9\:126.service
● lvm2-pvscan@9:126.service - LVM event activation on device 9:126
Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static)
Active: failed (Result: signal) since Sat 2021-03-27 18:24:49 EDT; 51min ago
Docs: man:pvscan(8)
Main PID: 505 (code=killed, signal=TERM)
Mar 27 18:24:48 archlinux systemd[1]: Starting LVM event activation on device 9:126...
Mar 27 18:24:48 archlinux lvm[505]: Logging initialised at Sat Mar 27 22:24:48 2021
Mar 27 18:24:48 archlinux lvm[505]: Set umask from 0022 to 0077
Mar 27 18:24:48 archlinux lvm[505]: pvscan Creating directory "/run/lock/lvm"
Mar 27 18:24:48 archlinux lvm[505]: pvscan pvscan[505] PV /dev/md126 online, VG vg_md_data is complete.
Mar 27 18:24:48 archlinux lvm[505]: pvscan pvscan[505] VG vg_md_data run autoactivation.
Mar 27 18:24:48 archlinux lvm[505]: pvscan PVID 39P9ip-784q-cbhx-x4Bd-jAUn-2aVS-OY8nkA read from /dev/md126 last written to /dev/md127.
Mar 27 18:24:48 archlinux lvm[505]: pvscan pvscan[505] VG vg_md_data not using quick activation.
# journalctl --boot -u lvm2-pvscan@9\:126.service
-- Journal begins at Fri 2020-10-30 19:55:00 EDT, ends at Sat 2021-03-27 19:18:13 EDT. --
Mar 27 18:24:48 archlinux systemd[1]: Starting LVM event activation on device 9:126...
Mar 27 18:24:48 archlinux lvm[505]: Logging initialised at Sat Mar 27 22:24:48 2021
Mar 27 18:24:48 archlinux lvm[505]: Set umask from 0022 to 0077
Mar 27 18:24:48 archlinux lvm[505]: pvscan Creating directory "/run/lock/lvm"
Mar 27 18:24:48 archlinux lvm[505]: pvscan pvscan[505] PV /dev/md126 online, VG vg_md_data is complete.
Mar 27 18:24:48 archlinux lvm[505]: pvscan pvscan[505] VG vg_md_data run autoactivation.
Mar 27 18:24:48 archlinux lvm[505]: pvscan PVID 39P9ip-784q-cbhx-x4Bd-jAUn-2aVS-OY8nkA read from /dev/md126 last written to /dev/md127.
Mar 27 18:24:48 archlinux lvm[505]: pvscan pvscan[505] VG vg_md_data not using quick activation.
As shown, no apparent failures or errors, but the service is still marked as failed (presumably a timeout?).
Interestingly, that is the correct maj:min:
# lsblk /dev/md126
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
md126 9:126 0 8.2T 0 raid10
├─vg_md_data-data_lv_root 254:0 0 10G 0 lvm /root
├─vg_md_data-data_lv_home 254:1 0 3T 0 lvm /home
├─vg_md_data-data_lv_var 254:2 0 515G 0 lvm /var
└─vg_md_data-data_lv_opt 254:3 0 4.7T 0 lvm /opt
Any ideas, LVM team? This is driving me insane. I can't even get vgchange -a y to work on boot (it finds my VG and LVs fine, but it never creates mappings for them in either /dev/mapper/ or /dev/<VG_name>/ - normally it does so in both) unless I explicitly assign my VG to auto_activation_volume_list. This was all working fine a little less than a month ago.
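The workaround I mean looks roughly like this in lvm.conf (a sketch; "vg_md_data" is this system's VG name, adjust to yours):
```
# /etc/lvm/lvm.conf
activation {
    # only VGs/LVs listed here are auto-activated (vgchange -aay / pvscan)
    auto_activation_volume_list = [ "vg_md_data" ]
}
```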
Same here; it happens randomly on boot. Sometimes everything is active, sometimes it is not. I still don't know how to solve it.
lvm version
LVM version: 2.03.07(2) (2019-11-30)
Library version: 1.02.167 (2019-11-30)
Driver version: 4.42.0
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
Setup:
RAID5: /dev/md126
VG: vgdata
LV: tp_data_pool (thin pool)
LV: home_athena (on top of the thin pool)
LUKS-encrypted file system
During boot, I can see the following messages:
Jun 02 22:59:44 kronos lvm[2130]: pvscan[2130] PV /dev/md126 online, VG vgdata is complete.
Jun 02 22:59:44 kronos lvm[2130]: pvscan[2130] VG vgdata skip autoactivation.
Then this:
Jun 02 22:59:44 kronos systemd[1]: Finished LVM event activation on device 253:0.
Jun 02 22:59:44 kronos systemd[1]: Mounted /boot/efi.
Jun 02 22:59:44 kronos systemd[1]: Finished LVM event activation on device 9:126.
Jun 02 23:00:14 kronos systemd[1]: systemd-fsckd.service: Succeeded.
Jun 02 23:01:10 kronos systemd[1]: dev-disk-by\x2duuid-5773b7da\x2d5b0f\x2d4347\x2db718\x2d377912a6209a.device: Job dev-disk-by\x2duuid-5773b7da\x2d5b0f\x2d4347\x2db718\x2d377912a6209a.device/start timed out.
The disk that times out corresponds to the UUID of the unencrypted file system: /dev/vgdata/home_athena
So the problem is that LVM is not activating the thin-pool and LVs.
After an unsuccessful boot, I see the thin-pool and LV inactive:
lvdisplay
--- Logical volume ---
LV Name tp_data_pool
VG Name vgdata
LV UUID 3Q4KuX-RoZn-nZ9n-Irk8-PG6Q-ChBu-QV86Fi
LV Write Access read/write
LV Creation host, time kronos, 2021-06-01 23:35:13 +0200
LV Pool metadata tp_data_pool_tmeta
LV Pool data tp_data_pool_tdata
LV Status NOT available
LV Size 27.65 TiB
Current LE 7249279
Segments 1
Allocation inherit
Read ahead sectors auto
--- Logical volume ---
LV Path /dev/vgdata/home_athena
LV Name home_athena
VG Name vgdata
LV UUID 3AwKjA-juIX-lGOB-uqqa-LO4i-5G0v-EMydMe
LV Write Access read/write
LV Creation host, time kronos, 2021-06-01 23:53:14 +0200
LV Pool name tp_data_pool
LV Status NOT available
LV Size 100.00 GiB
Current LE 25600
Segments 1
Allocation inherit
Read ahead sectors auto
--- Logical volume ---
LV Path /dev/vgxubuntu/root
LV Name root
VG Name vgxubuntu
LV UUID hseqPL-yxW0-3NBe-8SUc-zaX5-g7CI-LUUgmT
LV Write Access read/write
LV Creation host, time xubuntu, 2021-05-29 13:05:46 +0200
LV Status available
# open 1
LV Size 929.32 GiB
Current LE 237907
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:1
--- Logical volume ---
LV Path /dev/vgxubuntu/swap_1
LV Name swap_1
VG Name vgxubuntu
LV UUID TSy0SB-N0r8-TRrP-9BZE-7B6k-GeJ5-NGMXsg
LV Write Access read/write
LV Creation host, time xubuntu, 2021-05-29 13:05:46 +0200
LV Status available
# open 2
LV Size 976.00 MiB
Current LE 244
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:2
If I run lvchange -a y vgdata/tp_data_pool, this activates the thin pool:
lvdisplay
--- Logical volume ---
LV Name tp_data_pool
VG Name vgdata
LV UUID 3Q4KuX-RoZn-nZ9n-Irk8-PG6Q-ChBu-QV86Fi
LV Write Access read/write (activated read only)
LV Creation host, time kronos, 2021-06-01 23:35:13 +0200
LV Pool metadata tp_data_pool_tmeta
LV Pool data tp_data_pool_tdata
LV Status available
# open 1
LV Size 27.65 TiB
Allocated pool data 0.01%
Allocated metadata 10.42%
Current LE 7249279
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 1024
Block device 253:5
--- Logical volume ---
LV Path /dev/vgdata/home_athena
LV Name home_athena
VG Name vgdata
LV UUID 3AwKjA-juIX-lGOB-uqqa-LO4i-5G0v-EMydMe
LV Write Access read/write
LV Creation host, time kronos, 2021-06-01 23:53:14 +0200
LV Pool name tp_data_pool
LV Status NOT available
LV Size 100.00 GiB
Current LE 25600
Segments 1
Allocation inherit
Read ahead sectors auto
Then I need to activate the other LV: lvchange -a y vgdata/home_athena
lvdisplay
--- Logical volume ---
LV Name tp_data_pool
VG Name vgdata
LV UUID 3Q4KuX-RoZn-nZ9n-Irk8-PG6Q-ChBu-QV86Fi
LV Write Access read/write (activated read only)
LV Creation host, time kronos, 2021-06-01 23:35:13 +0200
LV Pool metadata tp_data_pool_tmeta
LV Pool data tp_data_pool_tdata
LV Status available
# open 2
LV Size 27.65 TiB
Allocated pool data 0.01%
Allocated metadata 10.42%
Current LE 7249279
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 1024
Block device 253:5
--- Logical volume ---
LV Path /dev/vgdata/home_athena
LV Name home_athena
VG Name vgdata
LV UUID 3AwKjA-juIX-lGOB-uqqa-LO4i-5G0v-EMydMe
LV Write Access read/write
LV Creation host, time kronos, 2021-06-01 23:53:14 +0200
LV Pool name tp_data_pool
LV Status available
# open 1
LV Size 100.00 GiB
Mapped size 3.30%
Current LE 25600
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 1024
Block device 253:7
and finally deal with the encryption: cryptdisks_start home_athena.
Possible solution: use event_activation=0 in lvm.conf. At least after changing that parameter it has worked, twice.
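If it helps anyone, the change amounts to this in lvm.conf (a sketch; the key lives in the global section, see lvm.conf(5)):
```
# /etc/lvm/lvm.conf
global {
    # 0 = disable event-based (pvscan-driven) autoactivation and fall back
    #     to direct activation by the lvm2-activation services
    event_activation = 0
}
```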
In each of the cases above, please collect the debug logging from the lvm commands. In the lvm.conf log{} section, set level=7 and file="/tmp/lvm.log" and send or post the log file for us to analyze. Or, add -vvvv (four v's) to the commands and collect the output. Thanks.
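For reference, those log{} settings look roughly like this (a sketch of what is described above):
```
# /etc/lvm/lvm.conf
log {
    # debug-level logging from every lvm command, written to a file
    level = 7
    file = "/tmp/lvm.log"
}
```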
Is the output of lvmdump good enough? I can try to rollback the fix and run that command before starting the fix.
no, we need debugging from the pvscan commands that are run by the lvm2-pvscan services.
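One way to capture that (a sketch, assuming a systemd-based system; the exact ExecStart line varies by lvm2 version, so copy it from `systemctl cat lvm2-pvscan@.service` rather than from here):
```
# systemctl edit lvm2-pvscan@.service
[Service]
ExecStart=
# paste the unit's original ExecStart here and add -vvvv, e.g.:
ExecStart=/usr/sbin/lvm pvscan -vvvv --cache --activate ay %i
```
The -vvvv output then lands in the journal of the lvm2-pvscan@<major>:<minor>.service instance.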
I managed to generate log files for the 3 different boot scenarios:
with event_activation = 0: lvm_noevent.log
with event_activation = 1 and working properly: lvm_event_working.log
with event_activation = 1 and not working: lvm_event_broken.log
Let me know if I can test something else for you. This system goes into production on Monday and I won't be able to test any more after that.
It appears that on your system the /run/lvm/ files may be persistent across boots, specifically the files in /run/lvm/pvs_online/ and /run/lvm/vgs_online/. For event-based autoactivation, pvscan requires that /run/lvm be cleared by reboot.
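A quick way to check whether that is the case (a sketch):
```
# /run should be a fresh tmpfs on every boot
findmnt /run
# these directories should be empty (or absent) right after boot,
# before any pvscan has run
ls -l /run/lvm/pvs_online/ /run/lvm/vgs_online/
```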
Having a similar issue. I had to replace the DIMMs in my server and then struggled to upgrade Ubuntu. Presently having an issue with one of the lvs not being active after a reboot.
Here is a boot log as requested above. lvm.log
Thank You!
Same situation here on Debian Bookworm.
LVM version: 2.03.16(2) (2022-05-18)
Library version: 1.02.185 (2022-05-18)
Driver version: 4.47.0
Configuration: ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/ --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline
journalctl
Jan 18 16:12:35 localhost.localdomain systemd[1]: Listening on lvm2-lvmpolld.socket - LVM2 poll daemon socket.
Jan 18 16:12:35 localhost.localdomain systemd[1]: Starting lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Jan 18 16:12:35 localhost.localdomain lvm[479]: PV /dev/vdb3 online, VG os-vg is complete.
Jan 18 16:12:35 localhost.localdomain lvm[479]: VG os-vg finished
Jan 18 16:12:35 localhost.localdomain lvm[485]: PV /dev/vda3 online, VG os-vg is complete.
Jan 18 16:12:35 localhost.localdomain lvm[485]: VG os-vg finished
LVM SETUP
PV VG Fmt Attr PSize PFree
/dev/vda3 os-vg lvm2 a-- <15.00g 1.82g
/dev/vdb3 os-vg lvm2 a-- <15.00g 1.82g
root@live:~# vgs
VG #PV #LV #SN Attr VSize VFree
os-vg 2 1 0 wz--n- 29.99g <3.65g
root@live:~# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
os-lv os-vg rwi---r--- 13.00g
When I run update-grub I get some strange output:
Generating grub configuration file ...
error: unknown node 'os-lv_rimage_0'. (x12)
Found linux image: /boot/vmlinuz-6.1.0-17-amd64
Found initrd image: /boot/initrd.img-6.1.0-17-amd64
error: unknown node 'os-lv_rimage_0'. (x12)
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
I have a couple of scripts to replicate the issue in a VM (a rough sketch of the failing layout is below).
Doing raid0 works.
Doing raid1 without --raidintegrity works.
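The failing layout can presumably be reproduced with something like this (a sketch; sizes and device names are taken from the output above, not from the actual scripts):
```
# two PVs in one VG, then a raid1 LV with integrity enabled
pvcreate /dev/vda3 /dev/vdb3
vgcreate os-vg /dev/vda3 /dev/vdb3
lvcreate --type raid1 -m 1 --raidintegrity y -L 13G -n os-lv os-vg
mkfs.ext4 /dev/os-vg/os-lv
```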
Jan 18 16:12:35 localhost.localdomain lvm[479]: PV /dev/vdb3 online, VG os-vg is complete.
Jan 18 16:12:35 localhost.localdomain lvm[479]: VG os-vg finished
Jan 18 16:12:35 localhost.localdomain lvm[485]: PV /dev/vda3 online, VG os-vg is complete.
Jan 18 16:12:35 localhost.localdomain lvm[485]: VG os-vg finished
That doesn't look right; maybe the /run/lvm files were not cleared as required (this was also mentioned earlier in this issue).
Wouldn't that impact raid1 and raid0 without raidintegrity as well?
The chroot doesn't seem to have a /run/lvm directory when I first create it.
Also, when booting from a live ISO, the LV is inactive when using --raidintegrity. Normal raid1 and raid0 work fine.
vgchange os-vg --setautoactivation y returns: Volume Group autoactivation is already yes
There are no files in the ISO's /run/lvm, but the directory exists.
Left-over run files will affect the autoactivation of any VG (LV types shouldn't be relevant.) The "is complete" messages seem to indicate that incorrect temp files exist under /run/lvm/pvs_online/ and /run/lvm/vgs_online/.
Left-over run files could cause the VG to be autoactivated when the VG is still incomplete (some PVs aren't yet available). That's an unsolved problem, which is why autoactivation always waits for the VG to be complete. If you attempt to autoactivate an incomplete VG, and the VG has raid LVs, it means autoactivation may attempt to activate the raid LVs in degraded mode shortly before all PVs become available. This is part of what makes this an unsolved problem, and may explain some of the issues you're seeing.
So, to sort out this problem, you need to focus on the "PV ... online, VG ... complete" messages, and ensure that's happening correctly. Those are logged by the pvscan commands run from udev rules, and as mentioned earlier, they depend on /run/lvm/pvs_online and vgs_online temp files being cleared at each boot.
To debug the pvscans run by udev rules, you can enable udev debugging, and you can collect debug logging (-vvvv) from those specific pvscan commands.
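Enabling udev debugging could look something like this (a sketch; option names per udevadm(8) and udev.conf(5), and older systemd versions call the first option --log-priority):
```
# raise udevd's log level at runtime ...
udevadm control --log-level=debug
# ... or persistently for the next boot
echo 'udev_log=debug' >> /etc/udev/udev.conf
# simulate rule processing for one PV (placeholder device name)
udevadm test /sys/class/block/<pv-device>
```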
As you can see, in the ISO there are no such left-over files.
Since I'm using an ISO, there is no way it's persisting anything there. The root filesystem of my target installation is on the raid0 LVM volume /dev/os-vg/os-lv, which doesn't mount.
And as I said, if --raidintegrity is removed, or when using raid0, it works as intended.
I'm not familiar with udev debugging; would this be enough to provide the necessary info?
echo "udev_log=\"debug\"" >> /etc/udev/udev.conf
sed -i -e 's/# verbose = 0/verbose = 7/' /etc/lvm/lvm.conf
And the log would essentially be journalctl | grep lvm and pvscan -vvvv?
bad-journalctl.log
bad-pvscan-vvvv.log
bad-vgchange-ay.log
Now without --raidintegrity:
good-journalctl.log
good-pvscan-vvvv.log
good-vgchange-ay.log
I'll try to take a better look at these logs tomorrow
This doesn't look like a normal system boot, e.g. booting a standard RHEL install. If you're not doing that, then whatever it is you are doing is outside of what lvm is designed to do. In a normal RHEL install, the root LV is activated in the initrd (see lvm code in dracut). Then, it switches to the root fs, runs the coldplug service which generates new uevents for each of the disks, which triggers a command "pvscan --cache ..." (run from udev rules) for each PV. Those "pvscan --cache ..." commands create temp files under /run/lvm/pvs_online/ and /run/lvm/vgs_online/. Once all the PVs for the VG are online (based on the run files), the VG is autoactivated, which covers any LVs that were not already activated in the initrd. You can read more about it in https://man7.org/linux/man-pages/man7/lvmautoactivation.7.html
You're attaching standard pvscan and vgchange -ay commands that you've run. This is very different from the pvscan/vgchange commands that are run from udev rules, which are involved in a standard RHEL system boot.
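As a side note, the udev-rule path can be re-exercised after boot by replaying the block-device uevents (a sketch, assuming a systemd-based system):
```
# regenerate "add" uevents for block devices; this re-runs the
# pvscan --cache commands from the lvm udev rules
udevadm trigger --action=add --subsystem-match=block
udevadm settle
# then inspect the state files under /run/lvm/pvs_online/ and /run/lvm/vgs_online/
```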
Okay, once I tried troubleshooting it from the initramfs it all made sense. First of all, it isn't associated with the /run/lvm files, nor with the race condition issue.
For the sake of anyone who lands here: the issue is that Debian doesn't load the dm_integrity module by default, which is why only raids with --raidintegrity y were failing.
After landing in the initramfs, lvm vgchange -ay returns:
/sbin/modprobe failed: 1
Can't process LV os-vg/os-lv_rimage_0: integrity target support missing from kernel?
0 logical volume(s) in volume group "os-vg" now active
In my case, activating the LVM integrity volumes for the rootfs makes it necessary to load dm_integrity in the initramfs:
echo "dm_integrity" >> /etc/initramfs-tools/modules
update-initramfs -u
What is kind of unintuitive is that vgchange -ay loads the module while the udev rules don't. This isn't a problem on other distros where dm_integrity seems to be loaded by default (at least in their ISOs).
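For completeness, on dracut-based distros the equivalent of the initramfs-tools fix above would presumably be something like this (a sketch; the config file name is arbitrary, syntax per dracut.conf(5)):
```
# force the dm_integrity module into the initramfs
echo 'add_drivers+=" dm_integrity "' > /etc/dracut.conf.d/90-dm-integrity.conf
dracut -f
```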
Thanks for the help and patience @teigland
So closing the issue - if there is still a bug, it's likely a bug in the initramfs / dracut tooling, which should add all the dm modules to the ramdisk.
LV not available after reboot - how can I solve this problem?
After bringing the system up, I can use the "vgchange -ay vg0" command to fix it manually. Is there any way to solve this problem automatically?
LVM version: 2.03.10(2)-git (2020-03-26)
Library version: 1.02.173-git (2020-03-26)
Driver version: 4.35.0
Configuration: ./configure
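A possible stopgap until the root cause is found (a sketch, not a fix; assumes a systemd-based system, the VG name vg0 from above, and that the unit name and vgchange path are placeholders to adjust):
```
# /etc/systemd/system/lvm-activate-vg0.service  (hypothetical unit name)
[Unit]
Description=Manually activate vg0 as a workaround for missing autoactivation
DefaultDependencies=no
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
Before=local-fs-pre.target

[Service]
Type=oneshot
ExecStart=/sbin/vgchange -ay vg0

[Install]
WantedBy=local-fs-pre.target
```
Enable it with systemctl enable lvm-activate-vg0.service; it papers over the problem rather than solving it.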