lvmteam / lvm2

Mirror of upstream LVM2 repository
https://gitlab.com/lvmteam/lvm2
GNU General Public License v2.0

LVMLOCK\SanLock + Kernel Panic #156

Closed dylanetaft closed 6 days ago

dylanetaft commented 6 days ago

So...I found a way to trigger a kernel panic. It wasn't intentional.

The process to reproduce is as follows. Set up two iSCSI targets with LIO/targetcli on remote servers, with four LUNs in total: one pair for a volume group to hold the global lock for sanlock, and the other pair for a VG to hold other things - in my case KVM VMs.
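For anyone trying to reproduce this, a rough sketch of the target-side setup is below. The device paths, IQNs, and backstore names are placeholders I picked for illustration, not values from the original report.

```
# On each remote target server (hypothetical devices, IQNs, and names)
targetcli /backstores/block create name=lock_lun dev=/dev/sdb
targetcli /backstores/block create name=data_lun dev=/dev/sdc
targetcli /iscsi create iqn.2024-01.com.example:target1
targetcli /iscsi/iqn.2024-01.com.example:target1/tpg1/luns create /backstores/block/lock_lun
targetcli /iscsi/iqn.2024-01.com.example:target1/tpg1/luns create /backstores/block/data_lun
targetcli /iscsi/iqn.2024-01.com.example:target1/tpg1/acls create iqn.2024-01.com.example:kvm-host

# On the KVM host, discover and log in to both targets
iscsiadm -m discovery -t sendtargets -p <target1-ip>
iscsiadm -m discovery -t sendtargets -p <target2-ip>
iscsiadm -m node -l
```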

I was thinking of mirroring what IBM has done with Power systems and Virtual I/O Servers: take two iSCSI targets from two remote systems and use LVM to mirror them into a single filesystem.

Upon reviewing lvmlockd's man page and ultimately some of the code, it turns out sanlock and lvmlockd will only use the first PV to store the global lock. lvmlockd creates a hidden LV that cannot be mirrored.
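For reference, that hidden lease LV is visible if you list all LVs, including hidden ones; the VG name below is just a placeholder.

```
# Hidden LVs are shown in brackets; the sanlock lease LV appears as [lvmlock]
lvs -a locks_vg
```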

So I used mdadm to raid-1 the two iSCSI LUNs together and then put my global lock VG on top of that.
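A minimal sketch of that step, assuming the two lock LUNs show up on the host as /dev/sdb and /dev/sdc (device names, md name, and VG name are my placeholders):

```
# Mirror the two iSCSI LUNs with mdadm, then build the sanlock-protected VG on the md device
mdadm --create /dev/md/lockmirror --level=1 --raid-devices=2 /dev/sdb /dev/sdc
vgcreate --shared --lock-type sanlock lock_vg /dev/md/lockmirror
```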

I can reboot either remote iSCSI target server and I do not lose the global lock; it appears stable and recoverable.

It is not clear if exclusive locks on LVs work the same way - so for that I simply mirrored my other two LUNs together, ran vgchange --lockstart on the VG, and vgchange -ae on the LV.
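As I read the report, this second VG sits directly on the two LUNs with the mirroring done at the LV level rather than under mdadm; a hedged guess at the sequence, with hypothetical names and sizes:

```
# Shared VG directly on the two iSCSI LUNs, mirrored at the LV level
vgcreate --shared vm_vg /dev/sdd /dev/sde
vgchange --lockstart vm_vg
lvcreate --type raid1 -m 1 -L 100G -n vm_images vm_vg
vgchange -aey vm_vg
```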

I booted up some VMs in KVM on the host.

Then I proceeded to reboot one of the iscsi target servers.

I did not lose the global lock as before, mdadm protected it.

The system kernel panicked while running some vgdisplay commands; it seems shared locks on LVs may work in a similar way to the global lock?

So maybe this is more of a documentation issue - the lvmlockd man page should probably state that mirroring just doesn't work, for both the global lock and other LVs.

Everything seems fine if I put mdadm underneath both volume groups. It absorbs the loss of a disk fine: no locks get lost, no VMs go down, no kernel panic.

I get that sanlock is probably meant for a true SAN - one with proper MPIO and dual storage controllers.

Someone could be tempted, however, to chain this stuff together in production as a software-defined storage solution, with newer devices like NVMe drives that don't work like trays of disks behind dual SAS controllers. And it APPEARS to work until you really start testing failure scenarios. It DOES work if you put mdadm under the whole thing, so lvmlockd never sees PVs disappearing.

Are the kernel panics and subsequent system reboots a bug? Or is this a documentation issue: sanlock and lvmlockd are for a true SAN, so don't try to LVM-mirror devices and expect redundancy for locks or system stability?

teigland commented 6 days ago

The sanlock disk paxos algorithm requires all machines to be reading/writing the same disk blocks at once. If those are mirrored underneath, we can't guarantee that machines all see the same thing when racing to do i/o on a single sector. If you lose the disk holding the hidden lvmlock LV (holding the sanlock leases), then you need to take the VG offline, rebuild the lvmlock leases, and bring the VG back online (details in the lvmlockd man page under "Recover from lost PV holding sanlock locks"). With this approach, you can still add multiple PVs to the VG and create raid LVs for normal use.
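A short sketch of that last suggestion (recovery procedure aside): keep the sanlock leases on a single PV, but add more PVs to the VG and build raid LVs for the actual data. The device names, VG/LV names, and size are placeholders.

```
# Add more PVs to the shared VG and mirror at the LV level;
# the hidden lvmlock LV stays on a single PV as sanlock requires
vgextend vm_vg /dev/sdf /dev/sdg
lvcreate --type raid1 -m 1 -L 100G -n vm_images vm_vg
```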

The sanlock man page briefly mentions the issue of host based mirroring: "Using sanlock on shared block devices that do host based mirroring or replication is not likely to work correctly."

As for md raid, that does not work correctly from multiple hosts concurrently, apart from md-cluster, which uses the dlm. So, an alternative approach to using shared VGs is to set up a corosync and dlm cluster.
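For completeness, a rough sketch of what the dlm-based route can look like, assuming corosync and dlm_controld are already running and lvm.conf has use_lvmlockd=1; the VG name and device are placeholders.

```
# With corosync + dlm in place, the shared VG can use the dlm lock type instead of sanlock
vgcreate --shared --lock-type dlm cluster_vg /dev/sdh
vgchange --lockstart cluster_vg
```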

dylanetaft commented 6 days ago

That'll work. I saw a few other folks try to do this online so at least there's a good answer. Thanks for your time!