opensvc / multipath-tools

multipathd and RAID concurrency. #67

Closed mtkaczyk closed 1 year ago

mtkaczyk commented 1 year ago

Hello, I see an issue between multipathd and MD RAID management. I have a SLES 15 SP5 system configured with a VROC RAID0 and a native RAID0:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md127 : inactive nvme2n1[1](S) nvme0n1[0](S)
      2210 blocks super external:imsm

md126 : active raid0 nvme3n1[0]
      976630272 blocks super 1.2 512k chunks

md125 : active raid0 nvme2n1[1] nvme0n1[0]
      1953513472 blocks super external:/md127/0 128k chunks

unused devices: <none>

I determined that enabling multipathd prevents my arrays from being started after reboot:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md125 : inactive nvme3n1[0]
      976630488 blocks super 1.2

md126 : inactive nvme0n1[1] nvme2n1[0]
      1953513472 blocks super external:/md127/0

md127 : inactive nvme2n1[1](S) nvme0n1[0](S)
      10402 blocks super external:imsm

It seems to be caused by a concurrency problem between mdadm and this daemon at startup: the daemon doesn't respect the metadata on the drives.

Obviously, I can fix the issue by blacklisting the devices in the multipath config. Another workaround is to force the MD modules into the initrd image so that the RAID arrays are started earlier. The main problem is that if multipathd is enabled during installation, the arrays are broken after rebooting into the new OS. The tool policy is aggressive because by default it claims every nvme device.
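
For the initrd workaround mentioned above, a minimal sketch of a dracut drop-in is shown here. It assumes the initrd is built with dracut (as on SLES); the file name is hypothetical, and the module name `raid0` is an assumption taken from the array level shown in the /proc/mdstat output above.

```
# /etc/dracut.conf.d/10-md-raid.conf  (hypothetical file name)
# Include the RAID0 personality in the initrd and try to load it early.
# "raid0" is assumed from the array level reported in /proc/mdstat.
force_drivers+=" raid0 "
# Afterwards, regenerate the initrd, for example with: dracut -f
```

This only moves array assembly earlier in the boot sequence; it does not stop multipathd from claiming the devices.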

Can we do something to change this behavior? Here are some ideas:

Thanks, Mariusz

mtkaczyk commented 1 year ago

@mwilck I can see you in recent contributors so I'm notifying you directly - SLES is affected, could you take a look?

mwilck commented 1 year ago

@mtkaczyk, please open a SUSE bug.

mwilck commented 1 year ago

"The tool policy is aggressive because by default it claims every nvme device."

It's true, we use this policy on SUSE. Other distributions are probably not affected. The solution is indeed to use blacklisting and to set the find_multipaths option to yes or smart.
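
As an illustration of that advice, a minimal /etc/multipath.conf sketch might look like the following. The `devnode` pattern is an assumption based on the NVMe device names in the report above (nvme0n1, nvme2n1, nvme3n1); kernel device names can change between boots, so blacklisting by `wwid` may be more robust in practice.

```
# /etc/multipath.conf (sketch; adapt to the actual system)
defaults {
    # Do not claim every non-blacklisted device by default.
    # "yes" and "smart" are the two values suggested in this thread.
    find_multipaths "smart"
}

blacklist {
    # Assumed pattern: exclude the local NVMe namespaces used by MD RAID.
    devnode "^nvme[023]n1$"
}
```

With `find_multipaths` set to `yes` or `smart`, single-path local disks such as these should normally not be claimed even without the blacklist entry; the explicit blacklist makes the intent unambiguous.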

In your case I'd recommend activating multipath after installation, which should "just work".

mwilck commented 1 year ago

This is related to SUSE only, not an upstream issue.