opensvc / multipath-tools


Udev event when all mpath slave devices are added #89

Closed abhinavmalik31 closed 2 months ago

abhinavmalik31 commented 2 months ago

Hi Team, we are currently experimenting with iSCSI volumes and multipath. For a single iSCSI target, we have multiple iSCSI sessions aggregated by multipath into a single device. We want to recognize, via udev, when such an mpath device is ready to be used. In our case, all slaves (per iSCSI target) should be added to the mpath device before we call it ready.

We were banking on a change event from device-mapper where the old mpath status is empty (.MPATH_DEVICE_READY_OLD=) and the new status shows the device as ready (MPATH_DEVICE_READY=1). However, it seems that when this event is triggered, only 1 slave is seen under /sys/block/dm-*/slaves/*.

```
UDEV [2000499.191055] change /devices/virtual/block/dm-0 (block) .MPATH_DEVICE_READY_OLD= ACTION=change DEVLINKS=/dev/disk/by-id/scsi-360003ff0430e5e1d8f7a9697e1e8e5b6 /dev/disk/by-id/dm-uuid-mpath-360003ff0430e5e1d8f7a9697e1e8e5b6 /dev/mapper/mpathl /dev/disk/by-id/wwn-0x60003ff0430e5e1d8f7a9697e1e8e5b6 /dev/disk/by-id/dm-name-mpathl DEVNAME=/dev/dm-0 DEVPATH=/devices/virtual/block/dm-0 DEVTYPE=disk DM_ACTIVATION=1 DM_COOKIE=6291456 DM_NAME=mpathl DM_SERIAL=360003ff0430e5e1d8f7a9697e1e8e5b6 DM_SUSPENDED=0 DM_TYPE=scsi DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1 DM_UDEV_PRIMARY_SOURCE_FLAG=1 DM_UDEV_RULES_VSN=2 DM_UUID=mpath-360003ff0430e5e1d8f7a9697e1e8e5b6 DM_WWN=0x60003ff0430e5e1d8f7a9697e1e8e5b6 ID_PART_TABLE_TYPE=gpt ID_PART_TABLE_UUID=b7c98439-3ee4-40d1-b548-70e3ff3b2133 MAJOR=253 MINOR=0 MPATH_DEVICE_READY=1 MPATH_SBIN_PATH=/sbin SEQNUM=73927 SUBSYSTEM=block TAGS=:systemd: USEC_INITIALIZED=2000495577798
```

There is another change event a few seconds later, where the mpath status doesn't change but the reload flag is set (DM_SUBSYSTEM_UDEV_FLAG0=1). If we rely on this instead, we see all slave devices added at this point.

```
UDEV [2000510.538609] change /devices/virtual/block/dm-0 (block) .MPATH_DEVICE_READY_OLD=1 ACTION=change DEVLINKS=/dev/disk/by-id/dm-name-mpathl /dev/disk/by-id/scsi-360003ff0430e5e1d8f7a9697e1e8e5b6 /dev/disk/by-id/dm-uuid-mpath-360003ff0430e5e1d8f7a9697e1e8e5b6 /dev/disk/by-id/wwn-0x60003ff0430e5e1d8f7a9697e1e8e5b6 /dev/mapper/mpathl DEVNAME=/dev/dm-0 DEVPATH=/devices/virtual/block/dm-0 DEVTYPE=disk DM_ACTIVATION=0 DM_COOKIE=23068672 DM_NAME=mpathl DM_SERIAL=360003ff0430e5e1d8f7a9697e1e8e5b6 DM_SUBSYSTEM_UDEV_FLAG0=1 DM_SUSPENDED=0 DM_TYPE=scsi DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG=1 DM_UDEV_PRIMARY_SOURCE_FLAG=1 DM_UDEV_RULES_VSN=2 DM_UUID=mpath-360003ff0430e5e1d8f7a9697e1e8e5b6 DM_WWN=0x60003ff0430e5e1d8f7a9697e1e8e5b6 ID_PART_TABLE_TYPE=gpt ID_PART_TABLE_UUID=b7c98439-3ee4-40d1-b548-70e3ff3b2133 MAJOR=253 MINOR=0 MPATH_DEVICE_READY=1 MPATH_SBIN_PATH=/sbin MPATH_UNCHANGED=1 SEQNUM=73960 SUBSYSTEM=block TAGS=:systemd: USEC_INITIALIZED=2000495577798
```
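(For reference, dumps like the two above can be captured by watching processed udev events with their properties; a minimal sketch:)

```sh
# Print udev events for block devices after rule processing,
# including all properties (this is where the dumps above come from).
udevadm monitor --udev --property --subsystem-match=block
```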

In our case, a device needs to be added only once. If we just rely on the reload change event as the point where the device is ready to be added, we may see it again on another reload. We wanted to know if there is any particular flag/property or event that can be relied on for our case. We need all slave devices to be added, as they would otherwise be considered separate block devices. One way could be to acknowledge the first change event (when MPATH_DEVICE_READY=1) and then wait for the reload event. Is there a better way to handle this?

ENV details:

- multipath: 0.8.4-41.el8.x86_64
- udev: 239-78.el8.x86_64
- OS: Rocky Linux 8.9

mwilck commented 2 months ago

AFAICS, the behavior you describe is correct. A multipath device is ready to use as soon as just one path device is available (multipath isn't RAID).

> In our case, a device needs to be added only once. If we just rely on the reload change event as the point where the device is ready to be added, we may see it again on another reload. We wanted to know if there is any particular flag/property or event that can be relied on for our case.

I fail to parse this. I suppose with "reload change event", you are referring to the first one above (with DM_ACTIVATION=1), and with "another reload" you mean the 2nd one (with DM_SUBSYSTEM_UDEV_FLAG0=1), but please don't make me guess, be concise and clear. More: What is "for our case"? What "case" are you referring to? What is it that you want to achieve?

Guessing again, I think you may want to delay activation of the multipath device until all path devices (iSCSI) are detected. multipathd doesn't support this. It's actually against the spirit of multipathing. As I said above, a multipath device is considered "up" as soon as at least one healthy path device exists.

You can try to come up with your own udev rules to implement a different logic for the respective udev properties (most importantly, SYSTEMD_READY), but it's out of scope for us, and I can only discourage it. You need to realize that if you do this, the device may be considered "down" as soon as a single path fails, which is obviously not what you'd desire in a multipath setup. Note also that there is no way for multipathd to know how many paths you have configured for a given device, IOW, how to detect whether "all" paths are available for a given map. We don't even have a configuration option for that.
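(For illustration only, since this is explicitly discouraged: such a local rule might look roughly like the sketch below. The rule file name and the helper script are hypothetical, not anything multipath-tools ships.)

```
# /etc/udev/rules.d/99-local-mpath-ready.rules (hypothetical, discouraged)
# Hold the multipath map back from systemd until a local helper script
# decides that "enough" paths are present. The helper, which you would
# have to write yourself, must exit 0 while the map should stay hidden.
ACTION=="change", SUBSYSTEM=="block", ENV{DM_UUID}=="mpath-*", \
  PROGRAM=="/usr/local/bin/mpath-not-ready.sh $env{DM_NAME}", \
  ENV{SYSTEMD_READY}="0"
```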

> We need all slave devices to be added, as they would otherwise be considered separate block devices.

That isn't true. Well, it is true that the iSCSI devices show up as separate devices, but they remain unused. The udev rules make sure that they have SYSTEMD_READY=0 set, and will thus not be activated (i.e. mounted). Only the multipath device is usable by systemd.

At least this is how it should work. It's of course possible that there's some bug in the way this is set up in Rocky Linux, or that you've made some configuration mistake. I'd advise adding udev.log-priority=debug and/or systemd.log_level=debug to the boot parameters and examining the boot messages meticulously. Hint: more often than not, problems arise because the configuration in the initial RAM disk is inconsistent with the configuration in the booted system.
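(On a RHEL-family distribution such as Rocky, those parameters could be added with grubby, for example; a sketch, adjust to your boot loader setup:)

```sh
# Enable verbose udev/systemd logging on all installed kernels, reboot,
# then read the messages of the current (debug) boot from the journal.
grubby --update-kernel=ALL --args="udev.log-priority=debug systemd.log_level=debug"
reboot
# After the reboot:
journalctl -b | grep -iE 'multipath|SYSTEMD_READY|dm-'
```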

In any case, this is something for the Rocky distribution maintainers to look at, not us upstream. Your multipath-tools version is 4 years old.

abhinavmalik31 commented 2 months ago

Hi @mwilck, thanks for the quick reply and for making things clearer.

> I fail to parse this. I suppose with "reload change event", you are referring to the first one above (with DM_ACTIVATION=1), and with "another reload" you mean the 2nd one (with DM_SUBSYSTEM_UDEV_FLAG0=1), but please don't make me guess, be concise and clear.

I was referring to DM_SUBSYSTEM_UDEV_FLAG0=1 as the first reload here, and to any further event (path loss/reinstatement or something else) that would trigger "another reload". Not sure if the first one (DM_ACTIVATION=1) does a reload too. Just for our understanding: what other events can lead to a reload (i.e. DM_SUBSYSTEM_UDEV_FLAG0=1)?

> More: What is "for our case"? What "case" are you referring to? What is it that you want to achieve?

Here's our scenario. We are dealing with a bunch of local disks (physical or virtual HDs) as well as remote disks (mounted via iSCSI). Based on whether a block device is local or remote, different platform-specific workflows get triggered. Our notion of a local block device was anything under /dev/sd*. We don't want multipath slave devices to be considered among the local block disks. We were initially relying on /sys/block/dm-*/slaves/* to give us all the slaves but, from your explanation, it is clear that it can get updated later as well (not only when the device is termed ready). So essentially we either need a way to find the spot (udev event) where all slaves are seen in that path (which doesn't seem feasible by design), or we need a way to distinguish an mpath slave from other local block devices.

> You can try to come up with your own udev rules to implement a different logic for the respective udev properties (most importantly, SYSTEMD_READY), but it's out of scope for us, and I can only discourage it. You need to realize that if you do this, the device may be considered "down" as soon as a single path fails, which is obviously not what you'd desire in a multipath setup.

Right, we don't want to do that either.

> That isn't true. Well, it is true that the iSCSI devices show up as separate devices, but they remain unused. The udev rules make sure that they have SYSTEMD_READY=0 set, and will thus not be activated (i.e. mounted). Only the multipath device is usable by systemd.

Thanks, I guess this is something we can rely on. We would probably look out for DM_MULTIPATH_DEVICE_PATH=1 or SYSTEMD_READY=0 by querying udevadm for all devices. Please let us know if there is a better way to do it (one which doesn't involve multiple udevadm queries) or if this is an incorrect way to look at it.
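(A sketch of the kind of query we mean, one udevadm call per disk; the property may be absent on devices multipath has never examined:)

```sh
# Classify each SCSI disk: multipath path member vs. plain local disk.
for dev in /sys/block/sd*; do
    name=$(basename "$dev")
    if udevadm info --query=property "/dev/$name" \
            | grep -q '^DM_MULTIPATH_DEVICE_PATH=1$'; then
        echo "$name: multipath path member, skip as local disk"
    else
        echo "$name: not claimed by multipath"
    fi
done
```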

bmarzins commented 2 months ago

> I was referring to DM_SUBSYSTEM_UDEV_FLAG0=1 as the first reload here, and to any further event (path loss/reinstatement or something else) that would trigger "another reload". Not sure if the first one (DM_ACTIVATION=1) does a reload too. Just for our understanding: what other events can lead to a reload (i.e. DM_SUBSYSTEM_UDEV_FLAG0=1)?

Most reloads will set DM_SUBSYSTEM_UDEV_FLAG0=1. This will absolutely not guarantee that all the paths are present. Like Martin said, that is impossible for multipath itself to know, since it has no idea how many paths you are expecting. If you really need this, it is something that you will have to do outside of multipath, probably by running multipathd commands and parsing the output, but given your explanation for why you want this, I expect that you don't actually need it.
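(If you do go that route, the kind of query meant here might look like the sketch below; output formats can vary between versions, see multipathd(8):)

```sh
# Human-readable view of each map and its current path members:
multipathd show topology
# One line per path, with device name, owning map, and dm state:
multipathd show paths format "%d %m %t"
```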

> We need all slave devices to be added, as they would otherwise be considered separate block devices.

> That isn't true. Well, it is true that the iSCSI devices show up as separate devices, but they remain unused. The udev rules make sure that they have SYSTEMD_READY=0 set, and will thus not be activated (i.e. mounted). Only the multipath device is usable by systemd.

> Thanks, I guess this is something we can rely on. We would probably look out for DM_MULTIPATH_DEVICE_PATH=1 or SYSTEMD_READY=0 by querying udevadm for all devices. Please let us know if there is a better way to do it (one which doesn't involve multiple udevadm queries) or if this is an incorrect way to look at it.

Again, like Martin said, you will always see the slaves as separate devices. This is true regardless of whether they are part of a multipath device or not, as multipath needs these device nodes present to work correctly. The 68-del-part-nodes.rules udev rules file will delete the partition device nodes for these slave devices. This isn't necessary for things to work correctly. It's just to keep the device list tidy, and to keep users from accidentally using the wrong devices. Even if the partition devices are left alone, they will still be marked as SYSTEMD_READY=0 and ignored by systemd.

I should point out that you usually don't need to wait for multipath to do anything for the path devices to get marked with SYSTEMD_READY=0. This happens when the devices are initially added to the system. If multipath will attempt to use the device as a path, it will get marked with SYSTEMD_READY=0 as soon as it's discovered, before multipathd actually gets around to adding the path to the device.

The one time when this may not be true is the first time a new multipath device, with a never-before-seen WWID, is created. In this case, depending on the "find_multipaths" setting in /etc/multipath.conf, there may be no way to know, at the time these devices first appear, that they will become part of a multipath device. When the multipath device is then first created, all the path devices will get marked with SYSTEMD_READY=0. Any path devices which appear after the device is created will of course get immediately marked with SYSTEMD_READY=0, and on future boots, all devices will get marked immediately.
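(For reference, that setting lives in the defaults section of /etc/multipath.conf; a sketch, see multipath.conf(5) for the exact values supported by your version:)

```
# /etc/multipath.conf (excerpt)
defaults {
    # "strict": only create maps for WWIDs already recorded in
    # /etc/multipath/wwids; other values include yes, greedy, smart, no.
    find_multipaths "strict"
}
```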

There is some additional work necessary if you need your multipath devices to appear in the initramfs. Since the WWIDs that multipath has seen before are stored in the /etc/multipath/wwids file, and that file in the initramfs will only have the WWIDs that were present when the initramfs image was created, any multipath devices that were created for the first time after the initramfs image was built will always appear as first-time devices when you boot, and their path devices will not immediately be marked with SYSTEMD_READY=0 on discovery. This rarely causes people issues, but if it is a problem for you, then you simply need to remake your initramfs, and it will pull in the current /etc/multipath/wwids file with all the known multipath WWIDs. Then the path devices will immediately get marked with SYSTEMD_READY=0 when they appear on future boots.
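(On Rocky/RHEL, remaking the initramfs so that it picks up the current wwids file is typically done like this; a sketch:)

```sh
# Rebuild the initramfs for the running kernel; dracut's multipath module
# copies in the up-to-date /etc/multipath/wwids file.
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
```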

abhinavmalik31 commented 2 months ago

Thank you both. This is really helpful. We will figure out how to incorporate it in our use case.