Closed llebout closed 2 years ago
Should you follow systemd's advice here?
At the moment, we can't. Waiting for udev settle was introduced for strong reasons. Changing that is on my todo list, but it's a major effort, because we need to change the way multipathd discovers and tracks devices in fundamental ways.
The problem is that when we leave the initrd and enter the root FS, device-mapper devices persist but low-level devices such as SCSI disks do not. They first have to be re-discovered by coldplug (`systemd-udev-trigger.service`). When multipathd starts before "udev settle" has finished, it will encounter multipath maps referencing devices that apparently don't exist. multipathd will assume that these maps are invalid and will try to tear them down, with possibly fatal consequences for the system. It's possible to change the behavior of multipathd by using sysfs-based device detection, but experience shows that any device-detection-related change tends to have partly unforeseen side effects. This needs careful engineering and even more careful testing, and will take time.
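To make the race easier to picture, here is a toy model in Python. It is only a sketch and not multipathd's actual logic: the map name `mpatha`, the `major:minor` strings, and the helpers `missing_paths` and `looks_invalid` are all made up for illustration.

```python
def missing_paths(map_paths, discovered):
    """Return the paths of a map that have not been (re-)discovered yet."""
    return [p for p in map_paths if p not in discovered]


def looks_invalid(maps, discovered):
    """Names of maps that reference at least one path that is not known yet."""
    return {name for name, paths in maps.items()
            if missing_paths(paths, discovered)}


# Hypothetical state right after leaving the initrd: the device-mapper map
# persists, but coldplug has not re-announced the SCSI disks yet.
maps = {"mpatha": ["8:0", "8:16"]}
discovered_before_settle = set()
discovered_after_settle = {"8:0", "8:16"}

# Checking too early makes a healthy map look broken.
too_early = looks_invalid(maps, discovered_before_settle)
# Checking after coldplug has finished finds nothing wrong.
after_settle = looks_invalid(maps, discovered_after_settle)
```

In this model the same map is flagged or not depending only on when the check runs, which is why the ordering after "udev settle" matters as long as device discovery works this way.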
I'm not sure what this means in your log:
Feb 06 14:22:23 talos systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
At first look I thought `udevadm settle` wouldn't work at all any more. But AFAICS that's not (yet) the case, and this failure is unrelated to the deprecation warning. Or am I overlooking something? What systemd version are you using?
Anyway, I suppose the systemd people are serious about this, and we need to tackle the basic issue sooner rather than later.
@mwilck The error is caused by some hardware/firmware bug of mine I think. Unrelated. Basically I run a Talos II machine and right now some PCI-e device is locked in an unusual way, lspci hangs and that device is not seen by the Linux kernel at all. When I give my machine a full reboot it goes away.
Thanks for clarifying that.
A patchset for removing this dependency has been posted here: https://listman.redhat.com/archives/dm-devel/2021-October/msg00321.html