opensvc / multipath-tools

The multipathd.service startup failure #95

Closed huyubiao closed 2 months ago

huyubiao commented 2 months ago

Expected behaviour you didn't see

# systemctl enable multipathd.service
# reboot
# systemctl status multipathd.service
multipathd.service is running

Unexpected behaviour you saw

# systemctl enable multipathd.service
# reboot
# systemctl status multipathd.service
multipathd.service is failed: multipathd[2070]: Cannot open pidfile [/var/run/multipathd.pid], error was [No such file or directory]

Steps to reproduce the problem

Analysis

Suppose /var/run does not exist on the root filesystem and only exists on a separately mounted /var. If multipathd.service is started after / is mounted but before /var is mounted, and before systemd-tmpfiles-setup.service has run, multipathd will fail.

I think multipathd.service should be started after systemd-tmpfiles-setup.service to ensure that the /var/run/ directory exists. For example, in multipathd.service:

[Unit]
Wants=xxx systemd-tmpfiles-setup.service
After=xxx systemd-tmpfiles-setup.service

mwilck commented 2 months ago

Your setup is highly unusual. On modern systems /run is mounted very early during initrd processing, and /var/run is a symlink to /run.

The choice of the runtime dir is automatically determined during build in Makefile.inc

runtimedir      := $(if $(shell test -L /var/run -o ! -d /var/run && echo 1),/run,/var/run)

Thus if (in the build system) /var/run is a symlink or doesn't exist at all, the compiled-in runtime dir is /run, not /var/run.
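
As a rough illustration, the same test can be run by hand on any machine to see which value the autodetection would pick there. This is only a sketch that mirrors the Makefile.inc expression quoted above; the function name `detect_runtimedir` and its optional path argument are made up for this example and are not part of multipath-tools.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Makefile.inc test:
#   test -L /var/run -o ! -d /var/run && echo 1
# Prints /run if the given path is a symlink or is not a directory,
# otherwise prints /var/run. The path defaults to /var/run.
detect_runtimedir() {
    dir="${1:-/var/run}"
    if [ -L "$dir" ] || [ ! -d "$dir" ]; then
        echo /run
    else
        echo /var/run
    fi
}

# Show what the autodetection would choose on this machine.
detect_runtimedir
```

Running this on both the build host and the target shows whether the two disagree, which is the case where the override is needed.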

You should probably override runtimedir during build:

make runtimedir=/run

huyubiao commented 2 months ago

> Your setup is highly unusual. On modern systems /run is mounted very early during initrd processing, and /var/run is a symlink to /run.
>
> The choice of the runtime dir is automatically determined during build in Makefile.inc
>
> runtimedir      := $(if $(shell test -L /var/run -o ! -d /var/run && echo 1),/run,/var/run)
>
> Thus if (in the build system) /var/run is a symlink or doesn't exist at all, the compiled-in runtime dir is /run, not /var/run.
>
> You should probably override runtimedir during build:
>
> make runtimedir=/run

Yes, /run is mounted very early during initrd processing, but the /var/run symlink does not exist on the device's early root partition. Sometimes it is not guaranteed that the build system will be the one I want to run. In other words, I might build on a system that has /var/run, and then install and run on a system that does not have /var/run.

mwilck commented 2 months ago

> Sometimes it is not guaranteed that the build system will be the one I want to run.

True. In this case the autodetection of system properties doesn't work, and you need to override the settings during build, as described above.

huyubiao commented 2 months ago

> > Sometimes it is not guaranteed that the build system will be the one I want to run.
>
> True. In this case the autodetection of system properties doesn't work, and you need to override the settings during build, as described above.

I don't fully agree with that. multipath/tmpfiles.conf.in is created after systemd-tmpfiles-setup.service. In this case, if the directory in the file is used, the directory does not exist.

mwilck commented 2 months ago

> multipath/tmpfiles.conf.in is created after systemd-tmpfiles-setup.service. In this case, if the directory in the file is used, the directory does not exist.

True. Strangely, I've never seen an issue with that so far. Maybe it was just luck.

But #98 is not a correct solution, either, as it introduces an ordering cycle. Also, AFAIU, systemd-tmpfiles-setup.service doesn't care about /var/run at all.

mwilck commented 2 months ago

> True. Strangely, I've never seen an issue with that so far. Maybe it was just luck.

The error you have been observing, Cannot open pidfile [/var/run/multipathd.pid], has nothing to do with systemd-tmpfiles-setup.service. Opening the PID file just requires runtimedir (aka /run) to be mounted, which systemd does very early, even before running generators.

mwilck commented 2 months ago

I just reviewed the code again. RUNTIME_DIR is used in 3 places:

  1. for the PID file. This just requires that RUNTIME_DIR aka /run exists. As mentioned, this is unrelated to systemd-tmpfiles-setup.service. Note that the PID file is not in the /run/multipath directory that systemd-tmpfiles would create.
  2. for the failed_wwids data. multipathd will create /run/multipath/failed_wwids if necessary.
  3. for the find_multipaths data. multipathd will create /run/multipath/find_multipaths if necessary.

We can't create the directory RUNTIME_DIR itself. We must assume that the directory already exists; otherwise systemd may mount something over it and the PID file will no longer be visible.
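
To make the three uses concrete, here is a small sketch that lists the paths derived from a given runtime dir and checks the one precondition multipathd cannot satisfy by itself, namely that the runtime dir exists. The function names `runtime_paths` and `check_runtime_dir` are hypothetical and exist only for this illustration; the file names are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the three paths that depend on the runtime dir.
# The first sits directly in the runtime dir; the other two sit in the
# multipath subdirectory, which multipathd creates itself if necessary.
runtime_paths() {
    rtdir="${1:-/run}"
    echo "$rtdir/multipathd.pid"
    echo "$rtdir/multipath/failed_wwids"
    echo "$rtdir/multipath/find_multipaths"
}

# Hypothetical helper: succeed only if the runtime dir itself exists,
# since that directory is assumed to be present before the daemon starts.
check_runtime_dir() {
    rtdir="${1:-/run}"
    if [ -d "$rtdir" ]; then
        echo "ok: $rtdir exists"
    else
        echo "missing: $rtdir"
        return 1
    fi
}
```

For example, `check_runtime_dir /var/run` run early in boot on the affected system would be expected to report the directory as missing, which matches the pidfile error reported above.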

> multipath/tmpfiles.conf.in is created after systemd-tmpfiles-setup.service

It is true that in many cases systemd-tmpfiles-setup.service will try to create the /run/multipath directory after multipathd.service has started and after multipath -u has been run from udev rules, in which case either multipath or multipathd may already have created it. That is not a problem.

As I initially stated, the issues you have observed are just related to your use of runtimedir=/var/run at compile time.