networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
2.11k stars 352 forks source link

nut-driver-enumerator generates systemd service that pulls deprecated systemd-udev-settle #2638

Open yuv420p10le opened 2 months ago

yuv420p10le commented 2 months ago

Using Arch Linux, followed their wiki's guide for setting NUT: https://wiki.archlinux.org/title/Network_UPS_Tools

Eaton 5E 2000i USB. I applied the usbhid quirks kernel parameter.

After enabling the NUT enumerator to automatically generate, enable and start the UPS systemd service, my boot times have significantly slowed down:

❯  ~ systemd-analyze critical-chain     
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @17.814s
└─multi-user.target @17.814s
  └─nut.target @15.512s
    └─nut-monitor.service @15.442s +69ms
      └─nut-server.service @15.346s +92ms
        └─nut-driver.target @15.344s
          └─nut-driver@nutdev-usb1.service @11.243s +4.100s
            └─basic.target @11.241s
              └─dbus-broker.service @11.216s +23ms
                └─dbus.socket @11.201s
                  └─sysinit.target @11.194s
                    └─systemd-udev-settle.service @364ms +10.829s
                      └─systemd-udev-trigger.service @309ms +53ms
                        └─systemd-udevd-kernel.socket @294ms
                          └─system.slice @257ms
                            └─-.slice @257ms

My journal shows:

Sep 24 09:25:19 yuv420p10le systemd[1]: Starting Wait for udev To Complete Device Initialization...
Sep 24 09:25:19 yuv420p10le udevadm[633]: systemd-udev-settle.service is deprecated. Please fix nut-driver@nutdev-usb1.service not to pull it in.

This in the enumerator script seems to be causing that: https://github.com/networkupstools/nut/blob/7dbf605e82ccc89a1cf0aebf2e174dcadf5b0a93/scripts/upsdrvsvcctl/nut-driver-enumerator.sh.in#L112

jimklimov commented 2 months ago

Thanks! Looking at it, specifically at https://www.freedesktop.org/software/systemd/man/latest/systemd-udev-settle.service.html - I am not sure if there is an easy way forward. IIRC, the idea behind that dependency was to have NUT (USB-capable) driver programs start only after the OS has some certainty about available hardware (and supposedly assigns udev-based permissions), and this unit seems to be just about that.

Can you try commenting away that dependency in your deployment and see if it works faster? I suppose in the field, if something is not yet discovered/assigned properly, a driver start-up attempt would fail and another (resuscitation) would succeed when everything settles on its own.

In the graph above, systemd-udev-settle.service is not close to nut-driver@* so maybe something else asks for it too?

yuv420p10le commented 2 months ago

Can you try commenting away that dependency in your deployment and see if it works faster? I suppose in the field, if something is not yet discovered/assigned properly, a driver start-up attempt would fail and another (resuscitation) would succeed when everything settles on its own.

Boot is indeed almost instant when I get rid of the system-udev-settle dependency. However I get this error in the journal from nut-driver:

libusb1: Could not open any HID devices: insufficient permissions on everything

Starting the service by hand afterwards does work. Perhaps there might be an issue with the Eaton 5E specifically which causes it to be loaded slowly?

In the graph above, systemd-udev-settle.service is not close to nut-driver@* so maybe something else asks for it too?

Doesn't seem to be the case; I grepped /etc/ for udev-settle and the file generated by nut-driver-enumerator is the only one containing it

Regardless, I'm not too upset about the slower boot time. I don't reboot that often (unless troubleshooting anything else) and I'd rather have my components not get fried in case of some power outage :) The use of the deprecated system-udev-settle service might be worth looking at, however. I'm not versatile enough with Linux yet to suggest a solution unfortunately

jimklimov commented 2 months ago

One more idea to try (probably not a good one for "production", rather to see if there are visible differences):

Keep the systemd-udev-settle.service in nut-driver@* as generated, but change the nut.target from [Install] // WantedBy=multi-user.target to add a [Unit] // Wants=multi-user.target (may be an additional line).

And systemctl daemon-reload ; systemctl disable nut.target ; systemctl enable nut.target to recreate the symlinks.

The idea with this experiment is to delay NUT start-up until after the interactive system milestone is achieved (assuming it starts well at all). It has been a while, I am not sure which exact criteria systemd has to start units and targets beside making them wanted/required by common system units, so as to have a reason for NUT to start at all. I mean, manual systemctl enable probably does achieve that, but for systemd-presets configuration IIRC an explicit chain was needed.

Building on this experiment, if we stop at a half-way point, only commenting away the WantedBy but not adding the Wants line, maybe it would achieve the best of both worlds at least with the explicit systemctl enable nut.target - NUT target would start After=local-fs.target is achieved, essentially, and cause the start-up of other NUT units, among which the USB-capable driver would start after udev is ready and settled, and would not be a blocker for the interactive system to start.

Overall, the goal is to have system intelligent power protection starting as early and running as long as possible.

However I get this error in the journal from nut-driver: libusb1: Could not open any HID devices: insufficient permissions on everything Starting the service by hand afterwards does work.

More udev- and device-centric thoughts below. A specific odd item for me here is that I would expect the driver unit to Restart=always indefinitely while it is enabled, trying each RestartSec=15s, so even if it failed to start initially - it should have picked up a bit later on.

I suppose your "starting by hand" took place after more than that time since the initial failure to start (needing time for login, diagnosis and all that)?

Is there anything interesting in journalctl -lu nut-driver@nutdev-usb1.service, or generally in journalctl -xl (system-wide log, must be requested as root) to see the progression of events and decisions systemd made about (re-)starting (or not) certain units - e.g. failed other prerequisites and dependencies? Theoretising here, if perhaps the NUT driver (via libusb) poking into devices could upset initialization of udev, and so a restart attempt was deemed not feasible, or something (I guess should have propagated to try and restart all failed deps, but I digress).

Also, systemctl show nut-driver@nutdev-usb1.service (when the OS booted and failed to start the driver) would expose dozens or hundreds of lines from what the systemd daemon thinks about the unit. I have no particular idea what to look for there, but maybe something would seem relevant to the failure.

Perhaps there might be an issue with the Eaton 5E specifically which causes it to be loaded slowly?

Could be; however the documentation linked above suggests that "udev settling" waits for all hardware detection, not just the one item you might be interested in, and timing depends on the performance of the weakest link in the chain. Part of why it is not recommended for use, I guess.

You can add debugging verbosity to the driver unit, see https://github.com/networkupstools/nut/wiki/Changing-NUT-daemon-debug-verbosity - to check which devices it sees during reboot, if the Eaton USB IDs (likely 0x0463:*) are there. This way, maybe we can discern if it is a problem of device not yet being seen by the OS (e.g. controller booting after USB signal, dunno), vs. udev not yet having assigned permissions needed for the unprivileged NUT driver process to access the device node.

Alternatively, a user=root in ups.conf section for the device could effectively bypass concerns about those permissions, at the risk/cost of running a privileged daemon.