troglobit / finit

Fast init for Linux. Cookies included
https://troglobit.com/projects/finit/
MIT License
634 stars 63 forks source link

wishlist: support for mdev's netlink daemon mode #367

Closed az143 closed 1 year ago

az143 commented 1 year ago

since busybox release 1.31 (june 2019), mdev has been a much better udev-replacement and now does pretty much all that mdevd does:

starting it up as mdev -d makes it daemonise and listen on netlink, whereas mdev -df does the same without forking. with those versions, no scan run with mdev -s is required (or useful). mdev -d doesn't have a pid file, though.

i think that your new(ish) system/hotplug.conf could do with some small amendments to support such newer mdev capabilities, unless you think four years is too short a transition time for relying on new enough busyboxes.

troglobit commented 1 year ago

Since the current setup has been moved from hard-coded in C to an external .conf file the mdev support is considered "working" and "good enough" for users to be able to override locally. Improvements along the lines you mention are welcome in the form of pull requests. I.e., I have not immediate use for this myself so would be put on the back burner.

az143 commented 1 year ago

On Wed, 20 Sep 2023 22:22:41 -0700, Joachim Wiberg writes:

Improvements along the lines you mention are welc ome in the form of pull requests.

i'll provide one; may take a week or two.

regards az

-- Alexander Zangerl + GPG Key 2FCCF66BB963BD5F + https://snafu.priv.at/ The determined programmer can write a FORTRAN program in any language.

troglobit commented 1 year ago

On Wed, 20 Sep 2023 22:22:41 -0700, Joachim Wiberg writes: Improvements along the lines you mention are welc ome in the form of pull requests. i'll provide one; may take a week or two. regards az

I'd appreciate if you can keep the old mdev -s at least commented out since I know of a few customers that still have a very old (patched up) BusyBox, which does not have the mdev.

troglobit commented 1 year ago

Very little progress on this one, so I took the liberty of testing it out myself. I changed system/10-hotplug.conf to:

run     nowarn conflict:udevd cgroup.init :scan [S] mdev -s -- Populating device tree
service nowarn conflict:udevd cgroup.init <pid/syslogd> \
    [S12345789] mdev -df -S -- Device event managing daemon (mdev)

In my OS I have either BusysBox syslogd or my own sysklogd (the one Debian used before rsyslogd). Both provide the condition <pid/syslogd> so I can block the start of mdev -df until syslogd has started up properly.

This works for me:tm:, but maybe you had something else in mind @az143? Unless you have any objections, I'll go with this tomorrow, because I'd like to get Finit v4.5 out the door asap since I have customer projects hinging on it.

az143 commented 1 year ago

sorry for the lack of progress - i'm overseas at the moment and slightly busier than normal. back on topic:

your suggested setup is near ideal, except:

  1. when you use mdev -d, then the mdev -s task/run is superfluous because daemon mode (foreground or background) already includes an initial scan like mdev -s.

  2. to perform a full coldplug run (ala udevadm trigger -c add), i'd suggest that the mdev daemon start is followed up with an equivalent coldplug trigger:

    #!/bin/busybox ash
    # there are some not-even-overly-exotic devices that have spaces in their paths under /sys 
    find /sys/devices -name uevent -print | while IFS='' read X; do echo add > $X; done
  3. however, a typical "fully featured" mdev config does include $MODALIAS-based module loading, which to a great extent duplicates the modprobe functionality included in finit (src/modprobe.c) - in particular if you use a full coldplug as outlined in point 2. while running the modprobe sequence more than once doesn't actively hurt anything, it's a bit wasteful - but i haven't found a clean way to deduplicate that.

  4. mdev doesn't produce a pid file. personally i've been using finit's pid file generator setup, ie.

    service [S12345] pid:/var/run/mdev.pid mdev -df -S -- mdev daemon

overall: for most setups and/or basic use, mdev -df -S without the mdev -s will do the job sufficiently well.

troglobit commented 1 year ago

Thanks, I suspected mdev -s wasn't really required, however, "sufficiently well" is unfortunately not good enough here. There is a class of services that we must ensure run after modules have been probed, firmwares have been loaded, and devices created. This includes, but is not limited to, tasks and services that rely on network interfaces being available.

As an example, we use a Marvell switch with external PHYs that requires firmware. We use udevd in our OS and have set up syslogd as a barrier for all other services (after udevd + udavadm has run). The syslogd service waits for <run/udevadm:5/success> and thus blocks everything until that task returns. This ensures we see all physical interfaces in the system at that point.

We need a similar barrier here with mdev, that's why I added mdev -s to ensure all that had completed, and then mdev -df would take care of any runtime changes after that point. I'm not comfortable using pid:/var/run/mdev.pid since that is created by Finit when it has successfully forked off mdev -df, not when mdev -df is done probing modules, loading firmware, and creating device nodes.

I have looked at other distros using mdev -df and have not seen anything like your find /sys/devices construct. Do you have a reference for that I can look into?

Yes, I'm aware there's an overlay between the modprobe plugin and udevd/mdev. The plugin can be disabled at configure time and that is what we do in our setup, but that's up to each user.

az143 commented 1 year ago

On Sun, 15 Oct 2023 11:42:54 -0700, Joachim Wiberg writes:

We need a similar barrier here with mdev, that's why I added mdev -s to ensur e all that had completed, and then mdev -df would take care of any runtime ch anges after that point.

please note that, according to the busybox sources, mdev -s is not the same as an udevadm trigger...add:

util-linux/mdev.c: /* "mdev -s" scans /sys/class/xxx, looking for directories which have dev

the find-and-add coldplug is closer to a full discovery operation, but doesn't wait for completion. (mdev has no mechanism similar to udevadm settle.)

I have looked at other distros using mdev -df and have not seen anything like your find /sys/devices construct. Do you have a reference for that I can lo ok into?

a few: https://documentation.suse.com/sles/12-SP4/html/SLES-all/cha-udev.html#sec-udev-boot explains how and why add to the udev device is relevant/required,

daemon_init() in util-linux/mdev.c in the busybox sources, has an example of the find-and-add sequence

the manpage and sources for mdevd-coldplug (https://skarnet.org/software/mdevd/mdevd-coldplug.html and https://github.com/skarnet/mdevd/blob/master/src/mdevd/mdevd-coldplug.c) describe why (and how) the competition does the equivalent,

and this lwn article, https://lwn.net/Articles/646617/

the read without IFS is just to survive devices with spaces in their uevent name, e.g. /sys/devices/platform/Fixed MDIO bus.0/mdio_bus/fixed-0/uevent and to avoid the repeated fork+exec of the find ... -exec variation.

regards az

-- Alexander Zangerl + GPG Key 2FCCF66BB963BD5F + https://snafu.priv.at/ The correct way to roll NT out is out the door and into the nearest dumpster or other large waste receptacle. -- Mike Andrews

troglobit commented 1 year ago

Thanks, I'll have a look in detail later tonight (CET)!

troglobit commented 1 year ago

OK, I've spent a few hours skimming through the links you sent and reading the mdev source.

  1. You're right, mdev -s is run as the first stage of mdev -d. It is called right before mdev -d forks off to the background (unless asked to run in foreground with -df)
  2. No /run/mdev.pid is created, this would have solved a lot of problems for us since it's the main synchronization mechanism in Finit -- an excellent placement of creating that PID file would have been right after forking, or after having internally run mdev -s
  3. We need some way of echo add > /sys/devices/**/uevent

This mdev -df one is hairy. We really have no way of knowing when it has completed the first stage (mdev -s), unless we run it as a forking process, but then we have no way of finding its PID (in the pidfile.so plugin), and it does not send any other synchronization.

The code has this comment:

        /*
         * Make inital scan after the uevent socket is alive and
         * _before_ we fork away. Already open mdev.log because we work
         * in daemon mode.
         */

Clearly, they have designed mdev -d to run well in SysV/BusyBox init. I.e., from a script at bootstrap where the "synchronization" is that execution returns to the script after the "initial scan" (mdev -s) has completed.

We could run it as mdev -d and then go dumpster diving in /proc/*/stat for the new PID, but that's just asking for trouble since we don't know how many other mdev processes the user may have started (or if mdev starts forking off children in the future), or if we just find the parent mdev process right before it is reaped by Finit ...

... I honestly don't see a way around this that would be sustainable to maintain in Finit. So my suggestion is to keep things as-is in system/10-hotplug.conf (mdev -s) and add a huge comment around that run stanza explaining the issue(s). We can also provide commented-out mdev -df and your example coldplug add. The .conf file is installed to /lib/finit/system/10-hotplug.conf and is possible to override by the user in /etc/finit.d/10-hotplug.conf, which I think most users will want to do anyway.

troglobit commented 1 year ago

Here's a crazy idea, completely sidestepping BusyBox mdev -df.

For a future relase, fork mdevd and integrate in Finit as an optional helper daemon. We could tailor the boot process more tightly around it ... the licenses are compatible and the code is not that big.

Should be fairly straightforward and as I'm writing this I'm realizing this is how systemd swallowed up udevd :rofl:

Regardless, still is a better idea than what we can do with mdev -df, and we would still need the extra band-aid with echo add > /sys/devices/**/uevent.

az143 commented 1 year ago

On Mon, 16 Oct 2023 11:37:07 -0700, Joachim Wiberg writes:

  1. No /run/mdev.pid is created, this would have solved a lot of problems for ...
  2. We need some way of echo add > /sys/devices/**/uevent

personally i'm telling finit to generate a pidfile for mdev -d, followed by a run stanza which depends on that mdev pidfile, which then performs the echo-add-sequence.

the echo-add operation can proceed as soon as mdevd listens, and mdev -d sets up the socket (with seriously fat receive buffer) before doing the initial scan: even if that scan were to take a while, the big socket buffer ensures that no add uevents are lost until the initial scan is done and mdev starts processing the uevents.

(the very few activities between entering mdev_main() and the socket setup don't have any potential for undue delay.)

therefore i think that the pidfile timing versus readiness for uevent processing is not an issue.

but i do agree that lack of a discovery completion notification may not be sufficiently robust for some (many?) situations.

So my suggestion is to keep things as-is in system/10-hotplug.conf (mdev -s) and add a huge comment around that run stanza explaining the issue(s). We can also provide commented-out mdev -df and your example coldplug >add.

i think that's sensible.

here is my suggestion for an commented mdev config stanza (but not disabled, and not conditional either):

on startup, the mdev daemon only performs a cursory scan of

already known devices and their device files, then sets itself up

as a manager for future uevents.

#

mdev doesn't create a pid file; this stanza instructs finit to generate one

service [S12345] pid:/run/mdev.pid mdev -df -- mdev daemon

to perform device (re)discovery, module loading and setup, a coldplug

operation is typically required.

#

however, mdev does not offer any mechanism to detect when

the discovery operation is complete: on slower systems the triggering

side (the coldplug script) often completes quite a bit earlier than

mdev's uevent processing.

in other words: depending on <run/coldplug/success> is not necessarily

an indicator of all devices having been (re)discovered and are now fully set up.

run [S] <service/mdev/ready> /sbin/coldplug -- mdev cold plug discovery run

and my /sbin/coldplug is just (which may or may not be safely inlineable in a finit conf)

!/bin/busybox ash

find /sys/devices -name uevent -print | while IFS='' read X; do echo add > $X; done

-- Alexander Zangerl + GPG Key 2FCCF66BB963BD5F + https://snafu.priv.at/ The idea is that RAID stands for Redundant Array of Inexpensive Dreck. -- Anthony de Boer

az143 commented 1 year ago

On Mon, 16 Oct 2023 12:37:56 -0700, Joachim Wiberg writes:

For a future relase, fork mdevd and integrate in Finit as an optional helper daemon. We could tailor the boot process more tightly around it ... the licenses are compatible and the code is not that big.

i hear you :-)

and we would still need the extra band-aid with echo add > /sys/devices/**/uevent.

not really: mdevd comes with mdevd-coldplug, and yesterday i saw that they both have a synchronous/waiting mode where mdevd-coldplug doesn't exit until mdevd is done; which would make the perfect barrier for ensuring the discovery is complete.

the one downside of mdevd that i see (from an integration/packaging perspective) is that it also requires skalibs as build- and run-dependency - or one would have to recreate/reimplement missing bits using libite/libuev. the former duplicates code, the latter duplicates work.

for me personally finit's main appeal is its streamlined behaviour and compactness, and thus i'm not fully convinced that finit itself would benefit greatly from encompassing mdevd; but to support it as an option in 10-hotplug would definitely be a good thing.

regards az

-- Alexander Zangerl + GPG Key 2FCCF66BB963BD5F + https://snafu.priv.at/ Syntactic sugar causes cancer of the semicolon. -- Alan Perlis

troglobit commented 1 year ago

personally i'm telling finit to generate a pidfile for mdev -d, followed by a run stanza which depends on that mdev pidfile, which then performs the echo-add-sequence. the echo-add operation can proceed as soon as mdevd listens, and mdev -d sets up the socket (with seriously fat receive buffer) before doing the initial scan: even if that scan were to take a while, the big socket buffer ensures that no add uevents are lost until the initial scan is done and mdev starts processing the uevents. (the very few activities between entering mdev_main() and the socket setup don't have any potential for undue delay.) therefore i think that the pidfile timing versus readiness for uevent processing is not an issue.

I'm not so sure it's a non-issue. I've tested on really slow embedded devices and I've seen "this shouldn't happen" too many times to let this one slide. Therefore I'll propose keeping mdev -s (below) in this release and do something better for the next.

but i do agree that lack of a discovery completion notification may not be sufficiently robust for some (many?) situations.

Yeah, without measuring/testing this I'd rather take the conservative route. In particular since I want to get this release out the door asap

So my suggestion is to keep things as-is in system/10-hotplug.conf (mdev -s) and add a huge comment around that run stanza explaining the issue(s). We can also provide commented-out mdev -df and your example coldplug >add. i think that's sensible. here is my suggestion for an commented mdev config stanza (but not disabled, and not conditional either):


# on startup, the mdev daemon only performs a cursory scan of
# already known devices and their device files, then sets itself up
# as a manager for future uevents.
#
# mdev doesn't create a pid file; this stanza instructs finit to generate one
service [S12345] pid:/run/mdev.pid mdev -df -- mdev daemon

Almost the same as my proposal, which I unfortunately have on my other computer 30 km away. Anyway, for this release I suggest we keep the mdev -s and add your service line below that. That way, other services that relied on the old way of just probing for device nodes would continue working, and any synchronization that depended on <run/mdev/success> would still work. I.e., like this:

# If udevd is not available, try mdev
run nowarn conflict:udevd \
        [S] mdev -s -- Populating device tree

# blablabla (see above) and also mention modprobe.so plugin would not be needed anymore
service nowarn conflict:udevd <run/mdev/success> \
        [S12345789] pid:/run/mdev.pid mdev -df -- Device event daemon (mdev)

It runs the device probe twice, but leaves a guarantee that at least that which worked before still work now.

[snip]

run [S] <service/mdev/ready> /sbin/coldplug -- mdev cold plug discovery run

and my /sbin/coldplug is just (which may or may not be safely inlineable in a finit conf):

#!/bin/busybox ash
find /sys/devices -name uevent -print | while IFS='' read X; do echo add > $X; done

This is great, we can put that script in /libexec/finit/coldplug along with other helper programs like logit and keventd.

I'll add it like that, adding only the conflict:udevd like the others so it doesn't run for the udevd case.

troglobit commented 1 year ago

On Mon, 16 Oct 2023 12:37:56 -0700, Joachim Wiberg writes:

For a future relase, fork mdevd and integrate in Finit as an optional helper daemon. We could tailor the boot process more tightly around it ... the licenses are compatible and the code is not that big. i hear you :-) and we would still need the extra band-aid with echo add > /sys/devices/**/uevent. not really: mdevd comes with mdevd-coldplug,

Riight, almost forgot about that! I should know, I helped integrate and test a preview of mdevd into Finit a year ago. It actually spawned the addition of both systemd and s6-style process synchronization!

and yesterday i saw that they both have a synchronous/waiting mode where mdevd-coldplug doesn't exit until mdevd is done; which would make the perfect barrier for ensuring the discovery is complete.

It works for some use-cases I couldn't get it to work properly with Finit without some additional features:

service [S12345789] notify:s6 mdevd -O 4 -D %n
task [S] <service/mdevd/ready> @root:root mdevd-coldplug

It works, but it ain't pretty.

the one downside of mdevd that i see (from an integration/packaging perspective) is that it also requires skalibs as build- and run-dependency - or one would have to recreate/reimplement missing bits using libite/libuev. the former duplicates code, the latter duplicates work. for me personally finit's main appeal is its streamlined behaviour and compactness, and thus i'm not fully convinced that finit itself would benefit greatly from encompassing mdevd; but to support it as an option in 10-hotplug would definitely be a good thing.

I've closely monitored s6, skalibs, and now latest mdevd for several years. For obvious reasons, s6 is a close competitor of Finit, but also because Laurent Bercot is a very nice guy. His code, however is very tight and in many cases hard to read. So I imagine integrating mdevd in Finit would mean stripping it down to its very core and cleaning it to better integrate with the utility functions in Finit and libite, I would not use skalibs for that. Effectively to the point of it no longer resembling the original, so it would not be called finit-mdevd or anything, more like finit-ueventd. I actually don't anticipate much problem with that, refactor it's not that much code, granted there will be need for testing of course.

troglobit commented 1 year ago

Ended up skipping mdev -s after a lot of testing.