troglobit / finit

Fast init for Linux. Cookies included
https://troglobit.com/projects/finit/
MIT License

mdevd coldplug twice #328

Closed hongkongkiwi closed 1 year ago

hongkongkiwi commented 1 year ago

I notice that for mdevd, you start with the -C option, which performs a coldplug every time the service is started or restarted.

Should we remove this since we also coldplug here? https://github.com/troglobit/finit/blob/c5bca5f80aa50506a18ccbb065b95173c5459319/plugins/hotplug.c#L47

If we do that, I'm wondering if, instead of running mdevd-coldplug directly, we could add a task to finit which runs when mdevd is ready? I'm not sure it makes any difference, but since we have mdevd with s6 readiness, that could make sense.

Is there a reason to run this directly rather than adding a task (and moving on)?

troglobit commented 1 year ago

To be honest, I don't know that much about mdevd. You're the expert here :-)

Nevertheless, following your reasoning, you make a good point. Now that we have readiness notification, we know when mdevd is ready, and we should only coldplug once.

Maybe we could even take it one step further and drop this plugin entirely in favor of a .conf recommendation?

# mdevd.conf
service [S12345789] cgroup.system notify:s6 @root:root mdevd -C -O 4 -D %n -- MDEVD Extended Hotplug Daemon
task [S] <service/mdevd/ready> @root:root mdevd-coldplug -- Replaying hotplug events to mdevd
hongkongkiwi commented 1 year ago

Yes, that makes perfect sense. No need for an extra plugin in this case, unless the hotplug plugin does something else?

The reason I'm looking at this is that I'm trying to speed up booting of our system.

Btw, it should be without the -C, as below:

service [S12345789] cgroup.system notify:s6 @root:root mdevd -O 4 -D %n -- MDEVD Extended Hotplug Daemon
task [S] <service/mdevd/ready> @root:root mdevd-coldplug -- Replaying hotplug events to mdevd
troglobit commented 1 year ago

Nice, I'll test it out on my (admittedly) tiny setup.

No, the hotplug plugin checks for the various udev/mdev/mdevd in some "logical" order to do the coldplugging. So it shouldn't be needed.
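
In other words, the ordering described above amounts to something like the following shell sketch (a loose illustration only, not the actual plugins/hotplug.c code; the exact checks and arguments may differ):

# Prefer udev if present, then mdevd, then classic busybox mdev.
if command -v udevadm >/dev/null 2>&1; then
    udevadm trigger --action=add
elif command -v mdevd-coldplug >/dev/null 2>&1; then
    mdevd-coldplug
elif command -v mdev >/dev/null 2>&1; then
    mdev -s
fi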

troglobit commented 1 year ago

Alright, this seems to work perfectly! Maybe you can give it a go locally first (disabling mdevd and hotplug plugins), before I push any changes to the repo?

hongkongkiwi commented 1 year ago

I will test this shortly; just recompiling buildroot now, which takes a little while.

troglobit commented 1 year ago

How did it turn out?

hongkongkiwi commented 1 year ago

Hmm, I used the lines below, basically the same as what you wrote, except I like to put the apps into categories so I can simply do initctl status mdevd later and get all associated tasks or services.

# Start mdevd
service [S12345789] name:mdevd :daemon notify:s6 mdevd -O 4 -D %n -- MDEVD Extended Hotplug Daemon
task [S] <service/mdevd:daemon/ready> name:mdevd :coldplug mdevd-coldplug -- Replaying hotplug events to mdevd
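
For reference, the point of the shared name: above is that one query then covers the whole group, e.g. (assuming both units are still registered):

# Lists both mdevd:daemon and mdevd:coldplug while they are registered
initctl status mdevd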

Is it intentional that the task disappears from the status list completely when it's only set as [S]? initctl status does not show the task at all; it's as if it doesn't exist.

initctl status mdevd:coldplug
initctl: no such task or service(s): mdevd:coldplug

However, initctl cond dump shows that the task was completed: 1 init on <task/mdevd:coldplug/success>

This might be intended behaviour, I just hadn't seen it before (but then, I don't use any [S]-only tasks).

troglobit commented 1 year ago

Sounds like it worked, yay! :partying_face:

Yes, it's intentional. Runlevel S is a bit special since you can never return to it with the initctl runlevel N command. Before moving to runlevel 2 (or whatever your preferred standard runlevel is), Finit waits for all runlevel S-only tasks to complete and then removes them.
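
To make that concrete, here is a minimal .conf sketch (names and paths are hypothetical) contrasting a bootstrap-only task with one that also lists runtime runlevels:

# Bootstrap-only: removed once Finit leaves runlevel S
task [S] name:bootstrap-only /usr/sbin/some-oneshot -- Runs at bootstrap, then removed
# Also listed in runtime runlevels, so it stays visible to initctl afterwards
task [S12345] name:also-runtime /usr/sbin/some-oneshot -- Remains available at runtime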

hongkongkiwi commented 1 year ago

Makes sense, looks good.

Could I suggest a feature: a flag to view the status of [S]-runlevel tasks, e.g. initctl status --all or something? initctl status is a nice tool.

E.g., it would be great to do initctl status --all mdevd:coldplug or something, so I can actually check the status.

My only current way is initctl cond dump | grep 'mdevd:coldplug', which is a little clunky.

troglobit commented 1 year ago

Well, it's not really possible to do what you want: the [S] tasks are actually removed from RAM, so the only things that remain are any static conditions they've left behind. This is by design. Since S provides a way of isolating startup from runtime, we don't want to risk anyone calling initctl start foo when foo is only supposed to ever run at bootstrap -- a great example is mdevd-coldplug.
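
Using the thread's own example to illustrate (the mdevd:coldplug task from earlier): once runlevel S is left, the task is no longer addressable, and only its static condition remains as evidence that it ran:

# The task itself is gone (same result as the status query earlier in the thread)
initctl status mdevd:coldplug    # -> initctl: no such task or service(s): mdevd:coldplug
# ...but its static condition survives
initctl cond dump | grep 'mdevd:coldplug'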

hongkongkiwi commented 1 year ago

I'm having a very strange problem: the last [S] task to run is repeated. It doesn't matter what the task is. I don't know whether only the log line is repeated or the task actually gets restarted.

I've tried to debug with finit.debug on the kernel command line, but it produces so much output that it's hard to really spot the issue. Even after 10 minutes it's still printing debug messages without having loaded the system yet, so I'm unsure how to debug this better.

[ OK ] Checking filesystem /dev/mmcblk0p10
[ OK ] Checking filesystem /dev/mmcblk0p11
[ OK ] Checking filesystem /dev/mmcblk0p12
[ OK ] Checking filesystem /dev/mmcblk0p13
[ OK ] Checking filesystem /dev/mmcblk0p15
[ OK ] Mounting filesystems from /etc/fstab.slot.flash_recovery_rootfs.0
[ OK ] Creating machine UUID for D-Bus
[ OK ] Initializing random number generator
[ OK ] Starting D-Bus message bus daemon
[ OK ] Starting Chrony Time Daemon
[ OK ] Starting Proxy for QMI devices
[ OK ] Starting LTE daemon
[ OK ] EMMC Clear Temp Dir
[ OK ] Starting MDEVD Extended Hotplug Daemon
[ OK ] Starting System log daemon
[ OK ] Starting Kernel log daemon
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir

The task is very simple:

task [S] name:emmc :cleartmp emmc-clear-tmp-dir -- EMMC Clear Temp Dir

With the script itself:

#!/bin/sh -u
STORAGE_DIR="${STORAGE_DIR:-"/storage"}"
if [ -n "$STORAGE_DIR" -a -d "${STORAGE_DIR}/tmp" ]; then
  rm -Rf "${STORAGE_DIR}/tmp"
fi
exit 0

The script runs successfully:

~ # initctl cond dump | grep emmc
1     init                  on      <task/emmc:cleartmp/success>

I've noticed the same behaviour sometimes with other things that run last; e.g. if I comment this task out, it's whatever the last [S] task to run happens to be that gets repeated.

It's some kind of race condition, because sometimes it doesn't happen and everything works just fine.

It could be because my CPU is busy and low-powered, so things take longer than Finit expects? I'm not sure, but it's what is tripping me up sometimes.

When changing the Finit config before, I had this problem with the mdevd coldplug, and it really tripped me up a bit.

troglobit commented 1 year ago

Hmm, that's not right. I had what I believe were similar issues with runlevel S before f3fcca6 (Nov 23), but after that revert it's been stable as a rock.

troglobit commented 1 year ago

I can poke around a bit more tonight to see if I can reproduce, but testing in the office with another buildroot-based system reveals no obvious problems.

troglobit commented 1 year ago

Sorry, I looked at this both last night and this morning (CET), but I can't reproduce your problem. Which Git hash are you using right now?

hongkongkiwi commented 1 year ago

I'm using hash: 685e0a80a435bad3e4d5112093ebad20d7eb6ff1

Hmmmm I have an idea actually, just checking at the moment.

hongkongkiwi commented 1 year ago

Nope, my idea did not pan out. I thought that since I have hook scripts, and in those hook scripts I was calling initctl reload, this was causing some funkiness, but alas, that's not the issue. I modified my scripts so that if the hook env var is set they won't try to reload (I presume it reloads itself anyway).
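
A guard of that kind would look roughly like this in the hook script (FINIT_HOOK is a placeholder name here, not necessarily the variable Finit actually sets):

# Skip the explicit reload when running as a Finit hook script.
if [ -z "${FINIT_HOOK:-}" ]; then
    initctl reload
fi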

I think it might be related to 100% CPU usage, or at least things lagging:

[ OK ] Checking filesystem /dev/mmcblk0p10
[ OK ] Checking filesystem /dev/mmcblk0p11
[ OK ] Checking filesystem /dev/mmcblk0p12
[ OK ] Checking filesystem /dev/mmcblk0p13
[ OK ] Checking filesystem /dev/mmcblk0p15
[ OK ] Mounting filesystems from /etc/fstab.slot.flash_recovery_rootfs.0
[ OK ] Creating machine UUID for D-Bus
[ OK ] Initializing random number generator
[ OK ] Starting D-Bus message bus daemon
[ OK ] Starting Chrony Time Daemon
[ OK ] Starting Proxy for QMI devices
[ OK ] Starting LTE daemon
[ OK ] Starting Nordic daemon
[ OK ] EMMC Clear Temp Dir
[ OK ] Starting MDEVD Extended Hotplug Daemon
[ OK ] Starting System log daemon
[ OK ] Starting Kernel log daemon
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] EMMC Clear Temp Dir
[ OK ] Starting Suspend/Resume Daemon
[ OK ] Nordic Notify Suspend Ready
[ OK ] EMMC Clear Temp Dir
[ OK ] Nordic Notify Suspend Ready
[ OK ] EMMC Clear Temp Dir
[ OK ] Bringing up network interfaces ...

I guess I need to do a finit.debug session.
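
For context, that means booting with finit.debug appended to the kernel command line, roughly like below (everything except finit.debug is a placeholder):

# Example kernel command line, e.g. set from the boot loader
console=ttyS0,115200 root=/dev/mmcblk0p12 rw finit.debug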

troglobit commented 1 year ago

Good theory, and great idea to try and work around it :+1:

Yeah, we need to see.[^1]

[^1]: There's a timeout for runlevel S to complete.

troglobit commented 1 year ago

mdevd plugin dropped in f6bfbc8, finit-plugins repo updated as well.