Closed hongkongkiwi closed 1 year ago
Just to show this is not a fluke, for some reason dbus had the [WARN] status and the same situation happened:
# finit 6
[FAIL] Saving sound settings
[ OK ] Saving random seed
[FAIL] Stopping D-Bus message bus daemon
[WARN] Killing D-Bus message bus daemon
It will halt at this condition forever.
Interesting, I'll have a look at this in detail and try to set up a testcase for it. We just had a PR for shutdown/kill so there might be a regression.
Just to make sure, which version of Finit are you running; the latest release, or a GIT version? (The PR I mentioned above is not released yet.)
Progress: so far I've only been able to reproduce the [WARN], but for me the system reboots fine. I'm starting to suspect it's not the stopping of services that's at fault, but rather something else. Could you try calling initctl debug before initctl reboot?
[ OK ] Stopping Web interface
[WARN] Killing Simple NTP daemon
[ 9.157661] reboot: Restarting system
So, the fix to this issue in 7dc7f9a handles the "stall" in reboot. The actual root cause, which you hinted at, really seems to be #226. See that issue for an update on that as well.
Oh that's great, sorry I didn't get a debug log earlier. We are doing some system porting and I had to switch (temporarily) to another project. I'm really glad you were able to find the cause of this. We are on an embedded platform, so having it not behave as expected when shutting down was quite challenging.
This was a little bit inconsistent for me to replicate, but I'll try the latest version. Thanks for the fix!
Yeah, I'm mostly on embedded systems as well, and reboot must always work. Hope it works better also for you :)
Reopening, I just ran into this one myself trying to reboot and found the following:
...
finit[1]: service_kill():(null): Sending SIGKILL to process group 2577
finit[1]: Stopping pod:system[2577], sending SIGKILL ...
[WARN] Killing System container
...
After which everything just hung forever.
The interesting bit is the (null) above. It comes from an internal function that looks in /proc/2577/status for the actual process name. Here it could not find one, and the only way for that function to fail is if 2577 no longer exists!
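To illustrate the kind of lookup described above, here is a minimal Python sketch. It is not Finit's code (Finit is written in C). The function name process_name and the proc_root parameter are made up for this example. It reads the Name: field from /proc/<pid>/status and returns None when the file cannot be read, which is typically the case when the PID is already gone.

```python
from pathlib import Path
from typing import Optional


def process_name(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return the Name: field of <proc_root>/<pid>/status, or None.

    Hypothetical helper for illustration only. proc_root is a parameter
    so the function can be pointed at a fake directory tree.
    """
    status = Path(proc_root) / str(pid) / "status"
    try:
        text = status.read_text(errors="replace")
    except OSError:
        # Covers a missing file as well as a process that exits mid-read.
        return None
    for line in text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return None
```

A caller that logs the result without checking for None ends up with a placeholder in the log line, much like the (null) above.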
For my use-case, pod:system[2577] is a podman container which, as it turns out, starts conmon to monitor the container. However, the PID 2577 that it returned in the container pidfile was for that system's init process, not conmon itself. conmon is a process monitor and sub-reaper, hence Finit never got any feedback to proceed, and the service_kill() function exited early, leaving Finit to wait forever ...
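As a rough illustration of how a supervisor could avoid waiting forever in this situation, here is a small Python sketch. It is an assumption-laden example, not the change made in Finit. The names pid_exists and kill_and_wait are hypothetical. The idea is to treat "no such process" as already stopped and to put a deadline on the wait, so shutdown can continue either way.

```python
import os
import signal
import time


def pid_exists(pid: int) -> bool:
    """Probe a PID with signal 0 (nothing is actually delivered)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def kill_and_wait(pid: int, timeout: float = 2.0, poll: float = 0.05) -> bool:
    """Send SIGKILL, then wait at most `timeout` seconds for the PID to vanish.

    Returns True if the PID is gone, False if the deadline passed first.
    Either way the caller gets control back and can continue shutting down.
    """
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return True  # already gone, nothing to wait for
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_exists(pid):
            return True
        time.sleep(poll)
    # A killed process that nobody has reaped yet (a zombie) still counts
    # as existing here, so this can be False even though SIGKILL worked.
    return not pid_exists(pid)
```

The return value only says whether the PID disappeared in time. The caller decides whether to log a warning and move on.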
When rebooting, if a service has a "[WARN]" status the reboot never completes.
I was testing killing an app using kill <pid> and having finit restart the app. finit doesn't seem to pick up the correct pid when this happens, see bug #226. When this situation happens, I guess that finit gets "out of sync", so when doing finit 6 to reboot, it stalls on the above app. By stall, I mean it sits forever on the [WARN] line.
In normal cases finit 6 works totally fine as long as it can kill this app, but any "[WARN]" line seems to halt the rebooting process permanently (no matter how long I wait).