Closed: hongkongkiwi closed this issue 2 years ago
Finit relies on services creating, and touching, their PID files on startup and reload (SIGHUP). It monitors for PID files in /var/run, which on most systems today is a symlink to /run. Some services create sub-directories here, which is supported, but Finit only looks for *.pid and sub-dir/pid in these directories. If a service uses another pattern for its PID file, Finit does not catch it and the PID condition is not asserted.
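To check whether a given PID file path would fit the patterns described above, a minimal sketch could look like the following. It is not Finit's code: the function name looks_like_pid_file is hypothetical, and it assumes one particular reading of the description, namely that *.pid is accepted directly in /run (or /var/run) and in one level of sub-directory, and that a file named exactly pid is accepted in a sub-directory.

```python
from pathlib import PurePosixPath

# /var/run is usually a symlink to /run, so accept either spelling.
RUN_DIRS = (PurePosixPath("/run"), PurePosixPath("/var/run"))


def looks_like_pid_file(path):
    """Return True if `path` fits the '*.pid' or 'sub-dir/pid' patterns.

    Hypothetical helper: this encodes one reading of the description
    in this thread, not Finit's actual matching logic.
    """
    p = PurePosixPath(path)
    for root in RUN_DIRS:
        try:
            parts = p.relative_to(root).parts
        except ValueError:
            continue  # not below this run directory, try the next one
        if len(parts) == 1:
            # e.g. /run/syslogd.pid
            return parts[0].endswith(".pid")
        if len(parts) == 2:
            # e.g. /run/mosquitto/pid or /run/mosquitto/mosquitto.pid
            return parts[1] == "pid" or parts[1].endswith(".pid")
        return False
    return False


if __name__ == "__main__":
    for candidate in ("/run/syslogd.pid", "/run/mosquitto/pid", "/run/mosquitto/lock"):
        print(candidate, looks_like_pid_file(candidate))
```

A service that writes something like /run/mosquitto/lock would fall outside these patterns, which is the situation where the PID condition would never be asserted.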
I can't really see why it starts working for you after a reload, not without setting up a case similar to yours to try and reproduce it.
I know it's no comfort to you, but we run all our applications as root and have none of these issues.
Ah, just hit me. Finit relies heavily on inotify to detect changes in .conf, add/remove of .pid files, etc. Is that missing from your kernel perhaps?
I do have INOTIFY_USER selected in my kernel, just checked. Is there any other kernel setting I need to check? How would I debug this?
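One way to answer the debugging question independently of Finit is a small standalone probe that watches a directory with inotify, creates a PID-like file in it, and reports which events the kernel delivered. The sketch below is an assumption-laden example rather than anything Finit ships: it assumes Linux with Python 3 available on the target, calls the libc inotify functions through ctypes, and uses the hypothetical names probe_inotify and probe.pid.

```python
import ctypes
import os
import struct
import tempfile

# Event bits from <sys/inotify.h>
IN_ATTRIB = 0x00000004  # metadata changed, e.g. touching an existing file
IN_CREATE = 0x00000100  # file created inside a watched directory

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
EVENT_HEADER = struct.Struct("iIII")


def probe_inotify(directory):
    """Create probe.pid in `directory`; return [(name, mask), ...] seen via inotify."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.inotify_init1(os.O_NONBLOCK)  # IN_NONBLOCK has the value of O_NONBLOCK
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1() failed")
    try:
        wd = libc.inotify_add_watch(fd, os.fsencode(directory), IN_CREATE | IN_ATTRIB)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch() failed")
        probe = os.path.join(directory, "probe.pid")
        with open(probe, "w") as f:  # should queue IN_CREATE
            f.write(f"{os.getpid()}\n")
        os.utime(probe)  # like touch(1), should queue IN_ATTRIB
        try:
            data = os.read(fd, 4096)
        except BlockingIOError:
            return []  # nothing was queued: inotify is not delivering events
    finally:
        os.close(fd)

    events = []
    offset = 0
    while offset < len(data):
        _wd, mask, _cookie, length = EVENT_HEADER.unpack_from(data, offset)
        offset += EVENT_HEADER.size
        name = data[offset:offset + length].rstrip(b"\0").decode()
        offset += length
        events.append((name, mask))
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, mask in probe_inotify(tmp):
            print(f"{name}: mask=0x{mask:08x}")
```

If the probe returns an empty list, or either call raises OSError, basic inotify delivery is suspect. If it reports events for probe.pid, the kernel side of plain create and attribute-change notifications works, and the problem is more likely in how the watches are set up.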
Just pulled the latest version (0222feeda19bb9576186423c03a05337ef390f88), seems I'm still having the same issue.
Here's a small sample file:
service [S12345] /usr/bin/earlyoom -- EarlyOOM daemon
service [S12345] pid:syslogd.pid /sbin/syslogd -D -S -n -f /etc/syslog.conf -O /tmp/messages -- System log daemon
service [S12345] <pid/syslogd> /sbin/klogd -n -c 1 -- Kernel log daemon
initctl status
PID   IDENT           STATUS   RUNLEVELS     DESCRIPTION
725   tty:S0          running  [-12345----]  Getty on /dev/ttyS0
724   watchdog:finit  running  [-123456789]  Finit watchdog daemon
721   dbus-daemon     running  [S12345-789]  D-Bus message bus daemon
1150  earlyoom        running  [S12345----]  EarlyOOM daemon
1151  syslogd         running  [S12345----]  System log daemon
0     klogd           ready    [S12345----]  Kernel log daemon
initctl status syslogd
Status : running
Identity : syslogd
Description : System log daemon
Origin : /etc/finit.d/early-system.conf
Environment :
Condition(s):
Command : /sbin/syslogd -D -S -n -f /etc/syslog.conf -O /tmp/messages
PID file : /run/syslogd.pid
PID : 1151
User : root
Group : root
Uptime : 31 sec
Restarts : 0 (0/10)
Runlevels : [S12345----]
initctl cond
PID  IDENT  STATUS  CONDITION (+ ON, ~ FLUX, - OFF)
0    klogd  off     <-pid/syslogd>
cat /run/syslogd.pid
1151
So the PID file is there, and it's even created by finit itself, but the <pid/syslogd> condition never seems to pick it up! :/
Ah, this latest report is quite different from the original you posted, right? In the original it was the PID condition for mosquitto that wasn't asserted; here it's the Finit-generated PID condition for syslogd (BusyBox, I assume), which could be a problem in Finit. I'll have to set up a test and verify myself. I'll put that on the TODO for the next release.
Did you ever solve the original problem with mosquitto? Was it that it didn't touch its PID file on reload, perhaps?
OK, I've just done more step-by-step testing and I've narrowed down why syslogd & mosquitto behave differently.
Let's take the simple config line below for mosquitto. This doesn't work: on my system, it has the same issue as syslogd. I can see that a PID file is created at /run/mosquitto.pid, but finit never picks it up (no matter how many times I run initctl reload). I've also tried just touch /run/mosquitto.pid, but no luck.
# not showing up in initctl cond dump
service [S12345] mosquitto -c /etc/mosquitto.conf -- Mosquitto daemon
HOWEVER, if I add the precondition I was using before, <net/lo/exist>, things work just fine, but only after the second reload.
# showing up in initctl cond dump after second initctl reload
service [S12345] <net/lo/exist> mosquitto -c /etc/mosquitto.conf -- Mosquitto daemon
Here's a pastebin and you can see it in action: https://pastebin.com/tgZ8ckwh
To double check this was the issue, I could replicate the same behaviour with busybox syslogd (although in that case, I made a quick wrapper script to write out the pid just to make sure it wasn't something else).
I should note that in either case, the service runs fine, it's just that finit doesn't register the pid condition in the first case. Strangely though, finit knows it's running when using initctl status mosquitto.
Forget what I said before about root vs user, that doesn't seem to make any difference so I was mistaken there.
Do you think this could be the issue?
#define IN_MASK_CREATE 0x10000000 /* since Linux 4.18 */
I am running kernel 4.9.0 so this IN_MASK_CREATE feature is not in my version.
Perhaps I can patch in this feature if it's the problem? :)
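For context, the inotify man page describes IN_MASK_CREATE (since Linux 4.18) as "watch pathname only if it does not already have a watch associated with it", with EEXIST returned otherwise. One user-space way to cope with older kernels is to request the flag only when the running kernel is new enough. The sketch below illustrates that idea only; it is not the patch referred to in this thread, and the helper names kernel_at_least and watch_mask, as well as the choice of base events, are hypothetical.

```python
import re

# Values from <sys/inotify.h>
IN_ATTRIB = 0x00000004
IN_CREATE = 0x00000100
IN_MASK_CREATE = 0x10000000  # since Linux 4.18


def kernel_at_least(release, major, minor):
    """Compare a release string such as '4.9.0' against major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def watch_mask(release, base=IN_CREATE | IN_ATTRIB):
    """Return an inotify mask, adding IN_MASK_CREATE only on Linux >= 4.18."""
    if kernel_at_least(release, 4, 18):
        return base | IN_MASK_CREATE
    return base


if __name__ == "__main__":
    for release in ("4.9.0", "4.19.0"):
        print(release, hex(watch_mask(release)))
```

On a live system the release string could come from os.uname().release. Code that depends on the EEXIST behaviour would still need its own bookkeeping of existing watches on kernels older than 4.18.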
Aha! Yeah very likely that's the issue! Modern Finit (3.0 and later iirc) is really made for "modern" kernels, we've run Linux 4.19 since what feels like forever, so this assumption is definitely related to that.
How do you propose to patch it, patch the kernel?
Awesome! I've solved the issue with a patch. It turns out it was pretty simple, here's my pastebin for it. You could include it with a note for older kernels. I'm very excited! Finit is exactly what I've been looking for, thank you for your hard work.
@hongkongkiwi great work! I've added it to a contrib/patches subdirectory of the tree, referencing this issue for future devs, thanks!
Glad to hear you like it! It's been a bit of a labor of love for me, so it's amazing to hear other ppl enjoying it as well <3
I'll try to catch up with the rest of the issues and outstanding things during the weekend, the idea is to get another release out soonish.
I'm running finit on my embedded device (with only cgroups v1 support and kernel 4.9.1). I've made a small change to stop the init/system/user groups from setting cpu.weight, as I don't have that feature.
I'm having two issues:
1. ... initctl reload is run
2. ... <pid/SERVICE> condition
Here's some debugging. In the initctl status screen, finit knows the apps are running:
ps shows the following:
initctl cond dump shows nothing running initially even though finit shows it as running in the above screen
initctl cond shows that we are waiting on mosquitto to run some apps (even though mosquitto is running as above)
Now when I run initctl reload && initctl dump, apps running as non-root will show up, in this case vnstatd & mosquitto, but chrony/earlyoom/syslogd do not show up
finit.conf service definitions are like this:
Any ideas what might be causing this?