Closed JackNewman12 closed 1 year ago
I think there is also an issue with just a initctl reload
.
The <ready>
conditions get set to flux
but my daemon never received a sighup. If I manually send a HUP
to my daemon or do a initctl reload mydaemon
it will resend the READY=1
flag and the <ready>
cond becomes set again.
Interesting, I'll have to create a test to reproduce this. We already have three tests, but it would seem neither of them catch this.
Could you please add some information on how you set the service up, just to rule out user-error.
Messed around my MyLinux tonight to hopefully help reproduce this without having to deal with censoring logs. Running master finit.
Minimal code to reproduce:
# cat services.conf
service notify:systemd /usr/bin/main
service <service/main/ready> /usr/bin/second
# cat main.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void notify(void) {
const char *sock = getenv("NOTIFY_SOCKET");
int do_notify;
if (sock)
do_notify = atoi(sock);
if (do_notify)
write(do_notify, "READY=1\n", 8);
}
void sig_handler(int signum) {
notify();
}
void main() {
signal(SIGHUP, sig_handler);
notify();
while (1) {
sleep(10);
}
}
# cat second.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void main() {
while (1) {
sleep(10);
}
}
Some logs that show off the behaviour.
# initctl ; initctl cond
PID IDENT STATUS RUNLEVELS DESCRIPTION
85 main running [--234-----] main
96 second running [--234-----] second
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
96 second on <+service/main/ready>
#
#
# initctl reload
#
# initctl ; initctl cond
PID IDENT STATUS RUNLEVELS DESCRIPTION
85 main running [--234-----] main
96 second paused [--234-----] second
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
96 second flux <~service/main/ready>
#
# kill -HUP 85
# initctl ; initctl cond
PID IDENT STATUS RUNLEVELS DESCRIPTION
85 main running [--234-----] main
96 second running [--234-----] second
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
96 second on <+service/main/ready>
Now this is a little tricker to show over text. But if you spam initctl cond
you can see ready gets set, then clears after the 2 or 5 second timeout.
# killall main
# killall main
# killall main
#### Condition goes down correctly.
# initctl ; initctl cond
PID IDENT STATUS RUNLEVELS DESCRIPTION
0 main restart [--234-----] main
0 second waiting [--234-----] second
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
0 second off <-service/main/ready>
##### Condition gets set correctly
# initctl ; initctl cond
PID IDENT STATUS RUNLEVELS DESCRIPTION
132 main running [--234-----] main
133 second running [--234-----] second
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
133 second on <+service/main/ready>
### After the X000ms timeout, the condition gets cleared
# initctl ; initctl cond
PID IDENT STATUS RUNLEVELS DESCRIPTION
132 main running [--234-----] main
0 second waiting [--234-----] second
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
0 second off <-service/main/ready>
### If the ready notification gets retriggered it continues to work as normal
# kill -HUP 132
# initctl ; initctl cond
PID IDENT STATUS RUNLEVELS DESCRIPTION
132 main running [--234-----] main
138 second running [--234-----] second
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
138 second on <+service/main/ready>
Amazing, thank you so much! <3
I have an initial patch, but it breaks four other test cases, so I'll have to spend a bit more time on it. Will probably take a few days to finalize, just fyi.
I've pushed the revised patch to a separate branch, and will test it manually in my own setups before merging it to master. If you like, you can try it out as well. The tests now pass, including the updated notify.sh
, so I'm fairly confident it'll work in a real-world setup as well.
Some testing on my machine. Almost perfect!
# initctl ; initctl cond;
PID IDENT STATUS RUNLEVELS DESCRIPTION
59 main running [--234-----] main
60 second running [--234-----] second
43 tty:S0 running [-12345----] Getty on /dev/ttyS0
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
60 second on <+service/main/ready>
Working: Restarting a service
# initctl restart main
Jan 13 08:14:59 finit[1]: Stopping main[59], sending SIGTERM ...
Jan 13 08:14:59 finit[1]: Stopping second[60], sending SIGTERM ...
# Jan 13 08:14:59 finit[1]: Starting main[67]
Jan 13 08:14:59 finit[1]: Starting second[68]
# initctl ; initctl cond;
PID IDENT STATUS RUNLEVELS DESCRIPTION
67 main running [--234-----] main
68 second running [--234-----] second
43 tty:S0 running [-12345----] Getty on /dev/ttyS0
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
68 second on <+service/main/ready>
Working: reloading a single service
# initctl reload main
Jan 13 08:15:18 finit[1]: Restarting main[67], sending SIGHUP ...
# initctl ; initctl cond;
PID IDENT STATUS RUNLEVELS DESCRIPTION
67 main running [--234-----] main
68 second running [--234-----] second
43 tty:S0 running [-12345----] Getty on /dev/ttyS0
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
68 second on <+service/main/ready>
Not Working: reloading finit I think the error here is clearing the ready condition since I don't think HUP'ing finit should HUP all services.
# initctl reload
# initctl ; initctl cond;
PID IDENT STATUS RUNLEVELS DESCRIPTION
67 main running [--234-----] main
68 second paused [--234-----] second
43 tty:S0 running [-12345----] Getty on /dev/ttyS0
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
68 second flux <~service/main/ready>
# kill -HUP 67
# initctl ; initctl cond;
PID IDENT STATUS RUNLEVELS DESCRIPTION
67 main running [--234-----] main
68 second running [--234-----] second
43 tty:S0 running [-12345----] Getty on /dev/ttyS0
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
68 second on <+service/main/ready>
Working: Crashing the application
# killall main
Jan 13 08:15:56 finit[1]: Stopping second[68], sending SIGTERM ...
# Jan 13 08:15:56 finit[1]: Service main[67] died, restarting in 2000 msec (1/10)
Jan 13 08:15:56 finit[1]: Starting main[80]
Jan 13 08:15:56 finit[1]: Starting second[81]
# killall main
Jan 13 08:15:57 finit[1]: Stopping second[81], sending SIGTERM ...
Jan 13 08:15:59 finit[1]: Service main[80] died, restarting in 2000 msec (2/10)
Jan 13 08:15:59 finit[1]: Starting main[86]
Jan 13 08:15:59 finit[1]: Starting second[87]
Jan 13 08:16:01 finit[1]: Successfully restarted crashing service main.
# killall main
Jan 13 08:16:01 finit[1]: Stopping second[87], sending SIGTERM ...
# Jan 13 08:16:01 finit[1]: Service main[86] died, restarting in 2000 msec (3/10)
Jan 13 08:16:01 finit[1]: Starting main[89]
Jan 13 08:16:01 finit[1]: Starting second[90]
Jan 13 08:16:03 finit[1]: Successfully restarted crashing service main.
# initctl ; initctl cond;
PID IDENT STATUS RUNLEVELS DESCRIPTION
89 main running [--234-----] main
90 second running [--234-----] second
43 tty:S0 running [-12345----] Getty on /dev/ttyS0
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
90 second on <+service/main/ready>
OK, thanks! I'll have a look at it first thing tomorrow.
Just a few things that may help explain a bit:
intictl reload
is the same thing as init q
or SIGHUP
/etc/finit.d/*
modified/touchedNow, readiness notification is new in this release, and from a quick glance it seems I've missed handling the reload case entirely. Because on reload the configuration "generation" is stepped, and conditions in the old generation will be in "flux". After reload all unmodified services should simply have their ready condition stepped automatically.
Have reproduced this now. It is caused by 1) the service not creating a pid file, which my test services do, causing it to never be marked as started even, and 2) ready not being reasserted at all for systemd style services.
Took a while to get this one right, but I believe Finit now covers all corner cases properly.
Couldn't reproduce with my test cases at first, had to do it manually. The root cause of that was initctl cond get service/main/ready
returning on
instead of flux
, tricking my test. So more than one problem fixed thanks to this issue.
Looks good to me! I can't seem to break it 😁 I'll let you know if anything misbehaves next week. Thanks!
Awesome! :sunglasses:
Thank you for taking the time to both report and follow-up on this, it is very appreciated!
Ahh I think I might have found another edgecase.
READY=1
. In this case it is waiting on hardware.initctl reload
-> daemon/ready gets asserted. Replacing the initctl reload
with initctl reload daemon
does not have the same issue, and also seems to stop it happening in the future.
READY=1
. In this case it is waiting on hardware.initctl reload daemon
-> daemon/ready is NOT assertedinitctl reload
-> daemon/ready is NOT asserted. Note that is particular daemon creates a PID file as it starts up, but won't READY=1 until the hardware it talks to is actually ready. I'll try and get a minimal code example for this tonight,
Hmm, OK not really following, but very interested to see.
I tried to reproduce with your previous example, but on initctl reload
it receives a HUP and does send READY=1
, so likely not your edge case.
For me, both of the sequences generate the same behavior in fact.
Off the top of my head if you disable the notify()
function, you should be able to reproduce the issue.
I'll have a go at reproducing in a few hours.
Aha, yeah that would be a neat edge case! However, still cannot reproduce, sorry.
Yeah looks like it is due to also having a pid file present.
cat main.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
void createPidFile() {
// Create the pid file to stop other copies running and let the system know we are running, but we may not be 'ready' yet.
FILE *pidFile = fopen("/var/run/main.pid", "w");
fprintf( pidFile, "%ld", ( long ) getpid() );
fclose(pidFile);
}
void notify(void) {
return; // Disabled - i.e. waiting for hardware, could take X minutes before hardware is ready and child services are allow to connect in a start doing work.
const char *sock = getenv("NOTIFY_SOCKET");
int do_notify;
if (sock)
do_notify = atoi(sock);
if (do_notify)
write(do_notify, "READY=1\n", 8);
}
void sig_handler(int signum) {
notify();
}
void main() {
createPidFile();
signal(SIGHUP, sig_handler);
notify();
while (1) {
sleep(10);
}
}
# initctl debug
finit[1]: Debug mode enabled
Jan 16 10:08:35 noname daemon.notice finit[1]: Debug mode enabled
#
# initctl
finit[1]: api_cb():get runlevel
Jan 16 10:08:38 noname daemon.debug finit[1]: api_cb():get runlevel
PID IDENT STATUS RUNLEVELS DESCRIPTION
41 main running [--234-----] main
0 second waiting [--234-----] second
43 tty:1 running [-12345----] Getty on tty1
44 tty:1 running [-12345----] Getty on ttyS0
39 syslogd running [S123456789] System log daemon
0 klogd waiting [S123456789] Kernel log daemon
# initctl cond
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
0 second off <-service/main/ready>
0 klogd off <-pid/syslogd>
#
#
#
#
# initctl reload
finit[1]: api_cb():reload
finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 1
finit[1]: sm_step():state: reload/change, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit[1]: conf_reload():Set time daylight: 0 timezone: 0 tzname: UTC
Jan 16 10:08:47 noname daemon.debug finit[1]: api_cb():reload
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 1
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: reload/change, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: conf_reload():Set time daylight: 0 timezone: 0 tzname: UTC
finit[1]: parse_conf():*** Parsing /etc/finit.conf
finit[1]: parse_conf():*** Parsing /etc/finit.d/main.conf
finit[1]: service_register():Found existing svc for /usr/bin/main name main id type 1
finit[1]: service_register():Service main runlevel 0x1c
finit[1]: service_register():Found existing svc for /usr/bin/second name second id type 1
finit[1]: service_register():Service second runlevel 0x1c
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.d/main.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /usr/bin/main name main id type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service main runlevel 0x1c
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /usr/bin/second name second id type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service second runlevel 0x1c
finit[1]: devmon_add_cond():no match
finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/getty.conf
finit[1]: tty_parse_args():Registering built-in getty on TTY @console at 0 baud with term N/A
Jan 16 10:08:47 noname daemon.debug finit[1]: devmon_add_cond():no match
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/getty.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: tty_parse_args():Registering built-in getty on TTY @console at 0 baud with term N/A
finit[1]: tty_atcon():consoles: tty1 ttyS0
Jan 16 10:08:47 noname daemon.debug finit[1]: tty_atcon():consoles: tty1 ttyS0
finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id type 8
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id type 8
finit[1]: service_register():Service tty:1 runlevel 0x3e
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service tty:1 runlevel 0x3e
finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id 1 type 8
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for /libexec/finit/getty name @console id 1 type 8
finit[1]: service_register():Service tty:1 runlevel 0x3e
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service tty:1 runlevel 0x3e
finit[1]: /libexec/finit/getty (tty:1) asserts the same condition as /libexec/finit/getty (tty:1) => pid/tty:1
Jan 16 10:08:47 noname daemon.warn finit[1]: /libexec/finit/getty (tty:1) asserts the same condition as /libexec/finit/getty (tty:1) => pid/tty:1
finit[1]: tty_parse_args():Incomplete or non-existing TTY device given, cannot register.
finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/syslogd.conf
finit[1]: service_register():Found existing svc for syslogd name syslogd id type 1
finit[1]: service_register():Service syslogd runlevel 0x3ff
finit[1]: service_register():Found existing svc for klogd name klogd id type 1
Jan 16 10:08:47 noname daemon.err finit[1]: tty_parse_args():Incomplete or non-existing TTY device given, cannot register.
Jan 16 10:08:47 noname daemon.debug finit[1]: parse_conf():*** Parsing /etc/finit.d/enabled/syslogd.conf
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for syslogd name syslogd id type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service syslogd runlevel 0x3ff
finit[1]: service_register():Service klogd runlevel 0x3ff
finit[1]: devmon_add_cond():no match
finit[1]: sm_step():Stopping services not allowed after reconf ...
finit[1]: cond_reload():
finit[1]: service_step(): main( 41): running enabled/clean cond:on
finit[1]: service_step(): second( 0): waiting enabled/clean cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Found existing svc for klogd name klogd id type 1
Jan 16 10:08:47 noname daemon.debug finit[1]: service_register():Service klogd runlevel 0x3ff
Jan 16 10:08:47 noname daemon.debug finit[1]: devmon_add_cond():no match
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Stopping services not allowed after reconf ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_reload():
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): main( 41): running enabled/clean cond:on
finit[1]: service_step(): tty:1( 43): running enabled/clean cond:on
finit[1]: service_step(): tty:1( 44): running enabled/clean cond:on
finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 0): waiting enabled/clean cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): tty:1( 43): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): tty:1( 44): running enabled/clean cond:on
finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
finit[1]: sm_step():state: reload/wait, runlevel: 2, newlevel: -1, teardown: 1, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
finit[1]: sm_step():Starting services after reconf ...
finit[1]: service_step(): main( 41): running enabled/clean cond:on
finit[1]: service_step(): second( 0): waiting enabled/clean cond:off
finit[1]: service_step(): tty:1( 43): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: reload/wait, runlevel: 2, newlevel: -1, teardown: 1, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Starting services after reconf ...
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): main( 41): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 0): waiting enabled/clean cond:off
finit[1]: service_step(): tty:1( 44): running enabled/clean cond:on
finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
finit[1]: sm_step():Calling reconf hooks ...
finit[1]: plugin_run_hook():Calling netlink hook n:o 8 (arg: 0x55aa58d56630) ...
finit[1]: cond_reassert():net/
finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/running => /net/lo/running
finit[1]: cond_set():/net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): tty:1( 43): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): tty:1( 44): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Calling reconf hooks ...
finit[1]: cond_set_noupdate():/net/lo/running
finit[1]: cond_set_path():/run/finit/cond//net/lo/running <= 2
finit[1]: cond_update():/net/lo/running
finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/up => /net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: plugin_run_hook():Calling netlink hook n:o 8 (arg: 0x55aa58d56630) ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_reassert():net/
Jan 16 10:08:47 noname daemon.debug finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/running => /net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():/net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():/net/lo/running
finit[1]: cond_set():/net/lo/up
finit[1]: cond_set_noupdate():/net/lo/up
finit[1]: cond_set_path():/run/finit/cond//net/lo/up <= 2
finit[1]: cond_update():/net/lo/up
finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/exist => /net/lo/exist
finit[1]: cond_set():/net/lo/exist
finit[1]: cond_set_noupdate():/net/lo/exist
finit[1]: cond_set_path():/run/finit/cond//net/lo/exist <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond//net/lo/running <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():/net/lo/running
Jan 16 10:08:47 noname daemon.debug finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/up => /net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():/net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():/net/lo/up
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond//net/lo/up <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():/net/lo/up
finit[1]: cond_update():/net/lo/exist
finit[1]: plugin_run_hook():Calling pidfile hook n:o 8 (arg: 0x55aa58d56630) ...
finit[1]: cond_set_path():/run/finit/cond/pid/main <= 2
finit[1]: service_step(): main( 41): running enabled/clean cond:on
finit[1]: service_step(): second( 0): waiting enabled/clean cond:off
finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: do_assert():Reasserting /run/finit/cond/net/lo/exist => /net/lo/exist
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():/net/lo/exist
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():/net/lo/exist
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond//net/lo/exist <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():/net/lo/exist
finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
finit[1]: cond_set_oneshot():nop
finit[1]: sm_step():Update configuration generation of unmodified non-native services ...
finit[1]: cond_set():service/main/ready
finit[1]: cond_set_noupdate():service/main/ready
finit[1]: cond_set_path():/run/finit/cond/service/main/ready <= 2
finit[1]: cond_update():service/main/ready
Jan 16 10:08:47 noname daemon.debug finit[1]: plugin_run_hook():Calling pidfile hook n:o 8 (arg: 0x55aa58d56630) ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/pid/main <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): main( 41): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 0): waiting enabled/clean cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
finit[1]: cond_update():service/main/ready: match <service/main/ready> second(/usr/bin/second)
finit[1]: service_step(): second( 0): waiting enabled/clean cond:on
finit[1]: cond_clear():service/second/
finit[1]: cond_clear_noupdate():service/second/
finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_oneshot():nop
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Update configuration generation of unmodified non-native services ...
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set():service/main/ready
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_noupdate():service/main/ready
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/service/main/ready <= 2
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():service/main/ready
finit[1]: service_step(): second( 0): -> starting
finit[1]: service_step(): second( 0): starting enabled/clean cond:on
finit[1]: Starting second[55]
finit[1]: cond_clear():service/second/
finit[1]: cond_clear_noupdate():service/second/
finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():service/main/ready: match <service/main/ready> second(/usr/bin/second)
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 0): waiting enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear_noupdate():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 0): -> starting
finit[1]: cond_set_oneshot():service/second/running
finit[1]: cond_set_oneshot_noupdate():service/second/running => /run/finit/cond/service/second/running
finit[1]: cond_update():service/second/running
finit[1]: service_step(): second( 55): -> running
finit[1]: service_step(): second( 55): running enabled/clean cond:on
finit[1]: service_step(): main( 41): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 0): starting enabled/clean cond:on
Jan 16 10:08:47 noname <117> finit[1]: Starting second[55]
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_clear_noupdate():service/second/
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_path():/run/finit/cond/service/second/ <= 0
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_oneshot():service/second/running
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_set_oneshot_noupdate():service/second/running => /run/finit/cond/service/second/running
finit[1]: service_step(): second( 55): running enabled/clean cond:on
finit[1]: service_step(): tty:1( 43): running enabled/clean cond:on
finit[1]: service_step(): tty:1( 44): running enabled/clean cond:on
finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
finit[1]: sm_step():Reconfiguration done
finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
Jan 16 10:08:47 noname daemon.debug finit[1]: cond_update():service/second/running
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 55): -> running
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 55): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): main( 41): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): second( 55): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): tty:1( 43): running enabled/clean cond:on
# Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): tty:1( 44): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): syslogd( 39): running enabled/clean cond:on
Jan 16 10:08:47 noname daemon.debug finit[1]: service_step(): klogd( 0): waiting enabled/clean cond:off
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():Reconfiguration done
Jan 16 10:08:47 noname daemon.debug finit[1]: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
#
# initctl
finit[1]: api_cb():get runlevel
Jan 16 10:08:50 noname daemon.debug finit[1]: api_cb():get runlevel
PID IDENT STATUS RUNLEVELS DESCRIPTION
41 main running [--234-----] main
55 second running [--234-----] second
43 tty:1 running [-12345----] Getty on tty1
44 tty:1 running [-12345----] Getty on ttyS0
39 syslogd running [S123456789] System log daemon
0 klogd waiting [S123456789] Kernel log daemon
# initctl cond
PID IDENT STATUS CONDITION (+ ON, ~ FLUX, - OFF)
55 second on <+service/main/ready>
0 klogd off <-pid/syslogd>
#
Aaaargh, of course! I'll have a look at it later tonight ... awesome debugging + deduction there, thanks!
Reproduced.
There, fixed .. and finally I hope! Thank you so much for your patience and great help :heart:
If a service crashes the ready condition does not get set. Causing other daemons that rely on it to be paused.
This also occurs for systemd ready flags (the use case that I care about):
The fix isn't obvious to me on this one. I feel like the service is going to need to keep track of its ready state somewhere, or perhaps if
svc->notify
then don't clear the ready on asvc_set_state(RUNNING)
.Edit: Wild thought. You could add an additional
SVC_STATE
too. `Halted->Waiting->Preparing (only valid for notif services) -> Running"