Closed JackNewman12 closed 1 year ago
Interesting! I've been troubleshooting a very similar issue.
What version of Finit are you using, or are you on the bleeding edge like me?
The latest release v4.3.
I must note that I had to create the /var/run/finit/cond/usr folder myself to get usr conditions working. I wonder if something similar is happening with the service folder. I'll have a peek today.
Any thoughts as to where I should be looking or extra debug to enable?
OK. There are changes related to this coming up in v4.4 that I hope to release soon.
The usr conditions folder is created by the initctl tool, those conditions aren't really supposed to be managed by anything else.
Before we rush ahead and dig into the code, maybe you can tell me a little bit more about MYSERVICE and what you expect Finit to do, and also what you are doing from the outside? Is it a forking daemon, is it expected to crash (I see the signaled: 1 flag going up), or are you in some of the cases above killing the daemon to test it or Finit's restart mechanism?
The particular code in question starts here: https://github.com/troglobit/finit/blob/f483a8aeaa406ad8980ebe27ef46cb5135e89cb3/src/service.c#L1970-L1981
The usr conditions folder is created by the initctl tool, those conditions aren't really supposed to be managed by anything else.
Yes, although I would expect it to create that folder when required instead of failing. Adding the -create flag to the tool does not fix this. Either way adding a mkdir to a startup script is an easy fix.
Maybe you can tell me a little bit more about MYSERVICE
This bug actually applies to all of our daemons. I've been attempting to switch our (very old) init system over to Finit to give us some more powerful features. In this case you can imagine a simple daemon that just sits there and does some Tx/Rx of data over Ethernet, serial, etc. No forking. It was only when I noticed an application was crashing (i.e. the network was down and we didn't handle it) that I saw all 10 retries happen instantly.
and what you expect Finit to do.
For 99% of cases if the daemon is not running, restart it (with a small delay). Nothing too fancy going on, although with finit I expect we will add some extra conditions like network up/down, service to service dependency, etc.
is it expected to crash (I see the signaled: 1 flag going up), or are you in some of the cases above killing the daemon to test it or Finit's restart mechanism?
In the example above I was just killing the daemon manually using kill MYSERVICE to test Finit's restart mechanism.
I didn't have much time today but it seems to do what I would expect. A service_timeout(service_retry) gets set for X seconds in the future, but the bottom of the service_step() loop causes it to keep jumping into the next state: running -> restart -> halted -> ready. I would have expected it to return early once this callback is set, so that the callback ends up doing the work. https://github.com/troglobit/finit/blob/f483a8aeaa406ad8980ebe27ef46cb5135e89cb3/src/service.c#L2063-L2067
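To make the expected behaviour concrete, here is a minimal sketch of a state-machine step that returns early while a retry timer is pending. This is not Finit's code: the names svc_step(), retry_expired(), arm_retry_timer() and struct svc are all hypothetical, and the timer is reduced to a boolean flag for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states, named after the ones mentioned in this thread. */
enum svc_state { SVC_RUNNING, SVC_RESTART, SVC_HALTED, SVC_READY };

struct svc {
    enum svc_state state;
    bool alive;        /* is the process currently alive?          */
    bool timer_armed;  /* true while a retry timer is pending      */
    int  starts;       /* number of times the process was started  */
};

/* Stand-in for arming a one-shot timer; a real init would use timerfd or similar. */
static void arm_retry_timer(struct svc *s)
{
    s->timer_armed = true;
}

/* Step the state machine until it settles or has to wait for the timer. */
void svc_step(struct svc *s)
{
    enum svc_state old;

    assert(s);
    do {
        old = s->state;
        switch (s->state) {
        case SVC_RUNNING:
            if (!s->alive) {
                arm_retry_timer(s);
                s->state = SVC_RESTART;
            }
            break;
        case SVC_RESTART:
            if (s->timer_armed)
                return;            /* early return: the timer callback resumes */
            s->state = SVC_HALTED;
            break;
        case SVC_HALTED:
            s->state = SVC_READY;
            break;
        case SVC_READY:
            s->alive = true;       /* pretend we forked the process again */
            s->starts++;
            s->state = SVC_RUNNING;
            break;
        }
    } while (s->state != old);     /* keep stepping while the state changes */
}

/* Would be called by the timer when the retry delay has elapsed. */
void retry_expired(struct svc *s)
{
    s->timer_armed = false;
    svc_step(s);
}
```

With the early return in place, repeated calls to svc_step() while the timer is pending leave the service parked in the restart state, and only retry_expired() lets it continue through halted and ready back to running.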
Thanks for the quick follow-up, was not expecting that! Very much appreciate some background and where-you're-at status :smiley:
Be aware that Finit uses a lot of fairly modern kernel features. Sometimes people run into issues because of having an older kernel or missing some kernel config options.
OK, I'll look into the usr condition thing as a separate issue if I can reproduce it.
The initial delay, like you saw, is very short. But then it should work as you say ... the service_retry() callback should handle the delay. Unless my mind's slipping and I've forgotten something obvious. I'll have a look at it during the day to see what I can find out.
Be aware that Finit uses a lot of fairly modern kernel features. Sometimes people run into issues because of having an older kernel or missing some kernel config options.
We are using 5.4 IIRC. Is there a list of these kernel features I need to enable? We run a pretty minimal build. I don't see any errors from finit at boot, except missing support for cgroupv2.
Thanks for the support!
That remains to be documented unfortunately, but all the new eventfd/signalfd/etc. and DEVTMPFS ... the best I can do meanwhile is to give you this: https://github.com/troglobit/myLinux/blob/main/board/amd64/linux_defconfig
Finit can take care of bootstrapping a pretty bare rootfs using, e.g., the bootmisc.so plugin (many plugins started out as optional but are now mandatory, this is one such). There are some more pointers in myLinux if you're curious.
and https://github.com/troglobit/myLinux/blob/main/board/common/busybox_defconfig, if you're on an embedded system with BusyBox. Notice BusyBox must not be built with CONFIG_MODPROBE_SMALL.
I believe I have reproduced this issue now. There seem to be at least two cooperating bugs at play here. I'll try to get to the bottom of this over the weekend.
There, finally found the root cause(s) of this! Thank you for taking the time to report it.
Thanks for fixing the restart issue! I managed to test it today and it is working.
If I kill an application it will restart instantly, however if I kill it again within a few seconds it will get the expected delay. Waiting for 5 seconds and killing the app will have it restart instantly again.
Thank you for taking the time to test again! :)
Yeah, this is by design. We don't want to penalize a single crash, which could be attributed to startup issues. The intention is to increment the delay if the service crashes continuously -- so we track the number of crashes (per some period of time).
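For readers following along, the policy described above could be sketched roughly like this. It is only an illustration, not Finit's implementation: crash_delay(), struct crash_tracker, the 5 second window, the 2 second step and the linear growth of the delay are all assumptions made for the example; only the limit of 10 retries comes from this thread.

```c
#include <assert.h>

#define CRASH_WINDOW_SEC 5   /* assumed: crashes further apart than this are unrelated */
#define RETRY_STEP_SEC   2   /* assumed: extra delay added per consecutive crash        */
#define MAX_RETRIES      10  /* the thread mentions 10 retries before giving up         */

struct crash_tracker {
    long last_crash;  /* time of the previous crash in seconds, -1 if none */
    int  count;       /* consecutive crashes seen within the window        */
};

/*
 * Record a crash at time `now` (seconds) and return how long to wait
 * before restarting, or -1 when the retries are exhausted.
 */
int crash_delay(struct crash_tracker *t, long now)
{
    assert(t);

    /* An isolated crash, outside the window, resets the history. */
    if (t->last_crash < 0 || now - t->last_crash >= CRASH_WINDOW_SEC)
        t->count = 0;

    t->last_crash = now;
    t->count++;

    if (t->count > MAX_RETRIES)
        return -1;

    /* First crash in a window: restart immediately; then back off linearly. */
    return (t->count - 1) * RETRY_STEP_SEC;
}
```

Starting from a tracker initialised to { -1, 0 }, a single crash yields a delay of 0, a second crash shortly after yields a non-zero delay, and a crash after a quiet period of at least CRASH_WINDOW_SEC is treated as a first crash again, which matches the behaviour observed in the test above.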
When a service is crashing I find that finit seems to restart the app instantly. This means all 10 retries can end up being used in a few milliseconds and ends up disabling the service.
Interestingly I can still see the event timer is running as X seconds later finit will print the "Successfully restarted" line.
The conf file is very simple:
Sorry I didn't have timestamps on the log, hopefully my comments are clear enough. Tomorrow I should get a chance to recapture this with timestamps.
Thanks!