just-containers / s6-overlay

s6 overlay for containers (includes execline, s6-linux-utils & a custom init)

Prevent App Restart Loop #440

Closed SSoft7 closed 2 years ago

SSoft7 commented 2 years ago

Please provide a small Dockerfile that demonstrates your issue.

Dockerfile

Question

Is there any way to prevent the restart loop of an application when there are errors while starting it? Right now I am using LinuxServer Docker images (built by myself from their source code). They use s6-overlay v2.2.0.3 without many tweaks.

For example, my Tautulli app keeps throwing the following error and restarting forever, which slows down my whole system. So my question is: is there any way to prevent a loop like this?

Traceback (most recent call last):
  File "/app/tautulli/PlexPy.py", line 20, in <module>
    from Tautulli import main
  File "/app/tautulli/Tautulli.py", line 38, in <module>
    import plexpy
  File "/app/tautulli/plexpy/__init__.py", line 35, in <module>
    from apscheduler.schedulers.background import BackgroundScheduler
  File "/app/tautulli/lib/apscheduler/__init__.py", line 1, in <module>
    from pkg_resources import get_distribution, DistributionNotFound
ImportError: No module named pkg_resources
skarnet commented 2 years ago

We do not provide support for s6-overlay v2 anymore, sorry. If your issue persists with v3, please open a new issue.

By the way, maybe I'm misunderstanding things, but the Dockerfile you provided does not look like it's using s6-overlay at all.

SSoft7 commented 2 years ago

> We do not provide support for s6-overlay v2 anymore, sorry. If your issue persists with v3, please open a new issue.
>
> By the way, maybe I'm misunderstanding things, but the Dockerfile you provided does not look like it's using s6-overlay at all.

The Dockerfile inherits from ghcr.io/linuxserver/baseimage-alpine:3.15, which is here: https://github.com/linuxserver/docker-baseimage-alpine/blob/master/Dockerfile

I'm not looking for any bugfix or new feature in v2; I am just asking whether there is any built-in environment variable or configuration that can prevent the restart loop of a service.

skarnet commented 2 years ago

In v2 as in v3, the point of using supervision is to make sure your services remain up. There are ways to tell the supervisor not to restart a given service, but they aren't accessible via container environment variables - you'd have to use an s6-svc -O command at some point during the container startup sequence.
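
A hedged sketch of what that could look like on v2, assuming the usual layout where services.d entries are supervised under /var/run/s6/services and a hypothetical service name; it has to run after the service is already supervised:

# mark the service "once": the supervisor will not restart it
# the next time it exits
s6-svc -O /var/run/s6/services/tautulli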

If you want to exit the container when your application fails, the intended usage is to make your application the CMD, not a supervised service; s6-overlay is designed to trigger its container shutdown sequence and exit when its CMD exits.
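
A minimal Dockerfile sketch of that pattern, assuming the LinuxServer base image from this thread (whose ENTRYPOINT is already s6-overlay's /init) and a hypothetical application command:

FROM ghcr.io/linuxserver/baseimage-alpine:3.15
# run the application as CMD instead of a supervised service: when it
# exits, s6-overlay triggers its shutdown sequence and the container stops
CMD ["python3", "/app/tautulli/Tautulli.py"]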

Apart from that, services are expected to succeed. If you have a service that could conditionally start depending on external factors, I suggest encoding the condition in the service's run script - then you'll be able to test external variables. For instance, with the following run script, the foobard daemon will only be started if the DO_NOT_START_FOOBARD environment variable is unset or set to 0; otherwise the service will tell its supervisor to bring it down and not restart it:

#!/command/with-contenv sh
# Start foobard only when DO_NOT_START_FOOBARD is unset or 0.
# The "0" envelope keeps the numeric test valid when the variable is unset.

if test 0$DO_NOT_START_FOOBARD -eq 0 ; then
  exec foobard
else
  # -O marks the service "once" (no restart when it dies), -d brings it down.
  s6-svc -Od .
  exit 0
fi
SSoft7 commented 2 years ago

> Apart from that, services are expected to succeed; if you have a service that could conditionally start depending on external factors

Well, in my case the service was failing because the Python module pkg_resources was missing. I've also seen such restart loops when there is a syntax error in an app's configuration file, or the configuration file is just broken.

I was actually hoping for an option similar to systemd's StartLimitIntervalSec=interval and StartLimitBurst=burst.

https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=interval
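
For reference, a sketch of that mechanism in a systemd unit (illustrative values):

[Unit]
# refuse further restarts after 3 failed starts within 60 seconds
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Restart=on-failure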

Even if this is not available in v2, it could be a nice addition for v3.

skarnet commented 2 years ago

> Well, in my case the service was failing because the Python module pkg_resources was missing.

Yes. That's something you detect when testing; it's not something that randomly comes up in production, where your container image is stable and debugged.

> I've also seen such restart loops when there is a syntax error in an app's configuration file, or the configuration file is just broken.

Those are also detected when testing the setup. These errors do not show up in production; they don't need run-time mitigation.

skarnet commented 2 years ago

Closing this; please reopen if anything is unclear.

ehudkaldor commented 2 years ago

A follow-up, as I have a similar situation: can the check and condition be done in a oneshot and affect a following longrun? I have a container running Vault as a supporting service for the actual process running in the container. I want to check whether the Vault token was provided and run Vault if so, but prevent Vault from running if the token was not provided. Since I check parameters in a oneshot init, I wanted to make the decision there about whether the vault longrun itself should run. The snippet above will (hopefully) work if I put it in the longrun, but it makes more sense to me to put it in the init, if possible.

ehudkaldor commented 2 years ago

BTW, I found that just testing for the env var without the envelope of 0 works too. Is there a reason not to use it?

/ # unset D
/ # execlineb -c "if { test $D } echo yes "
/ # export D=ff
/ # execlineb -c "if { test $D } echo yes "
yes
/ # export D=
/ # execlineb -c "if { test $D } echo yes "
/ #
skarnet commented 2 years ago

With the current s6-rc version, the service database is static: once your service set is defined, you cannot decide at runtime whether a service is going to run based on execution results of previous services. All you can do is make a subset of services fail to start.

You can put your token check in a oneshot that fails if the token isn't present and have the vault longrun depend on that oneshot: the vault longrun won't start if the token check fails. However, this is understood by s6-rc as a partial failure: if that's what you want, good, but if not, it means your container won't start without the token if you set S6_BEHAVIOUR_IF_STAGE2_FAILS to 2 or more.
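
A minimal sketch of that layout under v3's /etc/s6-overlay/s6-rc.d tree, with hypothetical service names (both services would also be registered via empty files in user/contents.d/):

# vault-token-check/type
oneshot

# vault-token-check/up  (execline: fails when VAULT_TOKEN is unset or empty)
with-contenv
importas -D "" TOKEN VAULT_TOKEN
eltest -n ${TOKEN}

# vault/type
longrun

# vault/dependencies.d/vault-token-check  (empty file: vault requires the check)
# vault/run would exec the vault daemon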

About variable envelopes: if you're just testing for presence/absence, you don't need the 0 envelope indeed. s6-overlay has it because it tests for numeric variables, with possible values of 0, 1, 2, etc.
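
A quick shell illustration of the difference:

# unset N;  test  $N -eq 0   -> error in most shells: -eq loses its left operand
# unset N;  test 0$N -eq 0   -> true  ("0" -eq 0)
# N=2;      test 0$N -eq 0   -> false ("02" -eq 0)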

nejtr0n commented 1 day ago

@skarnet I've set S6_BEHAVIOUR_IF_STAGE2_FAILS=2, but no luck.

I have a CMD plus a service defined under s6-rc.d (the layout was shown in an attachment that is not preserved in this transcript).

When I kill the CMD service, everything is OK (the container stops). But when I kill the service from s6-rc.d, it just starts again (the container does not stop).

But I need the container to stop. What am I doing wrong?

skarnet commented 1 day ago

S6_BEHAVIOUR_IF_STAGE2_FAILS only applies when a service fails to start. If your service starts properly and is functional, and you then kill it, the variable does not apply and your container won't stop.

If you want to stop the container from the inside, run /run/s6/basedir/bin/halt. If you want to stop the container when a given application stops, run that application as CMD.

nejtr0n commented 16 hours ago

I've made a simple startup script and it works as expected.

#!/bin/sh
# Run script for the supervised service: on the first invocation, drop a
# marker file and start the app; when the app dies and the supervisor
# re-runs this script, the marker is present, so halt the container
# instead of restarting.

if [ ! -f "myapp.pid" ]; then
  touch myapp.pid
  exec /myapp
else
  echo "Myapp crashed, halting..."
  /run/s6/basedir/bin/halt
fi

@skarnet Thank you very much for help!