lichess-org / fishnet

Distributed Stockfish analysis for lichess.org
https://lichess.org/get-fishnet
GNU General Public License v3.0

Add graceful shutdown to systemd unit #260

Open bharrisau opened 9 months ago

bharrisau commented 9 months ago

It's a bit hacky, to get around the lack of asynchronous stop commands, but I've got something like this to give the process a 5-minute window to finish any remaining work.

[Service]
ExecStart=/home/fishnet/fishnet --auto-update --conf /home/fishnet/fishnet.ini run
ExecStop=/bin/kill -INT "$MAINPID"
ExecStop=/usr/bin/sh -c 'while kill -0 "$MAINPID" 2>/dev/null; do sleep 1; done'
KillMode=mixed
TimeoutStopSec=300

niklasf commented 9 months ago

Generally such long timeouts for shutdown are not acceptable, so instead the default KillSignal=SIGTERM is used. This will cause a somewhat graceful shutdown, i.e., work won't be completed, but the server is notified.

The hack above can be simplified to:

KillMode=mixed
KillSignal=SIGINT
TimeoutStopSec=300

bharrisau commented 9 months ago

You don't get the SIGTERM then, though: it goes straight to SIGKILL if the batch takes too long. That might be worse, as the server isn't notified at all.

Yes, the 5-minute timeout isn't for everyone. I just shared it in case it was handy.
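
To make the difference concrete, here is the simplified variant annotated with the signals it should produce. This is a sketch based on the documented behaviour of KillSignal= and TimeoutStopSec=, not something tested against fishnet; the comments are my reading of the systemd man pages.

```ini
[Service]
# Simplified variant: no ExecStop= commands.
# On "systemctl stop", systemd sends KillSignal= (here SIGINT)
# to the main process.
KillMode=mixed
KillSignal=SIGINT
# If the main process is still running 300 s later, systemd sends SIGKILL.
# No SIGTERM is sent at any point in this variant.
TimeoutStopSec=300
```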

niklasf commented 9 months ago

Ah, interesting. I didn't realize how nicely your snippet works: So it's immediate SIGINT, SIGTERM after the timeout, and then eventually SIGKILL if something went wrong?

bharrisau commented 9 months ago

Yeah. But there is only one timeout value, so the delay from SIGINT to SIGTERM is the same as from SIGTERM to SIGKILL.
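
Put as a timeline, the ExecStop= variant should behave roughly as annotated below. This is a sketch assuming the documented semantics, where TimeoutStopSec= bounds each ExecStop= command and then, separately, the wait for the service itself to stop; the times are approximate and untested.

```ini
[Service]
ExecStart=/home/fishnet/fishnet --auto-update --conf /home/fishnet/fishnet.ini run
# t = 0 s: the first ExecStop= sends SIGINT to the main process
# and returns immediately.
ExecStop=/bin/kill -INT "$MAINPID"
# The second ExecStop= blocks until the main process has exited.
# It is itself limited by TimeoutStopSec=.
ExecStop=/usr/bin/sh -c 'while kill -0 "$MAINPID" 2>/dev/null; do sleep 1; done'
KillMode=mixed
# t = ~300 s: the wait loop times out and systemd sends the default
#             KillSignal=SIGTERM to the main process.
# t = ~600 s: if the process is still running, systemd sends SIGKILL.
TimeoutStopSec=300
```

So the worst case for a stop is about twice TimeoutStopSec=, since the same value is used for both phases.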
