linuxserver / docker-deluge


need a good/documented way to shut deluge down cleanly #105

Closed sjpotter closed 2 years ago

sjpotter commented 3 years ago


A documented way to shut deluge down cleanly, so it can be restarted without problems. I've tried docker stop. I've tried shutting down the daemon from within the webui (it just gives an error) and from the thinclient (doesn't seem to do anything, perhaps the same error). I'd be willing to exec into the container and run a command to shut it down cleanly, but I can't figure out how to do that.


Expected Behavior

the ability to shut down the deluge container without corrupting torrent state

Current Behavior

stopping the container can result (and, for me, has many times resulted) in corrupt torrent state files, requiring those torrents to be removed and re-added manually.

Steps to Reproduce

  1. run a daemon with many torrents (I'm at 1000)
  2. docker stop container
  3. docker start container
  4. observe that some torrents are in an error state

Environment

OS: Ubuntu 21.10
CPU architecture: x86_64
How docker service was installed: distro docker

Command used to create docker container (run/create/compose/screenshot)

docker run -d \
  --name=deluge-001 \
  --net=host \
  -e PUID=1000 \
  -e PGID=1000 \
  -v ~/deluge/config-001:/config \
  -v /data:/data \
  -p 8081:80 \
  ghcr.io/linuxserver/deluge

Docker logs

      _         ()
     | |  ___   _    __
     | | / __| | |  /  \ 
     | | \__ \ | | | () |
     |_| |___/ |_|  \__/

Brought to you by linuxserver.io

To support LSIO projects visit: https://www.linuxserver.io/donate/

GID/UID

User uid: 1000 User gid: 1000

[cont-init.d] 10-adduser: exited 0.
[cont-init.d] 30-config: executing...
[cont-init.d] 30-config: exited 0.
[cont-init.d] 99-custom-scripts: executing...
[custom-init] no custom files found exiting...
[cont-init.d] 99-custom-scripts: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
20:21:09 [WARNING ][deluge.i18n.util :83 ] IOError when loading translations: [Errno 2] No translation file found for domain: 'deluge'

github-actions[bot] commented 3 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

byeuji commented 3 years ago

Any update on this issue? I ran into this last night on restarting my client, and re-checking 16TB of files is noooooo bueno.

samsawyer commented 2 years ago

Messed around with this a bit, and this seems to work for me:

docker exec -ti deluge deluge-console --config=/config halt

assuming the container is named deluge and its config is mounted at /config.

You can just run that command before whatever you're doing to stop or down the deluge container.

Though it would be really nice for the container's entry point script to trap the SIGTERM sent when the container is brought down and issue this halt command to the deluge daemon itself before proceeding to finish the shutdown.
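
A minimal wrapper sketch along those lines (assuming the container is named deluge and the config is mounted at /config, as above; the container name and the sleep are illustrative, not part of the image):

#!/bin/bash
# Hypothetical pre-stop wrapper: ask deluged to flush and halt, then stop the container.
# Assumes the container is named "deluge" and its config is mounted at /config.
set -e
docker exec deluge deluge-console --config=/config halt || true
# crude pause to let the daemon write its state before docker sends SIGTERM
sleep 5
docker stop deluge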

calebj commented 2 years ago

I've recently started running into this, and it doesn't seem to be an issue with the run script. For one, I don't see where the _term function introduced in #135 is called. I think that @thelamer meant to also add a trap, like this (note the -x, I couldn't get pidof to work inside the container without it due to the python hashbang in deluged):

_term() {
  echo "Caught SIGTERM signal!"
  echo "Tell Deluged to shut down."
  pid=$(pidof -x deluged)
  deluge-console --config=/config halt
  # wait for the deluged process to die before returning
  tail --pid=${pid} -f /dev/null
}

trap _term INT TERM

DELUGE_LOGLEVEL=${DELUGE_LOGLEVEL:-info}

s6-setuidgid abc /usr/bin/deluged -c /config \
    -d --loglevel="${DELUGE_LOGLEVEL}" \
    &
wait

I did try this, and although the run script then behaves as intended, there's something else going on with the container. In my testing, I found that deluged does shut down cleanly when sent a SIGTERM, so something outside the script is causing it to be killed before the shutdown can complete. A docker-compose exec deluge killall -TERM deluged does initiate a clean shutdown, but of course s6 starts it back up again.

But the plot thickens: s6-svscanctl -t /run/service/, which brings down the service tree, also works correctly... almost. It sometimes hangs waiting for the sleep infinity in the apiping service. However, sending SIGTERM or SIGINT to s6-svscan, like docker does, leads to an unclean exit, which is counter to the documentation:

  • SIGTERM : Instruct all the s6-supervise processes to stop their service and exit; wait for the whole supervision tree to die, without losing any logs; then exec into .s6-svscan/finish or exit 0. This behavior can also be achieved via the s6-svscanctl -t scandir command.

So I did some more digging, and found these tunables in the s6 overlay used in the base of this image:

  • S6_KILL_FINISH_MAXTIME (default = 5000): How long (in milliseconds) the system should wait, at shutdown time, for a script in /etc/cont-finish.d to finish naturally. After this duration, the script will be sent a SIGKILL. Bear in mind that scripts in /etc/cont-finish.d are run sequentially, and the shutdown sequence will potentially wait for S6_KILL_FINISH_MAXTIME milliseconds for each script.
  • S6_SERVICES_GRACETIME (default = 3000): How long (in milliseconds) s6 should wait, at shutdown time, for services declared in /etc/services.d to die before proceeding with the rest of the shutdown.
  • S6_KILL_GRACETIME (default = 3000): How long (in milliseconds) s6 should wait, at the end of the shutdown procedure when all the processes have received a TERM signal, for them to die before sending a KILL signal to make sure they're dead.

Changing S6_SERVICES_GRACETIME didn't do any good, but S6_KILL_GRACETIME did give the container time to stop... and a bit more. For whatever reason, it hangs with s6-linux-init-shutdownd as the only remaining process until the full kill gracetime is met.

But I found one other suspicious thing to investigate: log lines of s6-svwait: fatal: supervisor died. I traced this message to this script, which is used to take down "legacy" services, and this is where the actual bug is happening for me. Running that script one line at a time works fine, but it seems that calling s6-svwait immediately after s6-svscanctl -an /run/service causes it to try to reach a supervisor process that it can't (at least, that's my guess). Adding a sleep 1 before s6-svwait allows for a clean shutdown that respects S6_SERVICES_GRACETIME, but this reeks of a race condition somewhere.
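
For anyone who wants to experiment with those tunables, they are plain environment variables on the container. A rough compose sketch (the service name and the 15000 ms values are assumptions for illustration, not recommendations):

version: "2.1"
services:
  deluge:
    image: ghcr.io/linuxserver/deluge
    network_mode: host
    environment:
      - PUID=1000
      - PGID=1000
      # give s6 more time before it escalates to SIGKILL at shutdown
      - S6_SERVICES_GRACETIME=15000
      - S6_KILL_GRACETIME=15000
    volumes:
      - ~/deluge/config-001:/config
      - /data:/data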

I'm a little put off by this "legacy" talk. It seems that the s6-rc approach is better maintained, and it also allows for oneshot services like apiping and for dependencies, so that the oneshot and deluge-web are only launched after the daemon. I've created a branch in my fork for this, and am testing it now.
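
For context, an s6-rc service is just a directory declaring its type, its run (or up) script, and its dependencies. A very rough sketch of what that split might look like, following s6-overlay v3 conventions; the paths and scripts here are assumptions for illustration, not the contents of that branch:

# /etc/s6-overlay/s6-rc.d/deluged/type
longrun

# /etc/s6-overlay/s6-rc.d/deluged/run
#!/command/execlineb -P
s6-setuidgid abc /usr/bin/deluged -c /config -d --loglevel=info

# /etc/s6-overlay/s6-rc.d/deluge-web/type
longrun

# /etc/s6-overlay/s6-rc.d/deluge-web/dependencies.d/deluged
# (empty file: tells s6-rc to start deluge-web only after deluged is up)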

UPDATE: It seems to work fine with container stop/restart and compose up/down. Hopefully this is the last time I have to put up with re-checking several TB of data.