just-containers / s6-overlay

s6 overlay for containers (includes execline, s6-linux-utils & a custom init)
Other
3.78k stars 212 forks source link

Ignoring S6_SERVICES_GRACETIME makes graceful shutdown impossible, even in v3 #457

Closed lmagyar closed 2 years ago

lmagyar commented 2 years ago

S6-overlay: v3.1.1.0, 3.1.0.1, Alpine v3.15

related: #337 #179 #356

see: #458 because in that issue S6_SERVICES_GRACETIME works! maybe it helps to locate the bug in this issue!

repo to reproduce: https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory-beta/tree/BUG1/mariadb

config: 2 etc/services.d services

BUG:

skarnet commented 2 years ago

Your mariadb run script is way outside the specs for a supervised service. It's a big shell script that performs some preparation, runs the mariadb daemon as a child, then performs some more initialization, and waits. It's normal that things aren't going to work well: there are a lot of timing issues with your current script.

If you want mariadb to be supervised, you should cut that script into several parts:

  1. The preparation part.
  2. The server itself.
  3. The post-start initialization part.

Then define three s6-rc services: 1. and 3. are oneshots, 2. is a longrun; 3. depends on 2. which depends on 1. The only longrun is 2. and will be the mariadb server itself, running in the foreground. That's what will be supervised, and when you send it signals, it will properly receive them.

lmagyar commented 2 years ago

Thank you for your quick response and suggestions, I will modify, test and report on the results within few days.

lmagyar commented 2 years ago

Finally I had time to follow your instructions and test this issue.

TLDR:

Finally I've eliminated using trap, and everything works as expected.

But ignoring the S6_SERVICES_GRACETIME timeout if trap is used, is maybe a real problem. If not, feel free to close this issue.


The long script is cut into 3 services.

See: https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory-beta/tree/BUG1.1/mariadb

In this version in the longrunning run script (https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory-beta/blob/BUG1.1/mariadb/rootfs/etc/s6-overlay/s6-rc.d/mariadb-core/run):

# Register stop
function stop_mariadb() {
echo $(date -Iseconds) BEFORE export >> /share/log
    export-homeassistant-database
echo $(date -Iseconds) AFTER export >> /share/log
    bashio::services.delete "mysql"
    mysqladmin shutdown
    # Successful exit, avoid wait exit status to propagate
    exit 0
}
trap "stop_mariadb" SIGTERM SIGHUP

The first echo $(date -Iseconds) BEFORE export >> /share/log was executed, but the service was killed immediately. _So S6_SERVICESGRACETIME is ignored.


Then I moved the long running finishing logic into the post-initialization service's down script.

See: https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory-beta/tree/BUG1.2/mariadb

Down script: (https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory-beta/blob/BUG1.2/mariadb/rootfs/etc/s6-overlay/scripts/mariadb-core-post-down).

The down script was executed perfectly.

But in this version the longrunning service has never received/executed the TERM signal and after 1 minute, instead of the configured 3 minutes the container stopped, ie. _The S6_SERVICESGRACETIME is ignored.

The longrunning run script (https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory-beta/blob/BUG1.2/mariadb/rootfs/etc/s6-overlay/s6-rc.d/mariadb-core/run)

Logs: Successful start:

Jul 07 17:51:57 s6-rc: info: service s6rc-oneshot-runner: starting
Jul 07 17:51:57 s6-rc: info: service s6rc-oneshot-runner successfully started
Jul 07 17:51:57 s6-rc: info: service fix-attrs: starting
Jul 07 17:51:57 s6-rc: info: service cont: starting
Jul 07 17:51:57 s6-rc: info: service fix-attrs successfully started
Jul 07 17:51:57 s6-rc: info: service legacy-cont-init: starting
Jul 07 17:51:57 s6-rc: info: service legacy-cont-init successfully started
Jul 07 17:51:58 s6-rc: info: service cont successfully started
Jul 07 17:51:58 s6-rc: info: service mariadb-core-pre: starting
Jul 07 17:51:58 s6-rc: info: service mariadb-backup: starting
Jul 07 17:51:58 s6-rc: info: service mariadb-backup successfully started
Jul 07 17:51:59 [19:51:59] INFO: Change tmpfs size to 150m
Jul 07 17:51:59 [19:51:59] INFO: Create a new mariadb initial system
Jul 07 17:52:02 s6-rc: info: service mariadb-core-pre successfully started
Jul 07 17:52:02 s6-rc: info: service mariadb-core: starting
Jul 07 17:52:02 s6-rc: info: service mariadb-core successfully started
Jul 07 17:52:02 s6-rc: info: service mariadb-core-post: starting
Jul 07 17:52:02 [19:52:02] INFO: Starting MariaDB
...
Jul 07 17:52:14 [19:52:14] INFO: Successfully sent service information to Home Assistant.
Jul 07 17:52:14 s6-rc: info: service mariadb-core-post successfully started
Jul 07 17:52:14 s6-rc: info: service legacy-services: starting
Jul 07 17:52:14 s6-rc: info: service legacy-services successfully started

Restarting the container:

Jul 07 17:58:11 s6-rc: info: service legacy-services: stopping
Jul 07 17:58:11 s6-rc: info: service legacy-services successfully stopped
Jul 07 17:58:11 s6-rc: info: service legacy-cont-init: stopping
Jul 07 17:58:11 s6-rc: info: service mariadb-core-post: stopping
Jul 07 17:58:11 s6-rc: info: service mariadb-backup: stopping
Jul 07 17:58:11 s6-rc: info: service mariadb-backup successfully stopped
Jul 07 17:58:11 s6-rc: info: service legacy-cont-init successfully stopped
Jul 07 17:58:11 s6-rc: info: service fix-attrs: stopping
Jul 07 17:58:11 s6-rc: info: service fix-attrs successfully stopped
Jul 07 17:58:12 [19:58:12] INFO: Successfully deleted service information from Home Assistant.
Jul 07 17:58:14 [19:58:14] INFO: MariaDB tables locked
Jul 07 17:58:14 [19:58:14] INFO: Exporting databases content...
Jul 07 17:58:15 [19:58:15] INFO: MariaDB tables unlocked
Jul 07 17:58:15 s6-rc: info: service mariadb-core-post successfully stopped
Jul 07 17:58:15 s6-rc: info: service mariadb-core: stopping

The stopping part of the log ends here, nothing happened for 1 minute. After 1 minute the container has started:

Jul 07 17:59:16 s6-rc: info: service s6rc-oneshot-runner: starting
Jul 07 17:59:16 s6-rc: info: service s6rc-oneshot-runner successfully started
Jul 07 17:59:16 s6-rc: info: service fix-attrs: starting
Jul 07 17:59:16 s6-rc: info: service cont: starting
Jul 07 17:59:16 s6-rc: info: service fix-attrs successfully started
Jul 07 17:59:16 s6-rc: info: service legacy-cont-init: starting
Jul 07 17:59:16 s6-rc: info: service legacy-cont-init successfully started
Jul 07 17:59:16 s6-rc: info: service cont successfully started
Jul 07 17:59:16 s6-rc: info: service mariadb-core-pre: starting
...

Then I eliminated trap, see: See: https://github.com/lmagyar/homeassistant-addon-mariadb-inmemory-beta/tree/BUG1.3/mariadb

With a bit ugly workaround.

Oneshot down script:

touch /etc/MARIADB_SHUTDOWN
mysqladmin --user=root shutdown

Longrunning run script:

rm -f /etc/MARIADB_SHUTDOWN
mysqld_safe --datadir="${MARIADB_DATA}" --user=root < /dev/null

if bashio::fs.file_exists "/etc/MARIADB_SHUTDOWN"; then
    # workaround to avoid restarting the service after intended shutdown
    sleep 10
fi

Now everything works as expected.

Jul 07 20:14:54 s6-rc: info: service legacy-services: stopping
Jul 07 20:14:54 s6-rc: info: service legacy-services successfully stopped
Jul 07 20:14:54 s6-rc: info: service legacy-cont-init: stopping
Jul 07 20:14:54 s6-rc: info: service mariadb-backup: stopping
Jul 07 20:14:54 s6-rc: info: service mariadb-backup successfully stopped
Jul 07 20:14:54 s6-rc: info: service mariadb-core-post: stopping
Jul 07 20:14:54 s6-rc: info: service legacy-cont-init successfully stopped
Jul 07 20:14:54 s6-rc: info: service fix-attrs: stopping
Jul 07 20:14:54 s6-rc: info: service fix-attrs successfully stopped
Jul 07 20:14:55 [22:14:55] INFO: Successfully deleted service information from Home Assistant.
Jul 07 20:14:57 [22:14:57] INFO: MariaDB tables locked
Jul 07 20:14:57 [22:14:57] INFO: Exporting database content...
Jul 07 20:14:58 [22:14:58] INFO: MariaDB tables unlocked
Jul 07 20:14:58 [22:14:58] INFO: Stopping MariaDB...
Jul 07 20:14:58 2022-07-07 22:14:58 0 [Note] /usr/bin/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
Jul 07 20:14:58 2022-07-07 22:14:58 0 [Note] /usr/bin/mariadbd: Shutdown complete
Jul 07 20:14:58 220707 22:14:58 mysqld_safe mysqld from pid file /tmp/databases/f48b086e-mariadb-beta.pid ended
Jul 07 20:14:58 [22:14:58] INFO: MariaDB stopped
Jul 07 20:14:59 s6-rc: info: service mariadb-core-post successfully stopped
Jul 07 20:14:59 s6-rc: info: service mariadb-core: stopping
Jul 07 20:14:59 [22:14:59] INFO: MariaDB stopped, restarting...
Jul 07 20:14:59 s6-rc: info: service mariadb-core successfully stopped
Jul 07 20:14:59 s6-rc: info: service mariadb-core-pre: stopping
Jul 07 20:14:59 s6-rc: info: service mariadb-core-pre successfully stopped
Jul 07 20:14:59 s6-rc: info: service cont: stopping
Jul 07 20:14:59 s6-rc: info: service cont successfully stopped
Jul 07 20:14:59 s6-rc: info: service s6rc-oneshot-runner: stopping
Jul 07 20:15:00 s6-rc: info: service s6rc-oneshot-runner successfully stopped
skarnet commented 2 years ago

mysqld_safe is a half-baked attempt to perform supervision around mysqld, and is actually putting itself in the way of s6. That is why you are experiencing trouble. Ideally, you would just run exec mysqld and let s6 supervise it, and let the mysqld process receive the termination signals without going through the mysqladmin boilerplate.

Of course, the way to invoke mysqld directly is badly documented, so it's difficult to determine the proper way to run it under a supervision system. I'm sorry to say that MySQL is one of the worst services in that respect.

I'm happy that you found a way that works for you despite this inconvenience.

lmagyar commented 2 years ago

Thank you for your help!

You were right: looking at the source of mysqld_safe, it has a trap '' 1 2 3 15 line...

Now I understand why trap and SIGTERM was messed up.

I eliminated mysqld_safe.