jeflem opened 3 months ago
It seems that the stop timeout isn't the problem (Podman sets it to 70 seconds). The problem is rather the start timeout, which Podman does not set. It could be set to `infinity`; see https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#TimeoutStartSec=
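A minimal sketch of how the `infinity` suggestion could be applied as a systemd drop-in for the container's user unit, without editing the generated unit file. The unit name `container-ananke.service` and the helper `write_start_timeout_dropin` are hypothetical. Substitute the unit that `podman generate systemd` actually produced. `UNIT_DIR` defaults to the usual per-user unit directory.

```shell
#!/bin/sh
# Hypothetical unit name; use the file name podman generated for your container.
UNIT="${UNIT:-container-ananke.service}"

# Per-user unit directory (XDG default: ~/.config/systemd/user).
UNIT_DIR="${UNIT_DIR:-${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user}"

# Hypothetical helper: write a drop-in that disables the start timeout.
write_start_timeout_dropin() {
    dropin_dir="$UNIT_DIR/$UNIT.d"
    mkdir -p "$dropin_dir" || return 1
    # "infinity" turns the start timeout off entirely (see TimeoutStartSec=).
    cat > "$dropin_dir/timeout.conf" <<EOF
[Service]
TimeoutStartSec=infinity
EOF
}
```

After calling `write_start_timeout_dropin`, a `systemctl --user daemon-reload` is needed before systemd uses the new value. Because the setting lives in the `.d` directory, it survives regenerating the unit file under the same name.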
It's not an issue of start or stop timeouts: both values are set to 60 seconds on dev/Ananke 0.5. The core issue seems to be that `nvidia-persistenced.service` comes up too slowly. In principle, the Ananke container's systemd unit could wait for `nvidia-persistenced.service` by passing the `--after` and `--requires` arguments to `podman generate systemd`. However, the NVIDIA service runs as root, while Ananke runs as a regular user. It seems that user services are not allowed to depend on root services (see the discussion in systemd issue 3312).
When systemd stops an Ananke container, the default timeout is 10 seconds. This is often too short to shut down all JLab sessions and JHub gracefully, and as a result the containers are not restarted automatically after a system reboot. Adding `--stop-timeout=30` to the `podman generate systemd` line in `run.sh` should solve this problem (not tested).