jupyterhub / systemdspawner

Spawn JupyterHub single-user notebook servers with systemd
BSD 3-Clause "New" or "Revised" License

Automatic reset of failed units #44

Closed hansen-m closed 5 years ago

hansen-m commented 5 years ago

I occasionally see messages like this:

systemd[1]: Unit jupyter-<some user id>-singleuser.service entered failed state.

When this happens, the user is unable to spawn their server session. Running sudo systemctl reset-failed resets all the failed units, and the frustrated user is up and running again... until next time.

Is there a way to have this happen automatically?

hansen-m commented 5 years ago

Can the equivalent of config options Restart=always and RestartSec=3 be set when the service is created?

hansen-m commented 5 years ago

The option isn't documented but I'm guessing this would work...

c.SystemdSpawner.unit_extra_properties = {'Restart': 'always', 'RestartSec': '3'}

Although perhaps not on CentOS/RHEL 7 (systemd version 219), due to https://github.com/systemd/systemd/issues/4402

yorickvP commented 5 years ago

Just ran into this. Stopping the server failed with a timeout; systemd killed it afterwards and put the unit into a failed state. It required manual intervention to fix.

hansen-m commented 5 years ago

@yorickvP I've been seeing this fairly regularly. Not sure of the cause but I do have some very "creative" users.

Until I can get a newer version of systemd, I just set up a crude cron job to periodically fix it:

/usr/bin/systemctl status jupyter-* | grep -q 'failed' && /usr/bin/systemctl reset-failed jupyter-*
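
For anyone who would rather not grep the status output, here is a rough Python sketch of the same workaround. It assumes the local systemd accepts `--state=failed`, `--plain`, `--no-legend` and a unit glob on `systemctl list-units` (worth checking on older versions), and the function names are made up for illustration; nothing here comes from systemdspawner itself.

```python
import subprocess
from typing import List

# Same glob as the cron one-liner above.
UNIT_PATTERN = "jupyter-*"


def parse_failed_units(listing: str) -> List[str]:
    """Pull unit names out of `systemctl list-units --plain --no-legend` output."""
    units = []
    for line in listing.splitlines():
        fields = line.split()
        # The unit name is normally the first column; some systemd versions
        # print a bullet marker before it, so look at the first two columns.
        for field in fields[:2]:
            if field.endswith(".service"):
                units.append(field)
                break
    return units


def reset_failed_units(pattern: str = UNIT_PATTERN) -> List[str]:
    """Reset every failed unit matching `pattern`; return the names that were reset."""
    listing = subprocess.run(
        ["systemctl", "list-units", "--state=failed",
         "--plain", "--no-legend", pattern],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    units = parse_failed_units(listing)
    if units:
        # Only the units found above are reset, not every failed unit on the host.
        subprocess.run(["systemctl", "reset-failed"] + units, check=True)
    return units
```

Calling reset_failed_units() from a script run by root's cron would do roughly what the one-liner does, but it only matches units whose state is actually failed, rather than any status output that happens to contain the word.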

RohitK89 commented 5 years ago

Yeah, this is still a problem with the current release. Here's one situation that causes this problem:

The error in the log:

Failed to start transient service unit: Unit <service> is not loaded properly: Device or resource busy

To resolve: