nebari-dev / nebari-slurm

An opinionated open source deployment of JupyterHub based on a Slurm job scheduler.
BSD 3-Clause "New" or "Revised" License
28 stars 10 forks

Fix worker node servers getting killed after JupyterHub restart #124

Closed ericdwang closed 2 years ago

ericdwang commented 2 years ago

Follow-up to #106 and fixes #104 (again)

We discovered in the JupyterHub logs that, after a restart, the hub tried to contact the master node for jobs that had been scheduled on worker nodes. That address was wrong, so the hub concluded the servers never came up and killed them:

Notebook server job 157 started at hpc-worker-02:52649
(JupyterHub restart)
server never showed up at http://hpc-master-node:52649

This fixes the problem by preserving self.server.ip in QHubHPCSpawnerBase.poll(), similar to how self.server.port is preserved.
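
For readers who have not seen the diff, here is a minimal sketch of the save-and-restore pattern described above. It is not the actual nebari-slurm code: Server, BaseSpawner, PreservingSpawner, and demo are hypothetical stand-ins, and the parent poll() that overwrites the address is an assumption made only to reproduce the symptom from the logs.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for the hub's record of a single-user server."""
    ip: str
    port: int


class BaseSpawner:
    """Hypothetical parent spawner whose poll() clobbers the stored address."""

    def __init__(self, server: Server, hub_host: str = "hpc-master-node"):
        self.server = server
        self.hub_host = hub_host

    async def poll(self):
        # Assumed behavior, for illustration only: polling resets the
        # address to the hub host, which is wrong for worker-node jobs.
        self.server.ip = self.hub_host
        self.server.port = 0
        # In JupyterHub's Spawner.poll() contract, None means "still running".
        return None


class PreservingSpawner(BaseSpawner):
    """Sketch of the fix: keep ip and port intact across the parent poll()."""

    async def poll(self):
        # Remember where the notebook server actually lives.
        ip, port = self.server.ip, self.server.port
        status = await super().poll()
        # Restore both values so the hub keeps contacting the worker node.
        self.server.ip, self.server.port = ip, port
        return status


def demo() -> Server:
    """Poll once and return the server record for inspection."""
    server = Server(ip="hpc-worker-02", port=52649)
    asyncio.run(PreservingSpawner(server).poll())
    return server
```

Calling demo() leaves the record pointing at hpc-worker-02:52649, while the same call through the hypothetical BaseSpawner would leave it pointing at hpc-master-node.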

costrouc commented 2 years ago

Thanks @ericdwang!