Closed: darabos closed this issue 7 months ago
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
This was fixed here: https://github.com/jupyterhub/kubespawner/pull/742
Bug description
Our Zero-to-JupyterHub 2.0.0 deployment running on GKE (1.24.5-gke.600) failed to start a user instance with this error message:
The JupyterHub admin interface does not show the instance as running, but looking in GKE the pod is actually still running, days after the incident. So the pod was leaked. (We actually have four cases of this.)
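In case it helps anyone checking for the same problem, here is a minimal sketch for spotting leaked pods by cross-referencing the singleuser pods in the cluster against what the Hub thinks is running. It assumes KubeSpawner's default `component=singleuser-server` label and `jupyter-<username>` pod naming, a hypothetical `jhub` namespace, a hypothetical Hub URL, and a Hub admin API token in `JUPYTERHUB_API_TOKEN`; adjust all of these for your own deployment.

```python
import os

import requests
from kubernetes import client, config

NAMESPACE = "jhub"                           # assumed namespace; adjust to your release
HUB_API = "https://hub.example.com/hub/api"  # hypothetical Hub URL


def hub_active_users():
    """Usernames the Hub believes have a running server."""
    resp = requests.get(
        f"{HUB_API}/users",
        headers={"Authorization": f"token {os.environ['JUPYTERHUB_API_TOKEN']}"},
    )
    resp.raise_for_status()
    # The "server" field is non-null when the user's default server is running.
    return {u["name"] for u in resp.json() if u.get("server")}


def singleuser_pods():
    """Names of all singleuser pods actually running in the cluster."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(
        NAMESPACE, label_selector="component=singleuser-server"
    )
    return {p.metadata.name for p in pods.items}


if __name__ == "__main__":
    active = hub_active_users()
    for pod in sorted(singleuser_pods()):
        # KubeSpawner's default pod name is jupyter-<escaped username>, so this
        # comparison is only approximate for usernames that need escaping.
        username = pod.removeprefix("jupyter-")
        if username not in active:
            print(f"Possibly leaked: {pod}")
```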
In the JupyterHub logs I see the following for this user:
(I've replaced the user name with XXXXXX. Sorry for the long block.)
The user's instance first started normally at 9:15 and was shut down for inactivity at 10:16. At 15:28 the user came back and JupyterHub tried to start the instance again. Startup failed, and JupyterHub tried to delete the instance at 15:33. It kept trying for 10 minutes, but halfway through decided that the instance "appears to have stopped while the Hub was down".
The Hub was down? Yes, looks like it went down at 15:33:
My guess is that the restart interrupted the cleanup process and the instance was left running. The next time the user tried to start it, they got the "Volume is already exclusively attached" error because the volume was still attached to the leaked instance.
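As a workaround until the fix in the linked PR is deployed, the leaked pod can be deleted by hand so that GKE detaches the volume and the user can start again. A minimal sketch with the Python Kubernetes client, assuming the same hypothetical `jhub` namespace and KubeSpawner's default `jupyter-<username>` pod name:

```python
from kubernetes import client, config

NAMESPACE = "jhub"            # assumed namespace; adjust to your release
POD_NAME = "jupyter-xxxxxx"   # the leaked singleuser pod observed in GKE

config.load_kube_config()
v1 = client.CoreV1Api()
# Deleting the leaked pod lets GKE detach the persistent volume, which clears
# the "Volume is already exclusively attached" error on the next start.
v1.delete_namespaced_pod(POD_NAME, NAMESPACE)
```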
Expected behaviour
The startup failure may be my fault, or GKE may have had a slow day, but JupyterHub shouldn't leave pods running.
How to reproduce
Seems very difficult. I have saved the JupyterHub logs from the incident; they include some personal email addresses, but I can share them privately.
Your personal set up
It's a normal Zero-to-JupyterHub on GKE.
Thanks for Zero-to-JupyterHub! It's the best!