jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License
543 stars, 304 forks

return poll status after first load finish #742

Closed ivyxjc closed 1 year ago

ivyxjc commented 1 year ago

Currently, the spawner does not wait for the reflector's first load to finish, so it cannot detect a running pod and returns an incorrect status to the hub.

Update by Erik

This is a bugfix for a regression introduced in KubeSpawner 6.0.0 and present in Z2JH since 3.0.0 (including the pre-release 3.0.0-alpha.1 and the development release 3.0.0-0.dev.git.6133.hbfc583f8). It is resolved in KubeSpawner 6.1.0 and Z2JH 3.1.0.

For more information and help cleaning up orphaned user pods, see https://discourse.jupyter.org/t/how-to-cleanup-orphaned-user-pods-after-bug-in-z2jh-3-0-and-kubespawner-6-0/21677

welcome[bot] commented 1 year ago

Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

ivyxjc commented 1 year ago

I ran into the issue with enable_user_namespaces set to its default value, False.

I am encountering the following problem: when I restart the hub pod, some open notebooks run into Service Unavailable.

I found that when the hub starts, it deletes some users' servers from the database because kubespawner.poll() does not return the correct status. The root cause is that kubespawner.poll() returns a status even when the reflector's first load has not finished.

https://github.com/jupyterhub/jupyterhub/blob/be07c7ef31ea586fcb49143e8dc6b942aadbb8ec/jupyterhub/app.py#L2527
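To illustrate the race described above, here is a minimal sketch. It does not use KubeSpawner's real classes: `ToyReflector`, its `first_load` event, and the `wait_for_first_load` flag are hypothetical names invented for this example. The only convention taken from JupyterHub is that `poll()` returns `None` for a running server and an integer exit status otherwise.

```python
import asyncio


class ToyReflector:
    """Hypothetical stand-in for a pod reflector (not KubeSpawner's class)."""

    def __init__(self, cluster_pods, delay=0.05):
        self._cluster_pods = cluster_pods
        self._delay = delay
        self.pods = {}  # local cache, empty until the first load finishes
        self.first_load = asyncio.Event()

    async def run(self):
        # Simulate the initial "list pods" call taking some time.
        await asyncio.sleep(self._delay)
        self.pods = dict(self._cluster_pods)
        self.first_load.set()


async def poll(reflector, pod_name, wait_for_first_load):
    """Return None if the pod looks running, else exit status 1."""
    if wait_for_first_load:
        await reflector.first_load.wait()
    return None if pod_name in reflector.pods else 1


async def demo():
    results = {}
    for wait in (False, True):
        reflector = ToyReflector({"jupyter-alice": "Running"})
        task = asyncio.create_task(reflector.run())
        # Without waiting, poll reads the still-empty cache.
        results[wait] = await poll(reflector, "jupyter-alice", wait)
        await task
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy setup, polling without waiting reports the pod as stopped even though it exists in the cluster, which is the kind of answer that would lead the hub to drop the server at startup.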

danilopeixoto commented 1 year ago

We've implemented a copy of KubeSpawner with minor changes. We also noticed the Hub was deleting the spawner objects of running servers because it couldn't find the server resources in the reflector at startup (init_spawners). The solution presented in this pull request solved our problem. Now the poll method waits for the reflector to load for the first time.

We did not test the implementation in the original KubeSpawner.

minrk commented 1 year ago

Thanks for the comment @danilopeixoto! I think that this bug may be the cause of https://github.com/jupyterhub/mybinder.org-deploy/issues/2686 leaving orphan pods taking up space on mybinder.org.

I moved the await of first_load to inside _start_reflector, so it's always awaited and hopefully less likely to get missed.
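A rough sketch of the idea of awaiting the first load inside the start helper, so that no caller can forget it. This is not the actual KubeSpawner code: `ToyReflector`, `ToySpawner`, and `first_load_future` as used here are simplified, hypothetical stand-ins that only borrow the names mentioned in this thread.

```python
import asyncio


class ToyReflector:
    """Hypothetical reflector whose cache fills shortly after start()."""

    def __init__(self):
        self.resources = {}
        self.first_load_future = asyncio.get_running_loop().create_future()
        self._task = None

    async def start(self):
        self._task = asyncio.create_task(self._load())

    async def _load(self):
        await asyncio.sleep(0.01)  # simulated initial list call
        self.resources = {"jupyter-alice": "Running"}
        self.first_load_future.set_result(None)


class ToySpawner:
    def __init__(self, pod_name):
        self.pod_name = pod_name
        self.reflectors = {}

    async def _start_reflector(self, key):
        reflector = self.reflectors.get(key)
        if reflector is None:
            reflector = ToyReflector()
            self.reflectors[key] = reflector
            await reflector.start()
        # Awaiting here means every caller gets back a loaded reflector.
        await reflector.first_load_future
        return reflector

    async def poll(self):
        reflector = await self._start_reflector("pods")
        return None if self.pod_name in reflector.resources else 1


async def demo():
    spawner = ToySpawner("jupyter-alice")
    first = await spawner.poll()   # starts the reflector and waits for it
    second = await spawner.poll()  # reuses the already-loaded reflector
    return first, second, len(spawner.reflectors)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the wait lives in one place, any code path that obtains the reflector through `_start_reflector` sees a populated cache, and awaiting an already-completed future on later calls returns immediately.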

welcome[bot] commented 1 year ago

Congrats on your first merged pull request in this project! :tada: Thank you for contributing, we are very proud of you! :heart: