Closed by hh-cn 8 months ago
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
I think this is https://discourse.jupyter.org/t/how-to-cleanup-orphaned-user-pods-after-bug-in-z2jh-3-0-and-kubespawner-6-0/21677; it's written about in the changelog for the minor releases 3.1, 3.2, and 3.3.
Amazing issue writeup @hh-cn!
Bug description
We have observed a strange phenomenon in the cluster. After the hub restarts, the number of running servers shown on the admin page does not match the number of user pods returned by `kubectl get pods`. In fact, many user pods are present, but on the admin side their servers appear not to be running.
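One way to observe the mismatch is to compare what the hub's REST API reports with the pods that actually exist in the namespace. The sketch below is illustrative only; the `component=singleuser-server` label selector, `HUB_URL`, `API_TOKEN`, and `NAMESPACE` are assumptions based on a typical z2jh deployment and should be adjusted to your setup.

```python
# Illustrative sketch: compare the servers the hub thinks are running
# with the singleuser pods that actually exist in the cluster.
# Assumes a typical z2jh install (pods labelled component=singleuser-server)
# and an admin API token; adjust HUB_URL, API_TOKEN and NAMESPACE as needed.
import requests
from kubernetes import client, config

HUB_URL = "http://hub:8081/hub/api"      # assumed in-cluster hub API URL
API_TOKEN = "REPLACE_WITH_ADMIN_TOKEN"   # assumed admin token
NAMESPACE = "jupyterhub"                 # assumed namespace

# Servers the hub believes are running (users whose default server is up).
users = requests.get(
    f"{HUB_URL}/users",
    headers={"Authorization": f"token {API_TOKEN}"},
).json()
hub_running = {u["name"] for u in users if u.get("server")}

# User pods that actually exist in the namespace.
config.load_kube_config()  # or config.load_incluster_config()
pods = client.CoreV1Api().list_namespaced_pod(
    NAMESPACE, label_selector="component=singleuser-server"
)
actual_pods = {p.metadata.name for p in pods.items}

print(f"hub reports {len(hub_running)} running servers")
print(f"cluster has {len(actual_pods)} singleuser pods")
```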
How to reproduce
`kubectl rollout restart` the hub deployment
Speculation about the problem
I reviewed the code, and I believe the issue lies here.
When the hub starts, it tries to resume pod/user state from the database. The check function `check_spawner` actually calls the spawner's `poll` method. KubeSpawner's `poll` uses a shared pod `ResourceReflector` to see whether the user pod exists, but at that point the reflector is still set to its initial (empty) object because its startup has not finished. The hub therefore logs

> XXX USER appears to have stopped while the Hub was down

and deletes that user's server from the database. But the user pod does in fact exist, and once the pod reflector's `start()` has finished, `poll` would return the correct state.
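To illustrate the suspected race, here is a simplified sketch, not KubeSpawner's actual code: if `poll()` reads the shared pod reflector's cache before the reflector's initial list has completed, every pod looks missing and the hub treats the server as stopped. The class and method names below are hypothetical stand-ins for the real reflector/spawner interaction.

```python
# Simplified sketch of the suspected race; names are hypothetical stand-ins
# for KubeSpawner's shared pod reflector and the spawner's poll() check.
import asyncio


class PodReflector:
    """Keeps a local cache of user pods, filled by an initial list + watch."""

    def __init__(self):
        self.pods = {}          # empty until start() completes its first list
        self.first_load = asyncio.Event()

    async def start(self):
        # Simulate the initial LIST against the Kubernetes API.
        await asyncio.sleep(1)
        self.pods = {"jupyter-alice": "Running"}
        self.first_load.set()


class Spawner:
    def __init__(self, reflector, pod_name):
        self.reflector = reflector
        self.pod_name = pod_name

    async def poll(self):
        # BUG: reads the shared cache without waiting for the first load,
        # so an existing pod appears to be gone right after hub startup.
        if self.pod_name not in self.reflector.pods:
            return 0            # "stopped" -> hub deletes the server from the db
        return None             # still running

    async def poll_fixed(self):
        # Waiting for the reflector's first successful list avoids the race.
        await self.reflector.first_load.wait()
        return None if self.pod_name in self.reflector.pods else 0


async def main():
    reflector = PodReflector()
    asyncio.create_task(reflector.start())
    spawner = Spawner(reflector, "jupyter-alice")
    print("poll right after restart:", await spawner.poll())        # 0 (wrong)
    print("poll after first load:   ", await spawner.poll_fixed())  # None (correct)


asyncio.run(main())
```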
Your personal set up
Full environment
Configuration
```python
# jupyterhub_config.py
```

Logs

```
# paste relevant logs here, if any
```