jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Servers that are not down are being reported as down after hub restart #786

Closed jabbera closed 10 months ago

jabbera commented 10 months ago

Bug description

After a restart, the hub reports servers as down even though they are still running. This causes a problem because the hub no longer tracks the state of these servers, so no culling occurs and their resources continue to be used indefinitely. As the log below shows, it doesn't happen to every server, but it did happen to most.

This appears to be a bug in KubeSpawner.poll(), since it must be returning a value other than None for this to happen.
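For context, JupyterHub treats poll() as the liveness check: None means the server is running, anything else means it has exited. Below is a minimal sketch of that contract; the hub-side behaviour is paraphrased from the log messages above, not copied from JupyterHub's code, and the _server_is_running helper is hypothetical.

from jupyterhub.spawner import Spawner

class ExampleSpawner(Spawner):
    """Illustrates the poll() contract JupyterHub relies on at startup."""

    async def poll(self):
        # Contract: return None while the server is running,
        # or an integer exit status once it has stopped.
        if await self._server_is_running():  # hypothetical helper
            return None
        return 0

    async def _server_is_running(self):
        # Placeholder for a real check, e.g. asking the Kubernetes API
        # whether the user's pod exists and is in phase "Running".
        return True

# Hub-side behaviour (paraphrased): during "Initializing spawners" the hub
# calls poll() on each spawner loaded from the database; a non-None result
# produces the "appears to have stopped while the Hub was down" warning
# and the server is treated as stopped.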

How to reproduce

Start a server as user A, then run: kubectl -n jhub rollout restart deployment hub

This yields the following in my log file:

[W 2023-09-16 13:21:30.377 JupyterHub app:2726] Allowing service gitrev to complete OAuth without confirmation on an authorization web page
[D 2023-09-16 13:21:30.381 JupyterHub app:2502] Initializing spawners
[D 2023-09-16 13:21:30.398 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user1:
[D 2023-09-16 13:21:30.402 JupyterHub app:2613] Loading state for user1 from db
[D 2023-09-16 13:21:30.414 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user2:
[D 2023-09-16 13:21:30.416 JupyterHub app:2613] Loading state for user2 from db
[D 2023-09-16 13:21:30.422 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user3:test_rh_b64d9d_cwuzn
[D 2023-09-16 13:21:30.425 JupyterHub app:2613] Loading state for user3:test_rh_b64d9d_cwuzn from db
[D 2023-09-16 13:21:30.430 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user4:
[D 2023-09-16 13:21:30.434 JupyterHub app:2613] Loading state for user4 from db
[D 2023-09-16 13:21:30.439 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user3:cluster_513a318e_cwuzn
[D 2023-09-16 13:21:30.443 JupyterHub app:2613] Loading state for user3:cluster_513a318e_cwuzn from db
[D 2023-09-16 13:21:30.455 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user5:
[D 2023-09-16 13:21:30.457 JupyterHub app:2613] Loading state for user5 from db
[D 2023-09-16 13:21:30.463 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user6:
[D 2023-09-16 13:21:30.466 JupyterHub app:2613] Loading state for user6 from db
[D 2023-09-16 13:21:30.477 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user7:
[D 2023-09-16 13:21:30.479 JupyterHub app:2613] Loading state for user7 from db
[D 2023-09-16 13:21:30.485 JupyterHub user:431] Creating <class 'kubespawner.spawner.KubeSpawner'> for user8:
[D 2023-09-16 13:21:30.487 JupyterHub app:2613] Loading state for user8 from db
[D 2023-09-16 13:21:30.487 JupyterHub app:2624] Awaiting checks for 9 possibly-running spawners
[W 2023-09-16 13:21:30.489 JupyterHub app:2581] user2 appears to have stopped while the Hub was down
[W 2023-09-16 13:21:30.489 JupyterHub app:2581] user3:test_rh_b64d9d_cwuzn appears to have stopped while the Hub was down
[W 2023-09-16 13:21:30.490 JupyterHub app:2581] user4 appears to have stopped while the Hub was down
[W 2023-09-16 13:21:30.490 JupyterHub app:2581] user3:cluster_513a318e_cwuzn appears to have stopped while the Hub was down
[W 2023-09-16 13:21:30.490 JupyterHub app:2581] user5 appears to have stopped while the Hub was down
[W 2023-09-16 13:21:30.490 JupyterHub app:2581] user6 appears to have stopped while the Hub was down
[W 2023-09-16 13:21:30.491 JupyterHub app:2581] user7 appears to have stopped while the Hub was down
[W 2023-09-16 13:21:30.491 JupyterHub app:2581] user8 appears to have stopped while the Hub was down
[D 2023-09-16 13:21:30.539 JupyterHub app:2559] Verifying that afazlani is running at http://10.15.28.133:8888/user/user1/
[I 2023-09-16 13:21:30.540 JupyterHub reflector:274] watching for pods with label selector='component=singleuser-server' in namespace jhub
[D 2023-09-16 13:21:30.540 JupyterHub reflector:281] Connecting pods watcher

Running kubectl get pods confirmed that all of these pods are actually running. user2's server had in fact been started no more than 4 minutes before the hub was restarted.
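For an independent check outside kubectl, here is a small sketch using the official kubernetes Python client. Assumptions: kubeconfig access to the cluster, the "jhub" namespace, and the component=singleuser-server label selector that appears in the reflector log line above.

from kubernetes import client, config

# Load credentials from ~/.kube/config (use config.load_incluster_config()
# instead when running inside the cluster).
config.load_kube_config()

v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod(
    namespace="jhub",
    label_selector="component=singleuser-server",
)

# Print each user pod with its phase; the servers the hub reported as
# stopped all show up here as Running.
for pod in pods.items:
    print(f"{pod.metadata.name}: {pod.status.phase}")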

Expected behaviour

Servers should be detected as up

Actual behaviour

Servers are detected as down

Your personal set up

helm chart 3.0.2 (hub 4.0.2) on AKS

jabbera commented 10 months ago

This is fixed by https://github.com/jupyterhub/kubespawner/pull/742. A 6.1.0 release is desperately needed!