Closed by snickell 3 years ago
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
Here's a pip freeze for relevant versions of installed jupyterhub packages on the hub pod:
jovyan@hub-7567cff59b-lpzpw:/usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler$ pip freeze | grep jupyterhub
jupyterhub==2.0.0b2
jupyterhub-firstuseauthenticator==0.14.1
jupyterhub-hmacauthenticator==1.0
jupyterhub-idle-culler==1.2
jupyterhub-kubespawner==1.1.1
jupyterhub-ldapauthenticator==1.3.2
jupyterhub-ltiauthenticator==1.2.0
jupyterhub-nativeauthenticator==0.0.7
jupyterhub-tmpauthenticator==0.6
I haven't figured out how to instrument the return of users from the JupyterHub API, but I believe the problem is that the user dict does not contain the expected servers key here: https://github.com/jupyterhub/jupyterhub-idle-culler/blob/d53cd25922bbfe9907128f55c7a916b468bed2b9/jupyterhub_idle_culler/__init__.py#L309-L310
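For reference, here's a rough sketch of what I mean by the expected shape vs. what we're actually seeing. The field lists are illustrative only, not an authoritative copy of the API model:

```python
# Approximate shape of a user model from GET /hub/api/users when the
# requesting token can read server details (fields are illustrative):
user_with_servers = {
    "name": "seth",
    "kind": "user",
    "admin": False,
    "last_activity": "2021-10-19T02:12:54.290000Z",
    "servers": {
        # one entry per server; "" is the default (unnamed) server
        "": {
            "name": "",
            "last_activity": "2021-10-19T02:12:54.290000Z",
            "pending": None,
            "url": "/user/seth/",
        },
    },
}

# What the culler actually receives in this setup: no "servers"
# (and no legacy "server") key at all.
user_without_servers = {
    "name": "seth",
    "kind": "user",
    "admin": False,
    "last_activity": "2021-10-19T02:12:54.290000Z",
}
```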
It's a little tricky to figure out how to debug this in a z2jh setup, where the pods are put up by the helm layers, but I'm still working out where/how to get JUPYTERHUB_API_TOKEN when shelled into the hub pod, so that I can restart the python3 -m jupyterhub_idle_culler --url=http://localhost:8081/hub/api --timeout=3600 --cull-every=600 --concurrency=10 process and add more logging instrumentation to figure out what user is.
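For what it's worth, the kind of instrumentation I have in mind would be something like the sketch below, assuming JUPYTERHUB_API_TOKEN ends up available in the environment and the hub API is at http://localhost:8081/hub/api as in the culler command line above:

```python
# Minimal sketch: fetch the user list the same way the culler does and
# report whether each user model includes a "servers" key.
# Assumes JUPYTERHUB_API_TOKEN is set in the environment and the hub
# API is reachable at localhost:8081, matching the culler's --url flag.
import json
import os
import urllib.request

api_url = "http://localhost:8081/hub/api"
token = os.environ["JUPYTERHUB_API_TOKEN"]

req = urllib.request.Request(
    f"{api_url}/users",
    headers={"Authorization": f"token {token}"},
)
with urllib.request.urlopen(req) as resp:
    users = json.load(resp)

for user in users:
    print(user["name"], "servers" in user, "server" in user)
```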
From inside the hub pod, user looks like this:
> /usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler/__init__.py(304)handle_user()
303 # 0.8 only does this when named servers are enabled.
--> 304 if "servers" in user:
305 servers = user["servers"]
ipdb> user
{'name': 'seth', 'kind': 'user', 'last_activity': '2021-10-19T02:12:54.290000Z', 'admin': False}
So it's not None or anything like that, there is just no servers (or server) key.
Perhaps the problem is that the z2jh chart is not requesting enough scope at the moment (from https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/2ba648afe8be4b3e08a182e49a736c18ea5c4431/jupyterhub/files/hub/jupyterhub_config.py#L332-L339):
jupyterhub_idle_culler_role = {
    "name": "jupyterhub-idle-culler",
    "scopes": [
        "list:users",
        "read:users:activity",
        "delete:servers",
        # "admin:users", # dynamically added if --cull-users is passed
    ],
}
However, according to https://jupyterhub.readthedocs.io/en/latest/rbac/scopes.html#available-scopes the users scope is "excluding servers, tokens and authentication state". By my read, I would suspect that the read:servers scope is required in addition to delete:servers in order for the /users GET request to include server details? @consideRatio?
Just tried with the latest z2jh chart (1.1.3-n123.h2ba648af) which includes jupyterhub 2.0.0b3 and jupyterhub-idle-culler, and seeing the same issue:
jupyterhub==2.0.0b3
jupyterhub-firstuseauthenticator==0.14.1
jupyterhub-hmacauthenticator==1.0
jupyterhub-idle-culler==1.2.1
jupyterhub-kubespawner==1.1.1
jupyterhub-ldapauthenticator==1.3.2
jupyterhub-ltiauthenticator==1.2.0
jupyterhub-nativeauthenticator==0.0.7
jupyterhub-tmpauthenticator==0.6
Aaaaa @snickell I bet you are correct about that, I replaced servers with delete:servers but it should have been read:servers as well!
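If so, the z2jh-side fix would presumably be something like this sketch of the role definition above with read:servers added (exact scope list still to be confirmed):

```python
# Sketch of a corrected idle-culler role for the z2jh chart, adding
# read:servers so that GET /hub/api/users includes each user's server
# models (per the thread above, delete:servers alone is not enough).
jupyterhub_idle_culler_role = {
    "name": "jupyterhub-idle-culler",
    "scopes": [
        "list:users",
        "read:users:activity",
        "read:servers",
        "delete:servers",
        # "admin:users", # dynamically added if --cull-users is passed
    ],
}
```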
After upgrading our z2jh jupyterhub to 2.0.0b2 (hub docker image is jupyterhub/k8s-hub:1.1.3-n101.hc9e57f03), idle culling has stopped working. Every ten minutes (the default idle cull interval, I believe) we get an exception in our hub pod logs for each user (e.g. this is for user barnold, but we see one of these for each user's pod every 10 minutes and no idle culling occurs):

The exception occurs on if user["server"] here, but I believe that not taking the if "servers" in user branch is a prior deviation from the expected code flow:

It's worth noting that our z2jh cluster is a couple of years old, and thus there is an increased possibility that this could stem from a state migration error, along the lines of "if you started on 0.1 you'll end up having this".
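For context, here's a rough paraphrase (not a verbatim copy) of the branch in the culler's handle_user() around the lines linked above; with a user model that carries neither a servers nor a server key, the fallback lookup is what raises:

```python
# Paraphrase of the culler's server-extraction branch, to show why a
# user model with neither "servers" nor "server" blows up.
def extract_servers(user):
    if "servers" in user:
        # Newer hubs (with sufficient scopes) include a "servers" dict
        # in each user model.
        return user["servers"]
    # Fallback path for old-style user models that only carry a single
    # default server under "server". With JupyterHub 2.0 RBAC, a token
    # lacking read:servers gets a user model with *neither* key, so
    # this lookup raises KeyError('server') instead.
    servers = {}
    if user["server"]:
        servers[""] = {
            "last_activity": user["last_activity"],
            "url": user["server"],
        }
    return servers

# Reproduces the failure with the user model shown earlier in this issue:
try:
    extract_servers(
        {"name": "seth", "kind": "user", "last_activity": "...", "admin": False}
    )
except KeyError as e:
    print("culler would fail here:", e)  # KeyError: 'server'
```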