Closed by snickell 3 years ago
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
Here's a pip freeze for relevant versions of installed jupyterhub packages on the hub pod:
jovyan@hub-7567cff59b-lpzpw:/usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler$ pip freeze | grep jupyterhub
jupyterhub==2.0.0b2
jupyterhub-firstuseauthenticator==0.14.1
jupyterhub-hmacauthenticator==1.0
jupyterhub-idle-culler==1.2
jupyterhub-kubespawner==1.1.1
jupyterhub-ldapauthenticator==1.3.2
jupyterhub-ltiauthenticator==1.2.0
jupyterhub-nativeauthenticator==0.0.7
jupyterhub-tmpauthenticator==0.6
I haven't figured out how to instrument the return of users from the JupyterHub API, but I believe the problem is that the user dict does not contain the expected servers key here: https://github.com/jupyterhub/jupyterhub-idle-culler/blob/d53cd25922bbfe9907128f55c7a916b468bed2b9/jupyterhub_idle_culler/__init__.py#L309-L310
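For reference, here's a rough sketch of what I mean by the expected shape vs. what we're actually seeing. The field lists are illustrative only, not an authoritative copy of the API model:

```python
# Approximate shape of a user model from GET /hub/api/users when the
# requesting token can read server details (fields are illustrative):
user_with_servers = {
    "name": "seth",
    "kind": "user",
    "admin": False,
    "last_activity": "2021-10-19T02:12:54.290000Z",
    "servers": {
        # one entry per server; "" is the default (unnamed) server
        "": {
            "name": "",
            "last_activity": "2021-10-19T02:12:54.290000Z",
            "pending": None,
            "url": "/user/seth/",
        },
    },
}

# What the culler actually receives in this setup: no "servers"
# (and no legacy "server") key at all.
user_without_servers = {
    "name": "seth",
    "kind": "user",
    "admin": False,
    "last_activity": "2021-10-19T02:12:54.290000Z",
}
```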
It's a little tricky to figure out how to debug this in a z2jh setup, where the pods are put up by the helm layers, but I'm still working out where/how to get JUPYTERHUB_API_TOKEN when shelled into the hub pod, so that I can restart the python3 -m jupyterhub_idle_culler --url=http://localhost:8081/hub/api --timeout=3600 --cull-every=600 --concurrency=10 process and add more logging instrumentation to figure out what user is.
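For what it's worth, the kind of instrumentation I have in mind would be something like the sketch below, assuming JUPYTERHUB_API_TOKEN ends up available in the environment and the hub API is at http://localhost:8081/hub/api as in the culler command line above:

```python
# Minimal sketch: fetch the user list the same way the culler does and
# report whether each user model includes a "servers" key.
# Assumes JUPYTERHUB_API_TOKEN is set in the environment and the hub
# API is reachable at localhost:8081, matching the culler's --url flag.
import json
import os
import urllib.request

api_url = "http://localhost:8081/hub/api"
token = os.environ["JUPYTERHUB_API_TOKEN"]

req = urllib.request.Request(
    f"{api_url}/users",
    headers={"Authorization": f"token {token}"},
)
with urllib.request.urlopen(req) as resp:
    users = json.load(resp)

for user in users:
    print(user["name"], "servers" in user, "server" in user)
```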
From inside the hub pod, user looks like this:
> /usr/local/lib/python3.8/dist-packages/jupyterhub_idle_culler/__init__.py(304)handle_user()
303 # 0.8 only does this when named servers are enabled.
--> 304 if "servers" in user:
305 servers = user["servers"]
ipdb> user
{'name': 'seth', 'kind': 'user', 'last_activity': '2021-10-19T02:12:54.290000Z', 'admin': False}
So it's not None or anything like that, there is just no servers (or server) key.
Perhaps the problem is that the z2jh chart is not requesting enough scope at the moment (from https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/2ba648afe8be4b3e08a182e49a736c18ea5c4431/jupyterhub/files/hub/jupyterhub_config.py#L332-L339):
jupyterhub_idle_culler_role = {
    "name": "jupyterhub-idle-culler",
    "scopes": [
        "list:users",
        "read:users:activity",
        "delete:servers",
        # "admin:users", # dynamically added if --cull-users is passed
    ],
}
However, according to https://jupyterhub.readthedocs.io/en/latest/rbac/scopes.html#available-scopes the users scope is "excluding servers, tokens and authentication state". By my read, I would suspect that the read:servers scope is required in addition to delete:servers in order for the /users GET request to include server details? @consideRatio?
Just tried with the latest z2jh chart (1.1.3-n123.h2ba648af) which includes jupyterhub 2.0.0b3 and jupyterhub-idle-culler, and seeing the same issue:
jupyterhub==2.0.0b3
jupyterhub-firstuseauthenticator==0.14.1
jupyterhub-hmacauthenticator==1.0
jupyterhub-idle-culler==1.2.1
jupyterhub-kubespawner==1.1.1
jupyterhub-ldapauthenticator==1.3.2
jupyterhub-ltiauthenticator==1.2.0
jupyterhub-nativeauthenticator==0.0.7
jupyterhub-tmpauthenticator==0.6
Aaaaa @snickell I bet you are correct about that, I replaced servers with delete:servers but it should have been read:servers as well!
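If so, the z2jh-side fix would presumably be something like this sketch of the role definition above with read:servers added (exact scope list still to be confirmed):

```python
# Sketch of a corrected idle-culler role for the z2jh chart, adding
# read:servers so that GET /hub/api/users includes each user's server
# models (per the thread above, delete:servers alone is not enough).
jupyterhub_idle_culler_role = {
    "name": "jupyterhub-idle-culler",
    "scopes": [
        "list:users",
        "read:users:activity",
        "read:servers",
        "delete:servers",
        # "admin:users", # dynamically added if --cull-users is passed
    ],
}
```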
After upgrading our z2jh jupyterhub to 2.0.0b2 (hub docker image is jupyterhub/k8s-hub:1.1.3-n101.hc9e57f03), idle culling has stopped working. Every ten minutes (the default idle cull interval, I believe) we get an exception in our hub pod logs for each user (e.g. this is for user barnold, but we see one of these for each user's pod every 10 minutes and no idle culling occurs):

The exception occurs on if user["server"] here, but I believe that not taking the if "servers" in user branch is a prior deviation from the expected code flow:

It's worth noting that our z2jh cluster is a couple of years old, and thus there is an increased possibility that this could stem from a state migration error, along the lines of "if you started on 0.1 you'll end up having this".
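For context, here's a rough paraphrase (not a verbatim copy) of the branch in the culler's handle_user() around the lines linked above; with a user model that carries neither a servers nor a server key, the fallback lookup is what raises:

```python
# Paraphrase of the culler's server-extraction branch, to show why a
# user model with neither "servers" nor "server" blows up.
def extract_servers(user):
    if "servers" in user:
        # Newer hubs (with sufficient scopes) include a "servers" dict
        # in each user model.
        return user["servers"]
    # Fallback path for old-style user models that only carry a single
    # default server under "server". With JupyterHub 2.0 RBAC, a token
    # lacking read:servers gets a user model with *neither* key, so
    # this lookup raises KeyError('server') instead.
    servers = {}
    if user["server"]:
        servers[""] = {
            "last_activity": user["last_activity"],
            "url": user["server"],
        }
    return servers

# Reproduces the failure with the user model shown earlier in this issue:
try:
    extract_servers(
        {"name": "seth", "kind": "user", "last_activity": "...", "admin": False}
    )
except KeyError as e:
    print("culler would fail here:", e)  # KeyError: 'server'
```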