Kubespawner fails to reset, starts server with wrong image

benjimin commented 1 year ago

Bug description

If a user runs multiple servers in succession with different profiles, each subsequent server inherits some properties (particularly the image) from that user's previous server.

This appears to concern properties that are set by some (but not all) profiles using kubespawner_override.

(I suspect the bug is that the same kubespawner instance is getting reused without resetting its properties back to defaults.)
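
To illustrate the kind of state leak I suspect, here is a toy model. `ToySpawner`, `load_profile` and `start` are hypothetical names made up for this sketch; this is not KubeSpawner's actual code, only an assumption about the mechanism (overrides applied with `setattr` on a reused instance and never reset).

```python
# Toy model only: ToySpawner is a hypothetical stand-in, not KubeSpawner itself.
class ToySpawner:
    # Configured default image (class-level default).
    image = "xxx.ecr/xxx/sandbox:1.0.9"

    def load_profile(self, profile):
        # Apply each override as an instance attribute.
        # Nothing here restores attributes that an earlier profile changed.
        for key, value in profile.get("kubespawner_override", {}).items():
            setattr(self, key, value)

    def start(self, profile):
        self.load_profile(profile)
        return self.image  # the image the pod would be created with


default_profile = {
    "display_name": "Default environment",
    "kubespawner_override": {},
}
unstable_profile = {
    "display_name": "Unstable environment",
    "kubespawner_override": {
        "image": "xxx.ecr/xxx/sandbox:latest",
        "image_pull_policy": "Always",
    },
}

# One instance reused for three spawns in a row: default, unstable, default.
spawner = ToySpawner()
images = [
    spawner.start(profile)
    for profile in (default_profile, unstable_profile, default_profile)
]
print(images)
```

In this toy model the third spawn still reports the `latest` tag, because the empty override of the default profile never touches the `image` attribute left behind by the unstable profile. A fresh `ToySpawner()` would report `1.0.9` again.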

Expected behaviour

When a user starts a server, their latest profile selection should be honoured in its entirety. Their past history should have no impact.

Actual behaviour

Kubernetes pod resources are created that specify the wrong image. (We're also seeing spawn failures start to accumulate, culminating in hub log messages that 5 of 5 servers have failed to start within 60s. At that point there is a reset, and the next spawn succeeds with the proper image.)

How to reproduce

Our list of profiles includes:

    {
      'default': True,
      'display_name': 'Default environment',
      'description': '2 Cores, 16G Memory',
      'kubespawner_override': {}
    },
    {
      'default': False,
      'display_name': 'Unstable environment',
      'description': '2 Cores, 16G Memory | Unstable release candidate | $0.151 USD per hour',
      'kubespawner_override': {
        'image': 'xxx.ecr/xxx/sandbox:latest',
        'image_pull_policy': 'Always',
      }
    }

Our helm release includes default settings such as:

    spec:
      chart:
        repository: https://jupyterhub.github.io/helm-chart/
        name: jupyterhub
        version: 1.1.3
      values:
        singleuser:
          defaultUrl: "/lab"
          memory:
            limit: 15G
            guarantee: 14G
          cpu:
            limit: 1.8
            guarantee: 1.6
          image:
            name: xxx.ecr/xxx/sandbox
            tag: 1.0.9

If a user starts a "default" server, then a pod is created that correctly specifies the image tag 1.0.9.
If the same user stops that server and starts an "unstable" server, then a pod is created that correctly specifies the image tag latest.
If the same user again stops that server and again starts a "default" server, then a pod is created that erroneously specifies the image tag latest again (rather than 1.0.9).

Your personal set up

We're using zero-to-jupyterhub on AWS EKS, deployed using helm and flux.

Specifically, our hub image is jupyterhub/k8s-hub:1.1.3. Note that this image specifies jupyterhub 1.4.2 and kubespawner 1.1.0.

Our config includes a large number of profiles (e.g. several different machine sizes, several different images), with most sets of profiles restricted to particular AWS Cognito user groups.

consideRatio commented 1 year ago

I think this is a consequence of a bug in jupyterhub, now fixed.

See:

Upgrading to the z2jh helm chart 2.0.0 should resolve this by using a modern enough version (>=2.2.0) of JupyterHub to include the bugfix.
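
If upgrading is not possible right away, one possible stopgap (an assumption on my part, not verified against the affected versions) is to make every profile set every overridden key explicitly, so that no profile relies on a value being reset. A minimal sketch follows; `pin_overrides` is a hypothetical helper, and the `IfNotPresent` pull policy is an assumed default that you would replace with whatever your deployment actually uses.

```python
import copy


def pin_overrides(profiles, defaults):
    """Return a copy of `profiles` in which every kubespawner_override
    sets every key in `defaults`, keeping values a profile already sets."""
    pinned = copy.deepcopy(profiles)  # leave the caller's list untouched
    for profile in pinned:
        override = profile.setdefault("kubespawner_override", {})
        for key, value in defaults.items():
            override.setdefault(key, value)
    return pinned


profiles = [
    {
        "default": True,
        "display_name": "Default environment",
        "kubespawner_override": {},
    },
    {
        "default": False,
        "display_name": "Unstable environment",
        "kubespawner_override": {
            "image": "xxx.ecr/xxx/sandbox:latest",
            "image_pull_policy": "Always",
        },
    },
]

# Assumed defaults: the image from the helm values above, plus a pull policy
# chosen only for illustration.
defaults = {
    "image": "xxx.ecr/xxx/sandbox:1.0.9",
    "image_pull_policy": "IfNotPresent",
}

pinned_profiles = pin_overrides(profiles, defaults)
print(pinned_profiles[0]["kubespawner_override"])
```

The resulting list could then be used as the profile list in the hub configuration (for example via `c.KubeSpawner.profile_list`). The default profile would then carry an explicit `image`, so selecting it should overwrite whatever a previous spawn left behind.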
