jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License
549 stars 305 forks source link

Create new labels that we can switch over to later #836

Open consideRatio opened 7 months ago

consideRatio commented 7 months ago

Extracted from https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2075996346:

There's an ongoing discussion about how names (including pod labels) are normalised, and whether this can be changed in a backwards compatible manner:

If we want to do the above this might be a good opportunity to set the new labels to whatever the new normalisation scheme is, and leave the old labels unchanged, avoiding some backward compatibility complexity. It won't avoid all complexity since it's not just labels that are affected.

Extracted from https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2081207827

I'm pro using this as an opportunity to fix our myriad escaping issues! Some variant of https://github.com/jupyterhub/kubespawner/pull/744 would be pretty amazing.

I don't currently have capacity to push on that though :(


@manics I think I get what you mean now with this might be a good opportunity - you mean that getting these new labels out at the same time as #835, makes us able to switchover with a single breaking change in the future with the same pre-conditions (that users have used kubespawner a while with duplicated sets of labels)?

manics commented 7 months ago

you mean that getting these new labels out at the same time as https://github.com/jupyterhub/kubespawner/pull/835, makes us able to switchover with a single breaking change in the future with the same pre-conditions (that users have used kubespawner a while with duplicated sets of labels)?

Yes, subject to discussion about what we escape and how since it may affect more than just labels. Given this could be a breaking change that affects users I think we should discuss these together, and ensure there's a coherent plan that minimises pain for admins (how do they deal with the breaking changes) and for us as developers (will we maintain multiple code paths temporarily, or indefinitely)?

For many JupyterHub PRs we can merge without resolving all issues since the impact of the PR is limited and can be easily changed, but because the issues discussed here have a big impact on JupyterHub admins I think we should have a coherent plan. It might be that the consensus is that https://github.com/jupyterhub/kubespawner/pull/835 is fine in isolation since the escaping issues will take too much time to fix! But I think it should be a deliberate decision.

consideRatio commented 6 months ago

Given this could be a breaking change that affects users I think we should discuss these together, and ensure there's a coherent plan that minimises pain for admins (how do they deal with the breaking changes) and for us as developers (will we maintain multiple code paths temporarily, or indefinitely)?

I agree on minimizing effort for users and maintainers, and if you think there is a coupling here I'll investigate further.

Tech summary

  1. k8s label values have constraints We have and may still be breaking these constraints in kubespawner for hub.jupyter.org/username and hub.jupyter.org/servername -- the k8s api-server refuses resource creation for resources if constraints aren't met.
  2. the k8s label=value component=singleuser-server This label=value influences what pods KubeSpawner monitors via its k8s resource reflector - removing/changing this label=value makes KubeSpawner loose track of user server pods which is a breaking change forcing all user servers to be shut down before upgrading. This is why its good to add labels alongside the old component as done in #835.
  3. matchLabels for a k8s Deployment ... ... indicates what pods the Deployment monitors and controls. It is an immutable field forcing re-creations of the Deployment resource if its to be changed. Practically when a change to matchLabels is to be done for a Deployment resource, we would delete the Deployment resource first without deleting pods it controls via kubectl delete --cascade=orphan ..., and then re-create the Deployment resource (helm upgrade).

KubeSpawner labels to consider

KubeSpawner functions to consider

My conclusions

My takeaway

I think its good to address #737, #744, #791 now as doing so is breaking for software outside jupyterhub's official software referencing those labels values, and getting a breaking change to land in kubespawner ~now is great as it allows us to flush it into z2jh 4 with other breaking changes demanding users attention.

I see no reason to wait doing #737, #744, #791 until we maybe in the futuer remove component label etc in favor of app.kubernetes.io/component in z2jh 5 or later.

consideRatio commented 6 months ago

It seems #744 is making a change that would influence self.pod_name, self.dns_name, self.secret_name, self.pvc_name, self.namespace, self.working_dir. A change to pod_name requires user servers to be restarted I think, and a change to pvc_name and namespace are probably very breaking changes as they are persistent.

        if self.enable_user_namespaces:
            self.namespace = self._expand_user_properties(self.user_namespace_template)
        self.pod_name = self._expand_user_properties(self.pod_name_template)
        self.dns_name = self.dns_name_template.format(namespace=self.namespace, name=self.pod_name)
        self.secret_name = self._expand_user_properties(self.secret_name_template)
        self.pvc_name = self._expand_user_properties(self.pvc_name_template)
        if self.working_dir:
            self.working_dir = self._expand_user_properties(self.working_dir)

Overall, this makes #744 far more complicated to land than if it just touched the labels =/

manics commented 6 months ago

My feeling is that breaking changes are fine as long as there's no risk of data loss. Restarting user's pods is disruptive but it's very rare that it's required by a Z2JH release. Kubernetes upgrades require all pods to restart, and Kubernetes upgrades are much more frequent!

Changes to pvc_name and namespace are obviously more problematic, perhaps we should store more information in the state similar to https://github.com/jupyterhub/dockerspawner/pull/386/commits/05f5433ee2852cc978faf6fa2930f5e48ecd70ac from https://github.com/jupyterhub/dockerspawner/pull/386 ?

consideRatio commented 6 months ago

Kubernetes upgrades require all pods to restart, and Kubernetes upgrades are much more frequent!

Not at a single point in time though! When upgrading in 2i2c clusters, i rotate out old node pools and rotate in new. K8s upgrades are almost entirely non-disruptive this way.

Changes to pvc_name and namespace are obviously more problematic, perhaps we should store more information in the state similar to https://github.com/jupyterhub/dockerspawner/commit/05f5433ee2852cc978faf6fa2930f5e48ecd70ac from https://github.com/jupyterhub/dockerspawner/pull/386 ?

Hmmm, a common situation is that the username expansion is used to declare a subPath on a NFS mount as well, so even changing how the username expands can be breaking. I've not looked at the linked PR yet, I'm on mobile atm, but I'm hesitant to try handle the complexity of a transition, but more optimistic on doing a switch where new z2jh deploys get new defaults and old deploys retains previous function unless they opt-in - including then breaking changes.

consideRatio commented 6 months ago

Ah hmm okay so we have built in functionality in Spawner base class to retain state as documented in https://jupyterhub.readthedocs.io/en/stable/reference/spawners.html#spawner-state, nice - but is it reasonable?

So we would then ask "is this a new user-server that hasn't been spawned before - then let's do X instead of Y" for each user server? Hmmm...

I think overall, this is also too complicated and fragile:

consideRatio commented 6 months ago

Current take

KubeSpawner

  1. I think the label values should match how we do the username/servername expansion
  2. I think changing the username/servername expansion is breaking things in ways that are hard to determine and test for as it can tie to users setups, and that due to this:
    • a) KubeSpawner shouldn't attempt to operate in a way that does different username/servername expansion on a per-server or per-user basis because it would become too complicated for project maintainers and users to handle well enough
    • b) KubeSpawner should provide a new robuster username/servername expansion that can be opted in to

Z2JH

  1. With a KubeSpawner opt-in available and tested a bit, z2jh should consider trying to make the behavior default for new installations - something that could probably be done somewhat robustly
yuvipanda commented 5 months ago

I left a message in https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2159422869