Create new labels that we can switch over to later

consideRatio commented 7 months ago

Extracted from https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2075996346:

There's an ongoing discussion about how names (including pod labels) are normalised, and whether this can be changed in a backwards compatible manner:

https://github.com/jupyterhub/kubespawner/issues/737

https://github.com/jupyterhub/kubespawner/pull/744

https://github.com/jupyterhub/kubespawner/pull/791

@minrk @yuvipanda

If we want to do the above this might be a good opportunity to set the new labels to whatever the new normalisation scheme is, and leave the old labels unchanged, avoiding some backward compatibility complexity. It won't avoid all complexity since it's not just labels that are affected.

Extracted from https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2081207827

I'm pro using this as an opportunity to fix our myriad escaping issues! Some variant of https://github.com/jupyterhub/kubespawner/pull/744 would be pretty amazing.

I don't currently have capacity to push on that though :(

@manics I think I get what you mean now with this might be a good opportunity - you mean that getting these new labels out at the same time as #835, makes us able to switchover with a single breaking change in the future with the same pre-conditions (that users have used kubespawner a while with duplicated sets of labels)?

manics commented 7 months ago

you mean that getting these new labels out at the same time as https://github.com/jupyterhub/kubespawner/pull/835, makes us able to switchover with a single breaking change in the future with the same pre-conditions (that users have used kubespawner a while with duplicated sets of labels)?

Yes, subject to discussion about what we escape and how since it may affect more than just labels. Given this could be a breaking change that affects users I think we should discuss these together, and ensure there's a coherent plan that minimises pain for admins (how do they deal with the breaking changes) and for us as developers (will we maintain multiple code paths temporarily, or indefinitely)?

For many JupyterHub PRs we can merge without resolving all issues since the impact of the PR is limited and can be easily changed, but because the issues discussed here have a big impact on JupyterHub admins I think we should have a coherent plan. It might be that the consensus is that https://github.com/jupyterhub/kubespawner/pull/835 is fine in isolation since the escaping issues will take too much time to fix! But I think it should be a deliberate decision.

consideRatio commented 6 months ago

Given this could be a breaking change that affects users I think we should discuss these together, and ensure there's a coherent plan that minimises pain for admins (how do they deal with the breaking changes) and for us as developers (will we maintain multiple code paths temporarily, or indefinitely)?

I agree on minimizing effort for users and maintainers, and if you think there is a coupling here I'll investigate further.

Tech summary

k8s label values have constraints We have and may still be breaking these constraints in kubespawner for hub.jupyter.org/username and hub.jupyter.org/servername -- the k8s api-server refuses resource creation for resources if constraints aren't met.
the k8s label=value component=singleuser-server This label=value influences what pods KubeSpawner monitors via its k8s resource reflector - removing/changing this label=value makes KubeSpawner loose track of user server pods which is a breaking change forcing all user servers to be shut down before upgrading. This is why its good to add labels alongside the old component as done in #835.
matchLabels for a k8s Deployment ... ... indicates what pods the Deployment monitors and controls. It is an immutable field forcing re-creations of the Deployment resource if its to be changed. Practically when a change to matchLabels is to be done for a Deployment resource, we would delete the Deployment resource first without deleting pods it controls via kubectl delete --cascade=orphan ..., and then re-create the Deployment resource (helm upgrade).

KubeSpawner labels to consider

hub.jupyter.org/username and hub.jupyter.org/servername aren't used by KubeSpawner itself. They are instead only of relevance to for example jupyterhub/grafana-dashboards` or similar.
component is used to decide what k8s pods to monitor via the PodReflector

KubeSpawner functions to consider

The functions to create/delete k8s pods/pvcs relies on knowledge about a pod/pvc name - not relying on label etc
The functions to create/(delete) k8s secrets (for internal ssl) also don't rely on label info, and are automatically deleted via referenced ownerReference (user server pod) being deleted)

My conclusions

Third party things maybe breaks if hub.jupyter.org/username or hub.jupyter.org/servername change Changing the values of hub.jupyter.org/username and hub.jupyter.org/servername are not breaking kubespawner, z2jh, binderhub, or jupyterhub/grafana-dashboards (https://github.com/jupyterhub/grafana-dashboards/blob/main/docs/tutorials/deploy.md#pre-requisites). Renaming them would be breaking to users of grafana-dashboards and then they would benefit from having a period of time with both old/new names available.
Additions of labels are not a problem I think adding labels alongside the old should be non-breaking for anything really.
Changing the value hub.jupyter.org/username or hub.jupyter.org/servername can be done now or later If we want to bundle the change associated with the labels, we need it bundled with the next major release of KubeSpawner/Z2JH rather than for z2jh 4 adding new labels - because that isn't breaking - only removing is.
Its nice to do any breaking change now in KubeSpawner to get it flushed out with z2jh 4

My takeaway

I think its good to address #737, #744, #791 now as doing so is breaking for software outside jupyterhub's official software referencing those labels values, and getting a breaking change to land in kubespawner ~now is great as it allows us to flush it into z2jh 4 with other breaking changes demanding users attention.

I see no reason to wait doing #737, #744, #791 until we maybe in the futuer remove component label etc in favor of app.kubernetes.io/component in z2jh 5 or later.

consideRatio commented 6 months ago

It seems #744 is making a change that would influence self.pod_name, self.dns_name, self.secret_name, self.pvc_name, self.namespace, self.working_dir. A change to pod_name requires user servers to be restarted I think, and a change to pvc_name and namespace are probably very breaking changes as they are persistent.

        if self.enable_user_namespaces:
            self.namespace = self._expand_user_properties(self.user_namespace_template)
        self.pod_name = self._expand_user_properties(self.pod_name_template)
        self.dns_name = self.dns_name_template.format(namespace=self.namespace, name=self.pod_name)
        self.secret_name = self._expand_user_properties(self.secret_name_template)
        self.pvc_name = self._expand_user_properties(self.pvc_name_template)
        if self.working_dir:
            self.working_dir = self._expand_user_properties(self.working_dir)

Overall, this makes #744 far more complicated to land than if it just touched the labels =/

manics commented 6 months ago

My feeling is that breaking changes are fine as long as there's no risk of data loss. Restarting user's pods is disruptive but it's very rare that it's required by a Z2JH release. Kubernetes upgrades require all pods to restart, and Kubernetes upgrades are much more frequent!

Changes to pvc_name and namespace are obviously more problematic, perhaps we should store more information in the state similar to https://github.com/jupyterhub/dockerspawner/pull/386/commits/05f5433ee2852cc978faf6fa2930f5e48ecd70ac from https://github.com/jupyterhub/dockerspawner/pull/386 ?

consideRatio commented 6 months ago

Kubernetes upgrades require all pods to restart, and Kubernetes upgrades are much more frequent!

Not at a single point in time though! When upgrading in 2i2c clusters, i rotate out old node pools and rotate in new. K8s upgrades are almost entirely non-disruptive this way.

Changes to pvc_name and namespace are obviously more problematic, perhaps we should store more information in the state similar to https://github.com/jupyterhub/dockerspawner/commit/05f5433ee2852cc978faf6fa2930f5e48ecd70ac from https://github.com/jupyterhub/dockerspawner/pull/386 ?

Hmmm, a common situation is that the username expansion is used to declare a subPath on a NFS mount as well, so even changing how the username expands can be breaking. I've not looked at the linked PR yet, I'm on mobile atm, but I'm hesitant to try handle the complexity of a transition, but more optimistic on doing a switch where new z2jh deploys get new defaults and old deploys retains previous function unless they opt-in - including then breaking changes.

consideRatio commented 6 months ago

Ah hmm okay so we have built in functionality in Spawner base class to retain state as documented in https://jupyterhub.readthedocs.io/en/stable/reference/spawners.html#spawner-state, nice - but is it reasonable?

So we would then ask "is this a new user-server that hasn't been spawned before - then let's do X instead of Y" for each user server? Hmmm...

I think overall, this is also too complicated and fragile:

we would have some users conditionally expanding their username in one way, and others in another way, and anything we haven't developed touching this has to handle both at the same time now
we would have to persist data in the jupyterhub database for each user/server combination, but I think it could be breaking if we for example expand username old-school style for the user's default server but not for a new named server - but we couldn't know
we would have to persist data in jupyterhubs database, we have additional state that we need to keep track on indefinitively and could cause issues

consideRatio commented 6 months ago

Current take

KubeSpawner

I think the label values should match how we do the username/servername expansion
I think changing the username/servername expansion is breaking things in ways that are hard to determine and test for as it can tie to users setups, and that due to this:
- a) KubeSpawner shouldn't attempt to operate in a way that does different username/servername expansion on a per-server or per-user basis because it would become too complicated for project maintainers and users to handle well enough
- b) KubeSpawner should provide a new robuster username/servername expansion that can be opted in to

Z2JH

With a KubeSpawner opt-in available and tested a bit, z2jh should consider trying to make the behavior default for new installations - something that could probably be done somewhat robustly

yuvipanda commented 5 months ago

I left a message in https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2159422869

jupyterhub / kubespawner