consideRatio opened 7 months ago
> you mean that getting these new labels out at the same time as https://github.com/jupyterhub/kubespawner/pull/835 makes us able to switch over with a single breaking change in the future with the same pre-conditions (that users have used kubespawner a while with duplicated sets of labels)?
Yes, subject to discussion about what we escape and how since it may affect more than just labels. Given this could be a breaking change that affects users I think we should discuss these together, and ensure there's a coherent plan that minimises pain for admins (how do they deal with the breaking changes) and for us as developers (will we maintain multiple code paths temporarily, or indefinitely)?
For many JupyterHub PRs we can merge without resolving all issues since the impact of the PR is limited and can be easily changed, but because the issues discussed here have a big impact on JupyterHub admins I think we should have a coherent plan. It might be that the consensus is that https://github.com/jupyterhub/kubespawner/pull/835 is fine in isolation since the escaping issues will take too much time to fix! But I think it should be a deliberate decision.
> Given this could be a breaking change that affects users I think we should discuss these together, and ensure there's a coherent plan that minimises pain for admins (how do they deal with the breaking changes) and for us as developers (will we maintain multiple code paths temporarily, or indefinitely)?
I agree on minimizing effort for users and maintainers, and if you think there is a coupling here I'll investigate further.
- `hub.jupyter.org/username` and `hub.jupyter.org/servername` -- the k8s api-server refuses resource creation for resources if label value constraints aren't met (see the sketch after this list).
- `component=singleuser-server` -- this label=value influences what pods KubeSpawner monitors via its k8s resource reflector. Removing/changing this label=value makes KubeSpawner lose track of user server pods, which is a breaking change forcing all user servers to be shut down before upgrading. This is why it's good to add labels alongside the old `component` label, as done in #835.
- `matchLabels` for a k8s Deployment indicates what pods the Deployment monitors and controls. It is an immutable field, forcing re-creation of the Deployment resource if it's to be changed. Practically, when a change to `matchLabels` is to be done for a `Deployment` resource, we would first delete the Deployment resource without deleting the pods it controls via `kubectl delete --cascade=orphan ...`, and then re-create the Deployment resource (`helm upgrade`).
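For concreteness, here is a minimal sketch of the label value constraint the api-server enforces (the 63-character limit and regex below are the standard Kubernetes label syntax rules; this is not KubeSpawner's actual escaping code):

```python
import re

# Kubernetes label values: at most 63 characters, empty or starting/ending
# with an alphanumeric, with only [-A-Za-z0-9_.] characters in between.
LABEL_VALUE_RE = re.compile(r"^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$")

def is_valid_label_value(value: str) -> bool:
    return len(value) <= 63 and bool(LABEL_VALUE_RE.match(value))

# A raw username like "user@example.com" is not a valid label value, so a pod
# labeled hub.jupyter.org/username=user@example.com would be refused.
print(is_valid_label_value("user-example"))      # True
print(is_valid_label_value("user@example.com"))  # False
```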
- `hub.jupyter.org/username` and `hub.jupyter.org/servername` aren't used by KubeSpawner itself. They are instead only of relevance to, for example, `jupyterhub/grafana-dashboards` or similar.
- `component` is used to decide what k8s pods to monitor via the PodReflector (illustrated below)
- `ownerReference` (other resources get deleted when the owner (user server pod) is being deleted)
- `hub.jupyter.org/username` and `hub.jupyter.org/servername` are not breaking kubespawner, z2jh, binderhub, or jupyterhub/grafana-dashboards (https://github.com/jupyterhub/grafana-dashboards/blob/main/docs/tutorials/deploy.md#pre-requisites). Renaming them would be breaking to users of grafana-dashboards, and they would then benefit from having a period of time with both old/new names available.
- Changing `hub.jupyter.org/username` or `hub.jupyter.org/servername` can be done now or later
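To illustrate the reflector point above, here is a rough sketch of label-selector based pod discovery in the spirit of KubeSpawner's PodReflector (the `jhub` namespace is an assumption, and this is not the actual reflector implementation):

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster
v1 = client.CoreV1Api()

# Pods are discovered purely by label selector: if the component label (or
# its value) changes on existing pods, they silently stop matching here.
pods = v1.list_namespaced_pod(
    "jhub", label_selector="component=singleuser-server"
)
for pod in pods.items:
    print(pod.metadata.name, pod.metadata.labels.get("hub.jupyter.org/username"))
```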
If we want to bundle the change associated with the labels, we need it bundled with the next major release of KubeSpawner/Z2JH rather than with z2jh 4's addition of new labels - because adding new labels isn't breaking, only removing is.

I think it's good to address #737, #744, #791 now, as doing so is breaking for software outside jupyterhub's official software that references those label values, and getting a breaking change to land in kubespawner ~now is great as it allows us to flush it into z2jh 4 together with other breaking changes demanding users' attention.

I see no reason to wait with #737, #744, #791 until we maybe in the future remove the `component` label etc. in favor of `app.kubernetes.io/component` in z2jh 5 or later.
It seems #744 is making a change that would influence `self.pod_name`, `self.dns_name`, `self.secret_name`, `self.pvc_name`, `self.namespace`, and `self.working_dir`. A change to `pod_name` requires user servers to be restarted, I think, and changes to `pvc_name` and `namespace` are probably very breaking as they are persistent.
```python
if self.enable_user_namespaces:
    self.namespace = self._expand_user_properties(self.user_namespace_template)
self.pod_name = self._expand_user_properties(self.pod_name_template)
self.dns_name = self.dns_name_template.format(
    namespace=self.namespace, name=self.pod_name
)
self.secret_name = self._expand_user_properties(self.secret_name_template)
self.pvc_name = self._expand_user_properties(self.pvc_name_template)
if self.working_dir:
    self.working_dir = self._expand_user_properties(self.working_dir)
```
Overall, this makes #744 far more complicated to land than if it just touched the labels =/
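As a toy illustration of why this is hairy: if the escaping applied during template expansion changes, every templated name changes with it, and for a persistent name like `pvc_name` that silently points users at new, empty volumes. The two escaping schemes below are made up purely for illustration:

```python
# Hypothetical old and new escaping schemes (made up for illustration).
def old_escape(username: str) -> str:
    return username.replace("@", "-")

def new_escape(username: str) -> str:
    return username.replace("@", "-40")  # e.g. a percent-style escape

pvc_name_template = "claim-{username}"
username = "jo@example"

print(pvc_name_template.format(username=old_escape(username)))  # claim-jo-example
print(pvc_name_template.format(username=new_escape(username)))  # claim-jo-40example
```

After such a change, an existing user's server would be scheduled against the new PVC name, leaving their data stranded in the old volume.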
My feeling is that breaking changes are fine as long as there's no risk of data loss. Restarting users' pods is disruptive, but it's very rare that it's required by a Z2JH release. Kubernetes upgrades require all pods to restart, and Kubernetes upgrades are much more frequent!
Changes to pvc_name and namespace are obviously more problematic, perhaps we should store more information in the state similar to https://github.com/jupyterhub/dockerspawner/pull/386/commits/05f5433ee2852cc978faf6fa2930f5e48ecd70ac from https://github.com/jupyterhub/dockerspawner/pull/386 ?
> Kubernetes upgrades require all pods to restart, and Kubernetes upgrades are much more frequent!
Not at a single point in time though! When upgrading in 2i2c clusters, I rotate out old node pools and rotate in new ones. K8s upgrades are almost entirely non-disruptive this way.
> Changes to pvc_name and namespace are obviously more problematic, perhaps we should store more information in the state similar to https://github.com/jupyterhub/dockerspawner/commit/05f5433ee2852cc978faf6fa2930f5e48ecd70ac from https://github.com/jupyterhub/dockerspawner/pull/386 ?
Hmmm, a common situation is that the username expansion is used to declare a subPath on an NFS mount as well, so even changing how the username expands can be breaking. I've not looked at the linked PR yet (I'm on mobile atm), but I'm hesitant to try to handle the complexity of a transition, and more optimistic about doing a switch where new z2jh deploys get new defaults and old deploys retain previous behavior unless they opt in - then including the breaking changes.
Ah hmm, okay, so we have built-in functionality in the Spawner base class to retain state, as documented in https://jupyterhub.readthedocs.io/en/stable/reference/spawners.html#spawner-state - nice - but is it reasonable?
So we would then ask "is this a new user-server that hasn't been spawned before - then let's do X instead of Y" for each user server? Hmmm...
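For reference, a minimal sketch of what that could look like using the documented Spawner state mechanism, in the spirit of the linked dockerspawner commit (the subclass and the choice to persist `pvc_name` are assumptions, not current kubespawner behavior):

```python
from kubespawner import KubeSpawner

class StickyNamesSpawner(KubeSpawner):
    """Hypothetical sketch: remember names actually used at first spawn."""

    def get_state(self):
        state = super().get_state()
        state["pvc_name"] = self.pvc_name  # persist the name in hub's database
        return state

    def load_state(self, state):
        super().load_state(state)
        if "pvc_name" in state:
            # Reuse the stored name instead of re-expanding the template, so
            # a later change to escaping/templates doesn't orphan the volume.
            self.pvc_name = state["pvc_name"]
```

Previously-spawned servers would then keep their old names while new users get new defaults, which is the "X instead of Y" branching described above.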
I think overall, this is also too complicated and fragile.
I left a message in https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2159422869
Extracted from https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2075996346:
Extracted from https://github.com/jupyterhub/kubespawner/pull/835#issuecomment-2081207827
@manics I think I get what you mean now with

> this might be a good opportunity

- you mean that getting these new labels out at the same time as #835 makes us able to switch over with a single breaking change in the future with the same pre-conditions (that users have used kubespawner a while with duplicated sets of labels)?