Open killianmuldoon opened 2 years ago
/area topology
What would be the valid range for those fields?
We don't have these defined right now in the machine webhook (and I don't know if there's any need to), but defining a min/max is an optional part of this.
I think the main part is to ensure that we do enough validation to catch errors like #7047 on object creation, instead of during the reconcile.
Yup. The problem is that metav1.Duration just has type "string" as its OpenAPI schema, right?
If it also used format duration, OpenAPI would probably handle it for us? (via: // +kubebuilder:validation:Format)
But given the recent trend, we would implement it in the webhook instead of using the marker. (The format godoc sounds like we should use time.ParseDuration.)
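To make the webhook idea concrete, here is a minimal sketch of a format check built on time.ParseDuration. The helper name validateDurationString and the field path passed to it are hypothetical and used only for illustration; this is not the actual Cluster API webhook code, which would more likely report errors through the apimachinery field error types.

```go
package main

import (
	"fmt"
	"time"
)

// validateDurationString is a hypothetical helper. It rejects any value that
// time.ParseDuration cannot parse, so a malformed duration is reported when
// the object is written instead of later, during reconcile.
func validateDurationString(fieldPath, value string) error {
	if _, err := time.ParseDuration(value); err != nil {
		return fmt.Errorf("%s: invalid duration %q: %w", fieldPath, value, err)
	}
	return nil
}

func main() {
	// The field path is illustrative only.
	const path = "spec.topology.controlPlane.nodeDrainTimeout"
	for _, v := range []string{"10s", "1h30m", "ten seconds"} {
		if err := validateDurationString(path, v); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", v)
	}
}
```

Note that time.ParseDuration requires a unit suffix on every non-zero value, so a bare number such as "10" is rejected as well.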
/lifecycle stale
/remove-lifecycle stale
/triage accepted
/remove-kind feature
/kind bug
/priority important-soon
/remove-triage accepted
/triage accepted
/assign
What happens to the existing users who have persisted bad values when we update the validation here? Has it been considered to use ratcheting validation at all?
I think it was not considered
Ratcheting validation exists directly within the API server from Kube 1.30, but since we need to support older versions, ratcheting can be implemented either in a webhook or with a couple of well-crafted CEL transition rules (though these aren't perfect, as they don't cover the create case).
Without ratcheting, this does have the potential to break users on upgrade: they wouldn't be able to write anything to the object until the values of these broken fields were fixed.
Ratcheting validation exists directly within the API server from Kube 1.30
If it's enabled by default, it could be okay to just wait until 1.30 is the minimum supported version (Cluster API v1.10, so basically we could then merge in December).
NodeDeletionTimeout and NodeDrainTimeout were added to topology-managed clusters in #7098 and #6278. Currently the values of these fields are not validated on creation; validation is instead done when the templates are turned into objects.
This lack of up-front validation led to the unexpected failure in #7047. We could do some basic validation in the webhook on object creation to ensure these values are correctly formatted and within a given range before creation.
/kind feature