Closed neolit123 closed 6 months ago
happy to send a PR, if this is agreed on.
/area clusterclass
/triage accepted
Sounds good to me
Q: how about ClusterClass, what has to be done there?
I think nothing. replicas cannot be set in a ClusterClass and it would be highly unusual to set the autoscaler annotations in a ClusterClass (as it would then apply to all Clusters that use a specific ClusterClass).
Sounds good to me
any tips on where / how to apply the discussed mutual exclusion of annotation vs replicas field?
> I think nothing. replicas cannot be set in a ClusterClass and it would be highly unusual to set the autoscaler annotations in a ClusterClass (as it would then apply to all Clusters that use a specific ClusterClass).
ack, but it sounds like a plausible use case.
> any tips on where / how to apply the discussed mutual exclusion of annotation vs replicas field?
You can add code here to iterate through newCluster.Spec.Topology.Workers.MachineDeployments and check for the annotations/replica fields.
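A minimal sketch of such a check, using simplified stand-in types rather than the real cluster-api API types (the annotation keys are the ones used by the cluster-autoscaler clusterapi provider; treat the exact shape as an assumption, not the actual webhook code):

```go
package main

import "fmt"

// Simplified stand-in for cluster.Spec.Topology.Workers.MachineDeployments[x].
type MachineDeploymentTopology struct {
	Name        string
	Replicas    *int32
	Annotations map[string]string
}

// Annotation keys used by the cluster-autoscaler clusterapi provider.
const (
	minSizeAnnotation = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"
	maxSizeAnnotation = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size"
)

// validateAutoscalerAnnotations returns one error per MachineDeployment
// topology that sets replicas while also carrying an autoscaler annotation.
func validateAutoscalerAnnotations(mds []MachineDeploymentTopology) []error {
	var errs []error
	for _, md := range mds {
		_, hasMin := md.Annotations[minSizeAnnotation]
		_, hasMax := md.Annotations[maxSizeAnnotation]
		if md.Replicas != nil && (hasMin || hasMax) {
			errs = append(errs, fmt.Errorf(
				"machineDeployment %q: replicas is mutually exclusive with the autoscaler min/max annotations", md.Name))
		}
	}
	return errs
}

func main() {
	replicas := int32(3)
	errs := validateAutoscalerAnnotations([]MachineDeploymentTopology{
		{Name: "md-0", Replicas: &replicas, Annotations: map[string]string{minSizeAnnotation: "1", maxSizeAnnotation: "5"}},
		{Name: "md-1", Annotations: map[string]string{minSizeAnnotation: "1", maxSizeAnnotation: "5"}},
	})
	fmt.Println(len(errs)) // prints "1": only md-0 sets both replicas and the annotations
}
```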
> ack, but it sounds like a plausible use case.
If we want to consider this case we additionally need:
a) in the Cluster webhook: also consider the annotations a MD will get from the ClusterClass
b) in the ClusterClass webhook: when the annotations are added there, check all Clusters using the ClusterClass (and the specific MD class) if they have replicas set
a) is comparatively easy, b) is a bit messy but still doable. But what makes a) a bit more complicated is that while we already have validation today that uses the ClusterClass in the Cluster webhook, it is basically optional validation, because we cannot assume that a ClusterClass is deployed before the Cluster and we don't want to fail in this case.
thanks for the tips. i will send a PR for the base fix and see if i can also fix A and B.
EDIT: PR is here: https://github.com/kubernetes-sigs/cluster-api/pull/10370
there was some feedback at the CAPI zoom call on April 3rd 2024: https://docs.google.com/document/d/1GgFbaYs-H6J5HSQ6a7n4aKpk0nDLE2hgG2NSOM9YIRw/edit#bookmark=id.yrm7uk6etmkk
there was some agreement that bad things can happen if we leave this without the mutex, at least in a CAPV scenario.
@elmiko promised to take a look and had some good questions - e.g. how to handle the case where a user-defined replicas value is set greater than the autoscaler max annotation. in that case the validation webhook will block them. i said maybe the user should pause the validation webhook, but @fabriziopandini explained this is a bad practice. perhaps they should just temporarily remove the annotations, update the replica count and then add the annotations back.
the cluster autoscaler can act differently depending on the conditions of how a user modifies the replicas field:
as long as the user is not modifying the replica count so that it is outside the min/max range, then the changes are disruptive but do not put the autoscaler into an unrecoverable state.
if the user modifies the replicas to be outside the min/max range, then they will need to remediate it manually.
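The two cases above can be condensed into a small sketch (the returned strings paraphrase the behavior described in this thread; this is illustrative, not actual cluster-autoscaler code):

```go
package main

import "fmt"

// classifyReplicas paraphrases the discussed behavior: a user-set replica
// count inside [min, max] is disruptive but recoverable, while a value
// outside the range needs manual remediation (simplified sketch).
func classifyReplicas(replicas, min, max int32) string {
	if replicas < min || replicas > max {
		return "outside min/max: manual remediation needed"
	}
	return "within min/max: disruptive, but the autoscaler recovers"
}

func main() {
	fmt.Println(classifyReplicas(3, 1, 5)) // within min/max
	fmt.Println(classifyReplicas(9, 1, 5)) // outside min/max
}
```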
if we make the annotations mutually exclusive with replicas field then we will need to be very clear about the process a user needs to take when triaging a misbehaving cluster to include the possibility of needing to remove the annotations if the replicas need adjusting.
another important note, as mentioned by @vrabbi in our community meeting today. the replicas field in ClusterClass objects can have very bad interactions with the cluster autoscaler if the user tries to control both the min/max annotations and the replicas.
thanks for the details @elmiko, appreciated!
> another important note, as mentioned by @vrabbi in our community meeting today. the replicas field in ClusterClass objects can have very bad interactions with the cluster autoscaler if the user tries to control both the min/max annotations and the replicas.
i don't think there are replicas fields on a CC objects, unless i'm mistaken? https://github.com/kubernetes-sigs/cluster-api/blob/main/api/v1beta1/clusterclass_types.go
i was under the impression we are discussing the MachineDeploymentTopology replicas: https://github.com/kubernetes-sigs/cluster-api/blob/main/api/v1beta1/cluster_types.go#L187
Indeed not in CC but rather in CC based clusters
To make it slightly more complicated: there is also a --enforce-node-group-min-size flag on the cluster-autoscaler which will scale up to the min even if the current MD.spec.replicas are < min. (sorry, just remembering this now)
(https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#my-cluster-is-below-minimum--above-maximum-number-of-nodes-but-ca-did-not-fix-that-why)
I think we have to focus the discussion a bit. I personally would focus in this issue on the ClusterClass case and the question "which configuration is clearly wrong" (so that we should block it in the cluster webhook). My take (assuming min & max annotations are set):

1. replicas < min
2. min <= replicas <= max
3. replicas > max

(for case 1 the outcome additionally differs depending on whether --enforce-node-group-min-size is set or not). Hope I didn't mix up anything.
Looking at this, I think I would keep it simple: block 1, 2, 3.
I think if users look at their cluster spec it shouldn't take them a long time and in-depth knowledge of CAPI & cluster-autoscaler to figure out what is actually going on. If replicas & min/max annotations are set it's hard to figure out what's happening without all the knowledge we're just gathering here.
> I think if users look at their cluster spec it shouldn't take them a long time and in-depth knowledge of CAPI & cluster-autoscaler to figure out what is actually going on. If replicas & min/max annotations are set it's hard to figure out what's happening without all the knowledge we're just gathering here.
agreed.
> Looking at this, I think I would keep it simple: block 1, 2, 3.
agreed, but i would like to exclude checking --enforce-node-group-min-size and just consider the annotations and replica values only (i.e. just block 1 and 3 without checking the flag). i would guess the only way to fetch the flag value is by parsing the autoscaler deployment, and it's not managed in an API object anywhere.
other than that, 1, 2, 3 don't complicate the PR too much. the code that the PR adds would be simple after that, minus fetching the autoscaler deployment.
Q: to what numerical value should a min/max annotation value be set in the PR logic if the user entered a NaN by mistake, e.g. foo for max?
> agreed, but i would like to exclude checking --enforce-node-group-min-size and just consider the annotations and replica values only
Oh yup 100%. I was just writing up the options what could happen to think through the problem :)
> Q: to what numerical value should a min/max annotation value be set in the PR logic if the user entered a NaN by mistake, e.g. foo for max?
We should probably just validate the value and return an error if it's NaN
> We should probably just validate the value and return an error if it's NaN
ok, understood. just return a standard validation error from the webhook.
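For illustration, a minimal sketch of what such value validation could look like, assuming the webhook did parse the annotation values (the thread ultimately leaves value validation to the autoscaler, and the function name here is made up):

```go
package main

import (
	"fmt"
	"strconv"
)

// parseSizeAnnotation returns the parsed value, or a standard validation
// error when the annotation value is not a non-negative integer
// (illustrative sketch, not the actual webhook code).
func parseSizeAnnotation(key, value string) (int32, error) {
	n, err := strconv.ParseInt(value, 10, 32)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("annotation %s: invalid size %q, expected a non-negative integer", key, value)
	}
	return int32(n), nil
}

func main() {
	// A NaN like "foo" becomes a standard validation error.
	if _, err := parseSizeAnnotation("max-size", "foo"); err != nil {
		fmt.Println(err)
	}
	n, _ := parseSizeAnnotation("max-size", "5")
	fmt.Println(n) // prints "5"
}
```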
@sbueringer just realized, if we are blocking on 1,2,3 doesn't that mean block on any number of replicas?
x < min || x >= min || x <= max || x > max
that's basically what the PR does right now, it just checks that there are min/max annotations, doesn't check their values and checks if replicas is non-nil.
Yup, that's what it means :)
Sorry for the confusion. Just wanted to write up all the branches and lay out what could happen in each. Just to eventually end up with that simple result
> that's basically what the PR does right now, it just checks that there are min/max annotations, doesn't check their values and checks if replicas is non-nil.
in that case i would prefer to leave the annotation value validation to the autoscaler, i.e. not include number parsing of the annotations in these Cluster and ClusterClass validating webhooks.
+1 to making this as simple to understand as possible. from the user experience point of view, there should be no surprises and a clear path for triaging.
validating the min/max value is probably not worth the effort since we will be duplicating logic from the autoscaler, and adds an extra layer for possible confusion.
i absolutely agree with blocking number 2, the others i'm less clear on because there are users who rely on using the replica field to make a specific node group fall outside the autoscaler's scanning. i'm not sure how widespread this pattern is, but we will break it if we disallow changing the replicas while the annotations are present.
> i absolutely agree with blocking number 2, the others i'm less clear on because there are users who rely on using the replica field to make a specific node group fall outside the autoscaler's scanning. i'm not sure how widespread this pattern is, but we will break it if we disallow changing the replicas while the annotations are present.
yes, that would break them. if we have concerns about this type of usage we can try to be safe and only apply 2. but IMO there should be a clearer contract, and for 1, 3 the user pattern should be to just remove the annotations.
it would be even clearer if there was an explicit `enabled` annotation, but i'm not making a FR about it. :)
@fabriziopandini @chrischdi @vincepri @enxebre @killianmuldoon Any opinions on:
> i absolutely agree with blocking number 2, the others i'm less clear on because there are users who rely on using the replica field to make a specific node group fall outside the autoscaler's scanning. i'm not sure how widespread this pattern is, but we will break it if we disallow changing the replicas while the annotations are present.

> there should be a clearer contract and for 1, 3 the user pattern should be to just remove the annotations.
i tend to agree with this notion, but in conversations with end users i have heard them mention the complexity around the annotations and replicas behavior causes toil in some scenarios. i'm not sure that we can avoid it (although an operator to help with the capi/cas deployment might be a worthwhile investigation) as there is a necessary amount of configuration that has to be done.
It still doesn't fully block them either, because they can always change the replicas without removing the annotations by doing it directly on the MachineDeployment resource instead of via the Cluster resource if needed.
And they could also change/set replicas and remove the annotations in Cluster.spec.topology. If the goal is to set replicas to a value where the autoscaler is not active they could do that. They just couldn't do it by only setting replicas to a value outside of (min,max) while keeping the annotations.
So I guess it's slightly more work for the user, but the result is more explicit (annotations are removed => so it's 100% clear that autoscaler is not active for everyone)
> So I guess it's slightly more work for the user, but the result is more explicit (annotations are removed => so it's 100% clear that autoscaler is not active for everyone)
agreed, this is my preference too. i just wanted to share some of the feedback i've heard.
I would also prefer to be as explicit as possible. We could take care that the returned errors which are surfaced to the user are as easy to understand as possible.
Fewer permutations, fewer questions and easier to understand.
+1 to block when the user sets replicas and the autoscaler annotation exists + the user has to explicitly remove the annotations to set replicas, no matter if replicas are within the min/max range or not
there seems to be consensus about the plan. PTAL at the implementation PR: https://github.com/kubernetes-sigs/cluster-api/pull/10370
/priority important-soon
What would you like to be added (User Story)?
as a user, i would like to have a single active method of scaling the workloads of a cluster. as a developer, i would like to sandbox the user so that they don't try to scale with autoscaler and by manually editing the replicas field at the same time.
Detailed Description
manually scaling workers by editing the replicas field in a Class-based cluster with cluster-autoscaler enabled can break cluster-autoscaler. as seen downstream with CAPV, in such a scenario cluster-autoscaler can start creating new worker machines, but then these machines will get deleted immediately.
eventually this can cause datastore and VM cleanup problems.
some more quotes from the report:
proposal
make the autoscaler annotations present on the Cluster object in `cluster.Spec.Topology.Workers.MachineDeployments[x].Metadata.Annotations` mutually exclusive with `cluster.Spec.Topology.Workers.MachineDeployments[x].Replicas`.
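For illustration, a Cluster topology that this proposal would reject, since it sets both replicas and the autoscaler annotations (names and values are made up):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster            # illustrative name
spec:
  topology:
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          replicas: 3          # mutually exclusive with the annotations below
          metadata:
            annotations:
              cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
              cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
```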
Q: how about ClusterClass, what has to be done there?
Anything else you would like to add?
No response
Label(s) to be applied
/kind feature /area clusterclass