jtv8 opened 7 years ago
This is as designed. `--spare-agents` instructs the AS to not scale in below the specified number. `--over-provision` instructs the AS to scale out an additional number of nodes when scaling is necessary (for example, if you know that when your load starts picking up, it grows very fast).

For example, I know of some people using this autoscaler who set `--spare-agents` to 20 but don't want the AS to scale to 20 when the number of VMs is under that number. Say an admin deleted most of the VMs because he knows there won't be any load during the weekend, but load might be highly unpredictable during the week, so he doesn't want the AS to make any decision by itself under 20 nodes.
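To make the distinction between the two flags concrete, here is a minimal sketch (hypothetical function names, not the autoscaler's actual code) of how each one affects a scaling decision:

```python
def scale_in_target(nodes_needed, spare_agents):
    # --spare-agents only acts as a floor during scale-IN decisions:
    # the pool is never shrunk below it, but it never triggers a scale-out.
    return max(nodes_needed, spare_agents)

def scale_out_target(nodes_needed, over_provision):
    # --over-provision adds a buffer on top of what pending pods require,
    # so a fast-growing load has headroom before the next scaling loop.
    return nodes_needed + over_provision
```

With `--spare-agents=2`, a scale-in that would otherwise target 1 node stops at 2; with `--over-provision=2`, a scale-out that needs 3 nodes provisions 5.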
I guess a good solution would be to add another parameter such as `--force-spare` to instruct the AS to make sure the cluster is always at least `--spare-agents` in size.

For now, a very dirty workaround is to manually create as many pending pods as needed to force the AS to scale up to `--spare-agents`. The AS will then never go under this number unless you manually remove some VMs.
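The sizing for that workaround can be sketched as follows (illustrative only): create one placeholder pod per missing node, with each pod requesting close to a node's allocatable CPU/memory so no two placeholders can be co-scheduled on the same node.

```python
def placeholder_pod_count(current_nodes, spare_agents):
    """How many node-sized 'pause' pods to create so the autoscaler
    is forced to scale the pool up to --spare-agents.

    Assumes each placeholder pod requests nearly a full node of
    resources, so every pod that stays pending forces one new node.
    """
    return max(spare_agents - current_nodes, 0)
```

For example, with 3 nodes and `--spare-agents=20`, 17 such pods would be needed; once the pool is already at or above the floor, none are.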
I've run into strange behavior that seems connected to this issue: when scaling in from 5 nodes with `--spare-agents=2`, I've had 2 nodes deleted in the same scaling loop, causing the cluster to stay with 1 node. I'm attaching the logs from the scaling event that caused this: https://gist.github.com/yaron-idan/be4784fb4a874331bd9b4850cb6eeac8

This happened when using `"orchestratorRelease": "1.8"` in the acs-engine config; is this version compatible with and tested against the autoscaler?
At present, if the user supplies either the `--spare-agents` or `--over-provision` parameter, the autoscaler does not provision the requested nodes unless there is at least one pending pod. This is important, as a cluster admin may choose to use these parameters as overrides to cover scenarios that the scaler does not know about, for example if the admin knows that an application requires a minimum number of agents due to anti-affinity rules (see https://github.com/wbuchwalter/Kubernetes-acs-engine-autoscaler/issues/65).

Possible cause: this appears to be because this logic is only processed as part of the `fulfill_pending` method in `autoscaler/scaler.py`, which only gets run when the set of pending pods is non-empty.
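A simplified sketch of that control flow (hypothetical names and a crude capacity model, not the real `scaler.py`) shows why the floor is only honored while pods are pending: the `max(..., spare_agents)` clamp lives inside `fulfill_pending`, and the main loop skips that method entirely when nothing is pending.

```python
def scaling_loop(pending_pods, current_nodes, spare_agents):
    if pending_pods:
        return fulfill_pending(pending_pods, current_nodes, spare_agents)
    # No pending pods: fulfill_pending never runs, so a pool an admin
    # shrank below --spare-agents is left alone.
    return current_nodes

def fulfill_pending(pending_pods, current_nodes, spare_agents):
    # The --spare-agents floor is only applied here.
    needed = nodes_needed_for(pending_pods, current_nodes)
    return max(needed, spare_agents)

def nodes_needed_for(pending_pods, current_nodes, pods_per_node=10):
    # Crude capacity model purely for illustration: one extra node
    # per pods_per_node pending pods (ceiling division).
    extra = -(-len(pending_pods) // pods_per_node)
    return current_nodes + extra
```

In this model, a 1-node pool with `--spare-agents=2` stays at 1 node while the pending set is empty, but is brought up to the floor as soon as a single pod goes pending; a fix along the lines of `--force-spare` would apply the clamp in the main loop regardless of pending pods.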