kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Azure: incorrect in-memory size returned when a spot instance is deleted #7373

Open magnetic5355 opened 1 month ago

magnetic5355 commented 1 month ago

Which component are you using?: cluster-autoscaler

What version of the component are you using?: 1.31

Component version: 1.31

What k8s version are you using (kubectl version)?: 1.30.5+k3s1

What environment is this in?: Azure

What did you expect to happen?: When a VMSS spot instance is deleted and the node is removed from the cluster, I expect the autoscaler to invalidate its cache.

What happened instead?: Schedulable pods are present, but the in-memory size is 9 while the actual VMSS instance count is only 7.

1 filter_out_schedulable.go:78] Schedulable pods present
I1009 02:24:15.536067 1 static_autoscaler.go:557] No unschedulable pods
I1009 02:24:15.536082 1 azure_scale_set.go:217] VMSS: k8-agent-2, returning in-memory size: 0
I1009 02:24:15.536093 1 azure_scale_set.go:217] VMSS: k8-agent-d2ds_v5, returning in-memory size: 9
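For context, here is a minimal Go sketch of the TTL-cached target-size pattern that the "returning in-memory size" log lines suggest. This is an illustration built on my assumptions, not the actual azure_scale_set.go code: the point is that the target size is served from memory until a refresh interval elapses, so a spot instance deleted out-of-band keeps the stale value of 9 alive.

```go
// Minimal sketch of a TTL-cached target size, assuming behaviour similar to
// what the "returning in-memory size" log lines suggest. Not the real
// azure_scale_set.go implementation.
package main

import (
	"fmt"
	"sync"
	"time"
)

type scaleSetCache struct {
	mu              sync.Mutex
	curSize         int64
	lastRefresh     time.Time
	refreshInterval time.Duration
	fetchSize       func() (int64, error) // stands in for the Azure API call
}

func (s *scaleSetCache) getCurSize() (int64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Within the refresh window the in-memory value is trusted, even if the
	// VMSS shrank in the meantime (e.g. a spot instance was evicted).
	if time.Since(s.lastRefresh) < s.refreshInterval {
		fmt.Printf("returning in-memory size: %d\n", s.curSize)
		return s.curSize, nil
	}

	size, err := s.fetchSize()
	if err != nil {
		return 0, err
	}
	s.curSize = size
	s.lastRefresh = time.Now()
	return s.curSize, nil
}

func main() {
	c := &scaleSetCache{
		curSize:         9,               // stale value from before the eviction
		lastRefresh:     time.Now(),      // cache was just refreshed
		refreshInterval: 5 * time.Minute, // placeholder TTL
		fetchSize:       func() (int64, error) { return 7, nil }, // what Azure would report
	}
	size, _ := c.getCurSize() // prints the cached 9, not the real 7
	fmt.Println("target size used by the autoscaler:", size)
}
```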

--- eventually this starts logging in a loop when the cluster tries to scale down ---

I1009 02:31:59.254556 1 static_autoscaler.go:756] Decreasing size of k8-agent-d2ds_v5, expected=9 current=7 delta=-2
I1009 02:31:59.254570 1 azure_scale_set_instance_cache.go:77] invalidating instanceCache for k8-agent-d2ds_v5
I1009 02:31:59.254579 1 azure_scale_set.go:217] VMSS: k8-agent-d2ds_v5, returning in-memory size: 9
I1009 02:31:59.254594 1 static_autoscaler.go:469] Some node group target size was fixed, skipping the iteration
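To illustrate why this never converges, here is a rough sketch of the size-fixing step the log suggests. It is my assumption about the flow, not the real static_autoscaler.go: if DecreaseTargetSize never actually changes the cached target size, every iteration recomputes the same delta and logs the same "fixed, skipping the iteration" message.

```go
// Rough sketch of the "fix node group size" step suggested by the log above;
// an assumption for illustration, not the real static_autoscaler.go.
package main

import "fmt"

type nodeGroup interface {
	TargetSize() (int, error)           // cached, in-memory target size
	DecreaseTargetSize(delta int) error // should shrink the target
}

func fixNodeGroupSize(ng nodeGroup, currentSize int) (bool, error) {
	expected, err := ng.TargetSize()
	if err != nil {
		return false, err
	}
	if currentSize >= expected {
		return false, nil // nothing to fix
	}
	delta := currentSize - expected
	fmt.Printf("Decreasing size, expected=%d current=%d delta=%d\n", expected, currentSize, delta)
	// The main loop would then log "Some node group target size was fixed,
	// skipping the iteration" and retry next time around.
	return true, ng.DecreaseTargetSize(delta)
}

// stuckGroup models the reported symptom: decreasing the target size is a
// no-op, so the cached value stays at 9 forever.
type stuckGroup struct{ target int }

func (g *stuckGroup) TargetSize() (int, error)           { return g.target, nil }
func (g *stuckGroup) DecreaseTargetSize(delta int) error { return nil }

func main() {
	g := &stuckGroup{target: 9}
	for i := 0; i < 3; i++ {
		fixed, _ := fixNodeGroupSize(g, 7) // 7 registered nodes, cached target 9
		fmt.Println("fixed:", fixed)       // true every iteration; never converges
	}
}
```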

How to reproduce it (as minimally and precisely as possible):

1. Set up a K3s cluster (not using AKS).
2. Set the provider ID on the nodes to the proper format, i.e. aks:/// (see the client-go sketch after this list).
3. Set the kubernetes.azure.com/agentpool node label.
4. Add the autoscaler tags to the VMSS.
5. Increase the workload so the autoscaler creates new nodes.
6. Delete a VMSS instance from Azure.
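For steps 2 and 3, a hedged client-go sketch of how the provider ID and agentpool label could be applied to a node is below. The node name, label value, and provider ID string are placeholders that must match your environment, and spec.providerID can only be set while it is still empty.

```go
// Hedged sketch of scripting reproduce steps 2-3 with client-go. The node
// name, label value, and provider ID are placeholders.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	nodeName := "k3s-worker-0" // placeholder

	// Strategic-merge patch: add the agentpool label and set the provider ID
	// to whatever format your cloud provider configuration expects.
	patch := []byte(`{
	  "metadata": {"labels": {"kubernetes.azure.com/agentpool": "k8-agent-d2ds_v5"}},
	  "spec": {"providerID": "<provider-id-in-the-expected-format>"}
	}`)
	_, err = cs.CoreV1().Nodes().Patch(context.TODO(), nodeName,
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("patched node", nodeName)
}
```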

The in-memory size never refreshes, and new nodes are never created.

I have to restart the cluster-autoscaler pod to scale the cluster back up.

Anything else we need to know?:

adrianmoisey commented 1 month ago

/kind cluster-autoscaler

k8s-ci-robot commented 1 month ago

@adrianmoisey: The label(s) kind/cluster-autoscaler cannot be applied, because the repository doesn't have them.

In response to [this](https://github.com/kubernetes/autoscaler/issues/7373#issuecomment-2402013573):

> /kind cluster-autoscaler

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

adrianmoisey commented 1 month ago

/area cluster-autoscaler