kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Couldn't find template for node group #6876

Open · duviful opened this issue 1 month ago

duviful commented 1 month ago

Which component are you using?: cluster-autoscaler

What version of the component are you using?:

Component version: 1.30.0

What k8s version are you using (kubectl version)?:

Output of kubectl version:
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.3

What environment is this in?: cluster-api-vsphere

What did you expect to happen?:

The autoscaler pods are able to trigger scaling.

What happened instead?:

This error is repeated multiple times in the cluster-autoscaler pod logs:

[static_autoscaler.go:1036] Couldn't find template for node group MachineDeployment/default/workload-cluster-1-md-0
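The node group identifier in this log line appears to follow a `<kind>/<namespace>/<name>` form. The Python sketch below splits the logged identifier so the namespace and MachineDeployment name can be compared with what exists in the cluster. The function `parse_node_group_id` and the three-part layout it expects are assumptions made for illustration, not part of the autoscaler.

```python
import re

# Captures the identifier that follows the logged message, e.g.
# "MachineDeployment/default/workload-cluster-1-md-0".
LOG_PATTERN = re.compile(r"Couldn't find template for node group (\S+)")


def parse_node_group_id(log_line: str) -> dict:
    """Split a logged node group ID into kind, namespace, and name.

    Hypothetical helper: assumes the ID has exactly three
    '/'-separated parts (<kind>/<namespace>/<name>).
    """
    match = LOG_PATTERN.search(log_line)
    if match is None:
        raise ValueError("line does not contain a node group ID")
    # Unpacking raises ValueError if there are fewer than three parts.
    kind, namespace, name = match.group(1).split("/", 2)
    return {"kind": kind, "namespace": namespace, "name": name}


line = ("[static_autoscaler.go:1036] Couldn't find template for node group "
        "MachineDeployment/default/workload-cluster-1-md-0")
print(parse_node_group_id(line))
# {'kind': 'MachineDeployment', 'namespace': 'default', 'name': 'workload-cluster-1-md-0'}
```

Read this way, the MachineDeployment the autoscaler is complaining about is `workload-cluster-1-md-0` in the `default` namespace.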

How to reproduce it (as minimally and precisely as possible):

Define a workload cluster using Cluster API in an existing management cluster, all hosted on vSphere. The workload cluster deploys correctly and scales up and down with 'kubectl scale machinedeployment workload-cluster-1-md-0 --replicas x'. The autoscaler is then deployed using the Helm chart as a base, with a kustomization that adds the permissions needed for cloud-specific resources, as mentioned in #5509.

Anything else we need to know?:

kundan2707 commented 1 month ago

You must configure node group auto-discovery to tell the cluster autoscaler which cluster to search for scalable node groups. Users of single-arch, non-amd64 clusters who rely on scale-from-zero support should also set the CAPI_SCALE_ZERO_DEFAULT_ARCH environment variable to the architecture the node group templates should default to. If it is not set, the autoscaler defaults to amd64, and the node group templates may not match the nodes' architecture.
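The defaulting rule described in this comment can be modelled in a few lines of Python. This is a hypothetical sketch of the documented behaviour only (the names `template_arch` and `template_matches_nodes` are invented here); it is not the autoscaler's Go implementation.

```python
import os
from typing import Mapping, Optional

# Architecture used when CAPI_SCALE_ZERO_DEFAULT_ARCH is not set,
# per the behaviour described in the comment.
FALLBACK_ARCH = "amd64"


def template_arch(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the architecture a scale-from-zero template would default to."""
    if env is None:
        env = os.environ
    return env.get("CAPI_SCALE_ZERO_DEFAULT_ARCH", FALLBACK_ARCH)


def template_matches_nodes(node_arch: str,
                           env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the defaulted template architecture equals the nodes' one."""
    return template_arch(env) == node_arch


print(template_matches_nodes("amd64", {}))  # True: unset falls back to amd64
print(template_matches_nodes("arm64", {}))  # False: arm64 nodes need the variable
print(template_matches_nodes("arm64", {"CAPI_SCALE_ZERO_DEFAULT_ARCH": "arm64"}))  # True
```

Under this model, an amd64-only cluster gets a matching template without setting the variable at all.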

kundan2707 commented 1 month ago

/remove-kind bug

kundan2707 commented 1 month ago

/kind support

duviful commented 1 month ago

Thank you for your response.

Node group auto-discovery was already defined in the pod's command. Here is an extract:

clusterapi-cluster-autoscaler:
  Image:      registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
  Port:       8085/TCP
  Host Port:  0/TCP
  Command:
    ./cluster-autoscaler
    --cloud-provider=clusterapi
    --namespace=cluster-autoscaler-system
    --node-group-auto-discovery=clusterapi:clusterName=workload-cluster-1
    --logtostderr=false
    --stderrthreshold=info
    --v=1
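To double-check that the auto-discovery flag in this extract points at the same cluster as the MachineDeployment from the log (`MachineDeployment/default/workload-cluster-1-md-0`, which appears to live in the `default` namespace), the argument list can be parsed with a short Python sketch. The helpers `discovery_spec` and `matches` are hypothetical, and treating an omitted `clusterName` or `namespace` key as matching anything is an assumption of this sketch, not a statement of how the autoscaler evaluates the flag.

```python
from typing import Dict, List, Optional

PREFIX = "--node-group-auto-discovery="


def discovery_spec(args: List[str]) -> Optional[Dict[str, str]]:
    """Return the key=value pairs of the first clusterapi discovery flag, if any."""
    for arg in args:
        if not arg.startswith(PREFIX):
            continue
        provider, _, selectors = arg[len(PREFIX):].partition(":")
        if provider != "clusterapi":
            continue
        pairs: Dict[str, str] = {}
        for item in selectors.split(","):
            key, sep, value = item.partition("=")
            if sep:  # skip empty or malformed entries
                pairs[key.strip()] = value.strip()
        return pairs
    return None


def matches(spec: Dict[str, str], cluster_name: str, namespace: str) -> bool:
    """Assumption: a key missing from the spec matches any value."""
    return (spec.get("clusterName", cluster_name) == cluster_name
            and spec.get("namespace", namespace) == namespace)


command = [
    "./cluster-autoscaler",
    "--cloud-provider=clusterapi",
    "--namespace=cluster-autoscaler-system",
    "--node-group-auto-discovery=clusterapi:clusterName=workload-cluster-1",
    "--logtostderr=false",
    "--stderrthreshold=info",
    "--v=1",
]
spec = discovery_spec(command)
print(spec)  # {'clusterName': 'workload-cluster-1'}
print(matches(spec, "workload-cluster-1", "default"))  # True
```

The parsed spec only constrains `clusterName`, so under the stated assumption it is consistent with the MachineDeployment named in the log.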

I'm not setting CAPI_SCALE_ZERO_DEFAULT_ARCH because the nodes use the standard amd64 CPU architecture.

adrianmoisey commented 2 weeks ago

/area cluster-autoscaler