What did you expect to happen?:
ASG scaling should keep working: the cluster-autoscaler pod should scale the ASGs as required, even when a SageMaker HyperPod cluster is attached.
What happened instead?:
No autoscaling takes place, and the following errors appear in the cluster-autoscaler logs:
E1128 22:19:38.487185 1 static_autoscaler.go:387] Failed to get node infos for groups: wrong id: expected format aws:///<zone>/<name>, got aws:///usw1-az3/sagemaker/cluster/hyperpod-4lluwz86unnw-i-0059feb421f5b03ed
I1128 22:19:48.487306 1 static_autoscaler.go:306] Starting main loop
E1128 22:19:48.488169 1 static_autoscaler.go:387] Failed to get node infos for groups: wrong id: expected format aws:///<zone>/<name>, got aws:///usw1-az3/sagemaker/cluster/hyperpod-4lluwz86unnw-i-00f55e8b5be774bd7
I1128 22:19:58.489198 1 static_autoscaler.go:306] Starting main loop
E1128 22:19:58.490443 1 static_autoscaler.go:387] Failed to get node infos for groups: wrong id: expected format aws:///<zone>/<name>, got aws:///usw1-az3/sagemaker/cluster/hyperpod-4lluwz86unnw-i-0a42a231e263cac0c
I1128 22:20:08.491624 1 static_autoscaler.go:306] Starting main loop
E1128 22:20:08.492629 1 static_autoscaler.go:387] Failed to get node infos for groups: wrong id: expected format aws:///<zone>/<name>, got aws:///usw1-az3/sagemaker/cluster/hyperpod-4lluwz86unnw-i-0059feb421f5b03ed
I1128 22:20:18.493453 1 static_autoscaler.go:306] Starting main loop
E1128 22:20:18.494944 1 static_autoscaler.go:387] Failed to get node infos for groups: wrong id: expected format aws:///<zone>/<name>, got aws:///usw1-az3/sagemaker/cluster/hyperpod-4lluwz86unnw-i-0cfbeeb3654698d80
How to reproduce it (as minimally and precisely as possible):
1. Set up cluster-autoscaler with ASG auto-discovery. Discovery and autoscaling work correctly at this point.
2. Attach an AWS SageMaker HyperPod cluster to the EKS cluster.
3. The errors above start appearing in the logs, and no further autoscaling takes place.
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: 1.30
What k8s version are you using (kubectl version)?:
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.6-eks-7f9249a
What environment is this in?:
AWS/EKS
Anything else we need to know?: