cloudcarver opened 1 year ago
/area provider/aws
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
What's the value that should be used for the k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/pod-eni tag?
The number of network interfaces the instance can have, or the number of network interfaces minus 1, since one of the ENIs would be used as the trunk ENI?
Currently, you can put any value in it to scale a node group up from 0. The autoscaler will then correct it, since it is a dynamic value that reflects the current capacity of the node.
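For reference, a sketch of setting that tag on an ASG with the AWS CLI; the ASG name and the value 1 are placeholders, and per the comment above the exact value is not critical:

```bash
# Sketch: tag the ASG so the autoscaler's node template advertises the pod-eni
# resource when scaling the group up from zero. <my-asg-name> and the value "1"
# are placeholders; the exact value is not critical per the comment above.
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<my-asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/pod-eni,Value=1,PropagateAtLaunch=false"
```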
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Which component are you using?:
cluster-autoscaler (cloud provider: AWS)
What version of the component are you using?:
AWS VPC CNI: v1.12.6-eksbuild.2
Cluster Autoscaler: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2
What k8s version are you using (kubectl version)?:
(kubectl version output collapsed)
What environment is this in?:
AWS EKS, ap-southeast-1
What did you expect to happen?:
Before hitting this bug, I could successfully create pods with security groups without any problems. Everything worked well, and the feature has also been shipped to the production environment.
Then I tried to create a new node group and used affinity to schedule some workloads to this node group. The pods of that workload stay pending forever.
What happened instead?:
I got events like the following:
How to reproduce it (as minimally and precisely as possible):
Create a dedicated node group and use affinity to make sure a specific workload can only be scheduled to this node group. When trying to create pods of this workload, the autoscaler will try to scale the corresponding node group up from 0. The predicate check for the node group fails in the simulator-based scheduler because the NodeInfo converted from the corresponding ASG does not contain the vpc.amazonaws.com/pod-eni field in its capacity.
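A minimal sketch of such a repro, assuming a node group whose nodes carry a hypothetical workload=pinned label. The vpc.amazonaws.com/pod-eni limit is normally injected by the VPC resource controller when a SecurityGroupPolicy matches the pod; it is written out explicitly here only to make the scheduling requirement visible:

```bash
# Sketch of the repro: a Deployment pinned to a hypothetical node group via
# nodeAffinity, requesting the pod-eni extended resource.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pinned-workload            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pinned-workload
  template:
    metadata:
      labels:
        app: pinned-workload
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: workload      # hypothetical node label for the new node group
                operator: In
                values: ["pinned"]
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          limits:
            vpc.amazonaws.com/pod-eni: "1"   # normally injected by the VPC resource controller
EOF
```

With no ready node carrying that label, the only way to satisfy the pod is to scale the pinned node group up from 0, which is where the predicate check fails.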
Anything else we need to know?:
This is FIXED by adding a tag to all ASGs:
But clearly there should be a better way 🤔