Closed: liorfranko closed this issue 1 year ago.
Update:
It's not related to the priority expander. Even when I remove the --expander=priority flag and perform the same test, I see the same behaviour:
I0710 13:51:38.598228 1 static_autoscaler.go:229] Starting main loop
I0710 13:51:38.602935 1 filter_out_schedulable.go:65] Filtering out schedulables
I0710 13:51:38.602956 1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I0710 13:51:38.604265 1 filter_out_schedulable.go:170] 9 pods were kept as unschedulable based on caching
I0710 13:51:38.604279 1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I0710 13:51:38.604291 1 filter_out_schedulable.go:82] No schedulable pods
I0710 13:51:38.604308 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-nzwll is unschedulable
I0710 13:51:38.604312 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-l2sx8 is unschedulable
I0710 13:51:38.604315 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-w7zk2 is unschedulable
I0710 13:51:38.604319 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-mfcdg is unschedulable
I0710 13:51:38.604322 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-n8976 is unschedulable
I0710 13:51:38.604325 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-jjcrw is unschedulable
I0710 13:51:38.604328 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-p7zqj is unschedulable
I0710 13:51:38.604331 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-k69d8 is unschedulable
I0710 13:51:38.604334 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-qm9nv is unschedulable
I0710 13:51:38.604337 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-24s9n is unschedulable
I0710 13:51:38.604439 1 scale_up.go:364] Upcoming 0 nodes
...
I0710 13:51:38.609496 1 scale_up.go:456] Best option to resize: general-dev-devops-apps-4vcpu-16gb-Ec2Spot-MultiAZ20220707191748241500000001
I0710 13:51:38.609501 1 scale_up.go:460] Estimated 10 nodes needed in general-dev-devops-apps-4vcpu-16gb-Ec2Spot-MultiAZ20220707191748241500000001
I0710 13:51:38.609705 1 scale_up.go:574] Final scale-up plan: [{general-dev-devops-apps-4vcpu-16gb-Ec2Spot-MultiAZ20220707191748241500000001 3->13 (max: 20)}]
I0710 13:51:38.609730 1 scale_up.go:663] Scale-up: setting group general-dev-devops-apps-4vcpu-16gb-Ec2Spot-MultiAZ20220707191748241500000001 size to 13
I0710 13:51:38.609752 1 auto_scaling_groups.go:219] Setting asg general-dev-devops-apps-4vcpu-16gb-Ec2Spot-MultiAZ20220707191748241500000001 size to 13
...
I0710 13:51:48.763615 1 static_autoscaler.go:229] Starting main loop
I0710 13:51:48.770761 1 filter_out_schedulable.go:65] Filtering out schedulables
I0710 13:51:48.770807 1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I0710 13:51:48.771022 1 filter_out_schedulable.go:157] Pod devops-apps-01.sleep-87f4c7f8c-n8976 marked as unschedulable can be scheduled on node template-node-for-general-dev-devops-apps-4vcpu-16gb-Ec2Spot-MultiAZ20220707191748241500000001-7247793932385268851-1. Ignoring in scale up.
I0710 13:51:48.772494 1 filter_out_schedulable.go:170] 8 pods were kept as unschedulable based on caching
I0710 13:51:48.772514 1 filter_out_schedulable.go:171] 1 pods marked as unschedulable can be scheduled.
I0710 13:51:48.772525 1 filter_out_schedulable.go:79] Schedulable pods present
I0710 13:51:48.772541 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-jjcrw is unschedulable
I0710 13:51:48.772548 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-nzwll is unschedulable
I0710 13:51:48.772551 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-l2sx8 is unschedulable
I0710 13:51:48.772554 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-w7zk2 is unschedulable
I0710 13:51:48.772557 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-mfcdg is unschedulable
I0710 13:51:48.772561 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-24s9n is unschedulable
I0710 13:51:48.772564 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-p7zqj is unschedulable
I0710 13:51:48.772568 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-k69d8 is unschedulable
I0710 13:51:48.772571 1 klogx.go:86] Pod devops-apps-01/sleep-87f4c7f8c-qm9nv is unschedulable
I0710 13:51:48.772690 1 scale_up.go:364] Upcoming 10 nodes
...
I0710 13:51:48.776395 1 scale_up.go:456] Best option to resize: general-dev-devops-apps-4vcpu-16gb-OnDemand-MultiAZ20220710050104303500000003
I0710 13:51:48.776407 1 scale_up.go:460] Estimated 9 nodes needed in general-dev-devops-apps-4vcpu-16gb-OnDemand-MultiAZ20220710050104303500000003
I0710 13:51:48.776575 1 scale_up.go:574] Final scale-up plan: [{general-dev-devops-apps-4vcpu-16gb-OnDemand-MultiAZ20220710050104303500000003 1->10 (max: 120)}]
I0710 13:51:48.776591 1 scale_up.go:663] Scale-up: setting group general-dev-devops-apps-4vcpu-16gb-OnDemand-MultiAZ20220710050104303500000003 size to 10
I0710 13:51:48.776607 1 auto_scaling_groups.go:219] Setting asg general-dev-devops-apps-4vcpu-16gb-OnDemand-MultiAZ20220710050104303500000003 size to 10
It looks like it's related to https://github.com/kubernetes/autoscaler/issues/4082. Is it possible to add this commit to the next 1.19 version? https://github.com/kubernetes/autoscaler/pull/3883
I've created a PR to add #3883 to 1.2.x. I've tested it and it works: https://github.com/kubernetes/autoscaler/pull/5015
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Which component are you using?: cluster-autoscaler with the Helm Chart
What version of the component are you using?:
Component version: v1.20.3
What k8s version are you using (kubectl version)?: (kubectl version output omitted)
What environment is this in?: AWS EKS using Managed Node Groups
What did you expect to happen?:
With this configuration for priority expander:
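(The original ConfigMap block did not survive the copy. A minimal sketch of a priority-expander ConfigMap consistent with this report; the regexes are illustrative, derived from the ASG names in the logs above, and a higher number means higher priority:)

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    # Higher value = higher priority: prefer Spot groups over OnDemand.
    50:
      - .*Ec2Spot.*
    10:
      - .*OnDemand.*
```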
And these flags:
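(The flag list was also lost. Since the report uses the Helm chart, a plausible sketch via the chart's extraArgs values; only the two flags mentioned elsewhere in this issue are shown, everything else is unknown:)

```yaml
# values.yaml (sketch)
extraArgs:
  expander: priority
  balance-similar-node-groups: true
```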
What happened instead?:
The first cycle works as expected:
But on the next cycle, 10 seconds later, the expander is triggered again and now the OnDemand instances are chosen:
I tried switching the priorities around, like this:
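(The swapped config block was lost as well; presumably the priority values were inverted, e.g.:)

```yaml
data:
  priorities: |-
    # Swapped: OnDemand now has the higher priority.
    50:
      - .*OnDemand.*
    10:
      - .*Ec2Spot.*
```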
And got the same results.
I disabled and enabled balance-similar-node-groups and got the same results.
How to reproduce it (as minimally and precisely as possible):
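(The reproduction steps were not included. Based on the logs above, which show 10 pending sleep pods in devops-apps-01 and 4 vCPU / 16 GB node groups, a hypothetical deployment that forces one pod per new node would look like this; the image and resource requests are illustrative:)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
  namespace: devops-apps-01
spec:
  replicas: 10
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      containers:
        - name: sleep
          image: busybox
          command: ["sleep", "86400"]
          resources:
            requests:
              # Large enough that each replica needs its own 4 vCPU node.
              cpu: "3500m"
              memory: "12Gi"
```

Scaling such a deployment from 0 to 10 replicas at once should reproduce the two consecutive scale-ups seen in the logs above.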
Anything else we need to know?: