Closed bingzheliu closed 6 months ago
Hi @bingzheliu awesome job with the Issue summary 👏🏼
We have a duplicate Issue here as well: https://github.com/kubernetes-sigs/descheduler/issues/1032
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
descheduler version: 0.27.1
k8s version: v1.28.0
Problem
The TopologySpreadConstraint plugin calculates pods for eviction constraint by constraint and makes local decisions, which can cause pods to be mistakenly evicted. It should consider all the constraints together to make the correct decision.
In particular, in step #2 of Balance() in topologyspreadconstraint.go, balanceDomains runs separately for each constraint. When the constraints "conflict" with each other, pods can be mistakenly evicted. The following is an example.
Issue example
Setup
Kubernetes scheduler:
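(The original manifest was not captured in this thread. A minimal sketch of what the workload's topologySpreadConstraints might look like, assuming maxSkew: 1 on both the zone and hostname keys with whenUnsatisfiable: ScheduleAnyway, and an illustrative Deployment name and label:)

```yaml
# Illustrative only -- not the manifest from the original report.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example            # hypothetical name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: example
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: example
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```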
Descheduler policy:
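(The original policy was also not captured. A sketch assuming the v1alpha2 DeschedulerPolicy API with the RemovePodsViolatingTopologySpreadConstraint plugin enabled and soft (ScheduleAnyway) constraints included; the profile name is hypothetical:)

```yaml
# Illustrative only -- not the policy from the original report.
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: default          # hypothetical profile name
    pluginConfig:
      - name: "RemovePodsViolatingTopologySpreadConstraint"
        args:
          includeSoftConstraints: true   # needed so ScheduleAnyway constraints are evaluated
    plugins:
      balance:
        enabled:
          - "RemovePodsViolatingTopologySpreadConstraint"
```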
Topology
It's a three-node topology with two topology keys: zone and hostname. Six replicas are placed onto these nodes according to the topologySpreadConstraints.
What can happen
For the above topology, if the number of replicas equals 6, the scheduler cannot place the pods while respecting both the zone and hostname topology constraints. However, as the constraints are ScheduleAnyway, the 6th pod (shown in blue) can still be placed onto node-hostname-1.
The descheduler will run the RemovePodsViolatingTopologySpreadConstraint plugin and decide to evict one pod on node-hostname-1, as it violates the hostname constraint. The scheduler will then schedule the 6th pod back onto node-hostname-1 again, causing an unending cycle of scheduling and eviction. A similar problem has been reported before: issue 921.
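The per-constraint evaluation described above can be sketched as follows. This models the logic, not the actual Go implementation; the zone layout (zone-a = {node-hostname-1}, zone-b = {node-hostname-2, node-hostname-3}) and the per-node pod counts are assumptions chosen to be consistent with the description, since the original manifests and figure are not reproduced here:

```python
# Assumed topology and placement (not taken from the original report).
zones = {
    "node-hostname-1": "zone-a",
    "node-hostname-2": "zone-b",
    "node-hostname-3": "zone-b",
}
pods_per_node = {"node-hostname-1": 3, "node-hostname-2": 2, "node-hostname-3": 1}

def skew(counts):
    """Skew of a topology constraint: max domain count minus min domain count."""
    return max(counts.values()) - min(counts.values())

# Zone constraint: aggregate pod counts per zone.
zone_counts = {}
for node, n in pods_per_node.items():
    zone_counts[zones[node]] = zone_counts.get(zones[node], 0) + n

print(skew(zone_counts))    # zone-a=3, zone-b=3 -> skew 0, satisfied
print(skew(pods_per_node))  # 3 vs 1 -> skew 2 > maxSkew 1, violated

# Evaluated in isolation, the hostname constraint points at the fullest node
# as the eviction source -- but the zone constraint then forces the
# rescheduled pod straight back onto that same node.
max_node = max(pods_per_node, key=pods_per_node.get)
print(max_node)             # node-hostname-1
```

Evicting from node-hostname-1 reduces zone-a to 2 pods, so the scheduler's zone constraint (skew 2 vs 4) pulls the replacement pod right back, which is the cycle described above.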
What is expected
If the descheduler considered all the constraints together, it could conclude that no placement satisfies both constraints, and hence that it should not evict any pods.
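Such a joint check can be sketched with a brute force over all placements, using the same assumed topology as above (zone-a with one node, zone-b with two; not taken from the original report). No assignment of 6 replicas satisfies both maxSkew=1 constraints at once, so eviction cannot help:

```python
from itertools import product

# Assumed topology: zone-a = {n1}, zone-b = {n2, n3}.
zones = {"n1": "zone-a", "n2": "zone-b", "n3": "zone-b"}
MAX_SKEW, REPLICAS = 1, 6

def skew(counts):
    return max(counts) - min(counts)

feasible = []
for c1, c2 in product(range(REPLICAS + 1), repeat=2):
    c3 = REPLICAS - c1 - c2
    if c3 < 0:
        continue
    per_node = {"n1": c1, "n2": c2, "n3": c3}
    per_zone = {}
    for node, n in per_node.items():
        per_zone[zones[node]] = per_zone.get(zones[node], 0) + n
    # A placement is feasible only if BOTH constraints hold simultaneously.
    if skew(per_node.values()) <= MAX_SKEW and skew(per_zone.values()) <= MAX_SKEW:
        feasible.append(per_node)

print(feasible)  # [] -- the only hostname-balanced split (2,2,2) gives zone skew 2
```

Since the feasible set is empty, a joint evaluation would conclude that every eviction just trades one violation for another, and would leave the pods in place.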