kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

TopologySpreadConstraint calculation has issues that can mistakenly evict pods #1219

Closed bingzheliu closed 6 months ago

bingzheliu commented 1 year ago

descheduler version

0.27.1

k8s version

v1.28.0

Problem

The TopologySpreadConstraint plugin calculates the pods to evict constraint by constraint and makes local decisions, which can cause pods to be mistakenly evicted. It should consider all the constraints together to make the correct decision.

In particular, in step #2 of Balance() in topologyspreadconstraint.go, balanceDomains is run for each constraint independently.

When the constraints conflict with each other, pods can be mistakenly evicted. The following is an example.

Issue example

Setup

Kubernetes scheduler:

- maxSkew: 1
  topologyKey: zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      foo: bar
- maxSkew: 1
  topologyKey: hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      foo: bar

Descheduler policy:

profiles:
  - name: ProfileName
    pluginConfig:
    - name: "RemovePodsViolatingTopologySpreadConstraint"
      args:
        constraints:
          - ScheduleAnyway
    plugins:
      balance:
        enabled:
          - "RemovePodsViolatingTopologySpreadConstraint"

Topology example

It's a three-node topology with two topology keys: zone and hostname. Six replicas are placed onto these nodes according to the topologySpreadConstraints.

What can happen

For the above topology, with the number of replicas equal to 6, the scheduler cannot place the pods while respecting both the zone and hostname topology constraints. However, since the constraints are ScheduleAnyway, the 6th pod (shown in blue) can still be placed onto node-hostname-1.

The descheduler then runs the RemovePodsViolatingTopologySpreadConstraint plugin and decides to evict one pod on node-hostname-1 because it violates the hostname constraint. The scheduler then schedules the 6th pod back onto node-hostname-1, and the cycle of eviction and rescheduling repeats indefinitely.
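To make the per-constraint decision concrete, here is a small standalone Go sketch (this is not the descheduler source; the zone layout, the node names other than node-hostname-1, and the 3/2/1 pod placement are assumptions used for illustration). Evaluated in isolation, the zone constraint looks satisfied, while the hostname constraint reports a skew of 2 > maxSkew 1, so a pod on node-hostname-1 gets picked for eviction:

package main

import "fmt"

func main() {
    // Assumed layout: node-hostname-1 is alone in zone-a;
    // node-hostname-2 and node-hostname-3 share zone-b.
    zoneOf := map[string]string{
        "node-hostname-1": "zone-a",
        "node-hostname-2": "zone-b",
        "node-hostname-3": "zone-b",
    }
    // Assumed placement after the scheduler ran with ScheduleAnyway.
    podsPerNode := map[string]int{
        "node-hostname-1": 3,
        "node-hostname-2": 2,
        "node-hostname-3": 1,
    }

    // Evaluate each constraint in isolation, the way the per-constraint balance step does.
    zoneCounts := map[string]int{}
    for node, n := range podsPerNode {
        zoneCounts[zoneOf[node]] += n
    }
    fmt.Printf("zone skew: %d (maxSkew 1)\n", skew(zoneCounts))      // 0 -> nothing to evict
    fmt.Printf("hostname skew: %d (maxSkew 1)\n", skew(podsPerNode)) // 2 -> evict from node-hostname-1
}

// skew returns the difference between the largest and smallest domain count.
func skew(counts map[string]int) int {
    first := true
    var lo, hi int
    for _, c := range counts {
        if first {
            lo, hi = c, c
            first = false
        }
        if c < lo {
            lo = c
        }
        if c > hi {
            hi = c
        }
    }
    return hi - lo
}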

A similar problem has been reported before in #921.

What is expected

If the descheduler considered all the constraints together, it could conclude that no placement satisfies both constraints, and hence it shouldn't evict any pods.
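One way to check this claim: under the layout assumed above (one zone holding a single node, the other holding two), brute-forcing every distribution of the 6 replicas over the 3 nodes finds no placement that keeps both maxSkew: 1 constraints satisfied at once, so evicting can never converge. A minimal standalone sketch, again not descheduler code:

package main

import "fmt"

func main() {
    // Same assumed zone layout as in the sketch above.
    zoneOf := map[string]string{
        "node-hostname-1": "zone-a",
        "node-hostname-2": "zone-b",
        "node-hostname-3": "zone-b",
    }
    feasible := 0
    // Enumerate every way to split 6 replicas across the 3 nodes.
    for a := 0; a <= 6; a++ {
        for b := 0; a+b <= 6; b++ {
            perNode := map[string]int{
                "node-hostname-1": a,
                "node-hostname-2": b,
                "node-hostname-3": 6 - a - b,
            }
            perZone := map[string]int{}
            for node, n := range perNode {
                perZone[zoneOf[node]] += n
            }
            if skew(perNode) <= 1 && skew(perZone) <= 1 {
                feasible++
            }
        }
    }
    // Prints 0: no placement satisfies both constraints.
    fmt.Printf("placements satisfying both constraints: %d\n", feasible)
}

// skew returns the difference between the largest and smallest domain count.
func skew(counts map[string]int) int {
    first := true
    var lo, hi int
    for _, c := range counts {
        if first {
            lo, hi = c, c
            first = false
        }
        if c < lo {
            lo = c
        }
        if c > hi {
            hi = c
        }
    }
    return hi - lo
}

The only hostname-balanced placement is 2/2/2, which puts 4 pods in one zone and 2 in the other; any zone-balanced placement forces 3 pods onto the single-node zone. Either way one constraint stays violated, so a descheduler that reasoned over both constraints would leave the pods alone.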

a7i commented 1 year ago

Hi @bingzheliu awesome job with the Issue summary 👏🏼

We have a duplicate Issue here as well: https://github.com/kubernetes-sigs/descheduler/issues/1032

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 6 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/descheduler/issues/1219#issuecomment-2024152322):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.