kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

EnableFullEviction for RemovePodsViolatingNodeAffinity #1363

Open jackfrancis opened 3 months ago

jackfrancis commented 3 months ago

This PR adds an EnableFullEviction configuration option to the RemovePodsViolatingNodeAffinity plugin. Enabling it would look like this:

apiVersion: v1
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
      - name: ProfileName
        pluginConfig:
        - name: "RemovePodsViolatingNodeAffinity"
          args:
            nodeAffinityType:
            - "requiredDuringSchedulingIgnoredDuringExecution"
            enableFullEviction: true
        plugins:
          deschedule:
            enabled:
              - "RemovePodsViolatingNodeAffinity"

The purpose of this feature is to enable eviction of all pod replicas whose declared nodeAffinity configuration no longer matches the node they are currently scheduled onto, even if there is no other node in the cluster that has a nodeAffinity match. That last part (the "even if" clause) is the change that I'm advocating for here.
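
The eviction rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the plugin's actual code: `Node`, `fits`, and `should_evict` are hypothetical names, and label equality stands in for full nodeAffinity matching.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)


def fits(required: dict, node: Node) -> bool:
    """True when the node carries every required label.

    Simplified stand-in for requiredDuringSchedulingIgnoredDuringExecution.
    """
    return all(node.labels.get(k) == v for k, v in required.items())


def should_evict(required, current, nodes, enable_full_eviction=False):
    """Hypothetical decision rule for a pod running on `current`."""
    if fits(required, current):
        return False  # affinity is still satisfied; nothing to do
    if enable_full_eviction:
        return True  # evict even if the replacement pod will stay Pending
    # Default behavior described in this thread: evict only when some
    # other node could accept the pod.
    return any(fits(required, n) for n in nodes if n is not current)


required = {"foo": "bar"}
unlabeled = Node("mp-0000002")                  # foo label removed
labeled = Node("mp-0000003", {"foo": "bar"})

print(should_evict(required, unlabeled, [unlabeled, labeled]))  # True
print(should_evict(required, unlabeled, [unlabeled]))           # False
print(should_evict(required, unlabeled, [unlabeled],
                   enable_full_eviction=True))                  # True
```

The second and third calls differ only in the flag: with no matching node left in the cluster, the pod is evicted only when `enable_full_eviction` is set.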

TODO: docs and updated helm chart

k8s-ci-robot commented 3 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
- **[OWNERS](https://github.com/kubernetes-sigs/descheduler/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

jackfrancis commented 3 months ago

cc @nojnhuh @nawazkh

jackfrancis commented 3 months ago

Here's a quick overview of this feature in action (built an image from this branch and smoke-tested it on a Cluster API CAPZ cluster in Azure).

A watch stream of pod replicas whose nodeAffinity (requiredDuringSchedulingIgnoredDuringExecution) requires a node labeled "foo=bar":

$ k get pods -l run=php-apache -o wide -w
NAME                          READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
php-apache-7674886bb6-89mxt   0/1     Pending   0          10m   <none>   <none>   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     Pending   0          10m   <none>   <none>   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     Pending   0          10m   <none>   <none>   <none>           <none>
php-apache-7674886bb6-89mxt   0/1     Pending   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     Pending   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     Pending   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   0/1     ContainerCreating   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     ContainerCreating   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     ContainerCreating   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   0/1     ContainerCreating   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     ContainerCreating   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     ContainerCreating   0          11m   <none>   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   1/1     Running             0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   1/1     Running             0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   1/1     Running             0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   1/1     Running             0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   1/1     Terminating         0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-x5b54   1/1     Running             0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   1/1     Terminating         0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     Pending             0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-x5b54   1/1     Terminating         0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-slknb   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-x5b54   1/1     Terminating         0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   1/1     Running             0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-slknb   0/1     Pending             0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     ContainerCreating   0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-cjfpb   1/1     Terminating         0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-slknb   0/1     ContainerCreating   0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-cjfpb   1/1     Terminating         0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     Pending             0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     ContainerCreating   0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-89mxt   1/1     Terminating         0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   1/1     Terminating         0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   1/1     Terminating         0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   0/1     Terminating         0          11m   <none>           capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     Terminating         0          11m   <none>           capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-slknb   0/1     ContainerCreating   0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     Terminating         0          11m   <none>           capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     ContainerCreating   0          0s    <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     ContainerCreating   0          1s    <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-89mxt   0/1     Terminating         0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   0/1     Terminating         0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-89mxt   0/1     Terminating         0          11m   192.168.91.134   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     Terminating         0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     Terminating         0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-cjfpb   0/1     Terminating         0          11m   192.168.91.135   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-8fdtg   1/1     Running             0          1s    192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     Terminating         0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     Terminating         0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-x5b54   0/1     Terminating         0          11m   192.168.91.136   capz-e2e-rqyahs-vmss-mp-0000002   <none>           <none>
php-apache-7674886bb6-slknb   1/1     Running             0          2s    192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-52dwx   1/1     Running             0          2s    192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-slknb   1/1     Running             0          20s   192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-slknb   1/1     Terminating         0          20s   192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-cpzpz   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-52dwx   1/1     Running             0          20s   192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-cpzpz   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-slknb   1/1     Terminating         0          20s   192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-52dwx   1/1     Terminating         0          20s   192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-p27jj   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-8fdtg   1/1     Running             0          20s   192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-p27jj   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-52dwx   1/1     Terminating         0          20s   192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-8fdtg   1/1     Terminating         0          20s   192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-h274v   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-8fdtg   1/1     Terminating         0          20s   192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-h274v   0/1     Pending             0          0s    <none>           <none>                            <none>           <none>
php-apache-7674886bb6-8fdtg   1/1     Terminating         0          20s   192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-52dwx   1/1     Terminating         0          20s   192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     Terminating         0          20s   <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     Terminating         0          20s   <none>           capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     Terminating         0          21s   192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     Terminating         0          21s   192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-52dwx   0/1     Terminating         0          21s   192.168.54.135   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     Terminating         0          21s   192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     Terminating         0          21s   192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-8fdtg   0/1     Terminating         0          21s   192.168.54.134   capz-e2e-rqyahs-vmss-mp-0000004   <none>           <none>
php-apache-7674886bb6-slknb   1/1     Terminating         0          21s   192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-slknb   0/1     Terminating         0          21s   <none>           capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-slknb   0/1     Terminating         0          22s   192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-slknb   0/1     Terminating         0          22s   192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>
php-apache-7674886bb6-slknb   0/1     Terminating         0          22s   192.168.211.71   capz-e2e-rqyahs-vmss-mp-0000003   <none>           <none>

What the above watch stream shows is the result of doing the following on the cluster:

$ k label nodes capz-e2e-rqyahs-vmss-mp-0000002 foo=bar
node/capz-e2e-rqyahs-vmss-mp-0000002 labeled
$ k label nodes capz-e2e-rqyahs-vmss-mp-0000003 foo=bar
node/capz-e2e-rqyahs-vmss-mp-0000003 labeled
$ k label nodes capz-e2e-rqyahs-vmss-mp-0000004 foo=bar
node/capz-e2e-rqyahs-vmss-mp-0000004 labeled
$ k label nodes capz-e2e-rqyahs-vmss-mp-0000002 foo-
node/capz-e2e-rqyahs-vmss-mp-0000002 unlabeled
$ k label nodes capz-e2e-rqyahs-vmss-mp-0000003 foo-
node/capz-e2e-rqyahs-vmss-mp-0000003 unlabeled
$ k label nodes capz-e2e-rqyahs-vmss-mp-0000004 foo-

To explain what we see:

  1. The pod replicas are initially all scheduled to node 002
  2. After we remove the foo label from node 002, the pods are evicted by descheduler and distributed among 003 and 004, both of which have the foo=bar label at this point
  3. After we remove the foo label from nodes 003 and 004, the pods are evicted by descheduler, and all replacement pods remain in a Pending state:

$ k get pods -l run=php-apache -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
php-apache-7674886bb6-cpzpz   0/1     Pending   0          6m15s   <none>   <none>   <none>           <none>
php-apache-7674886bb6-h274v   0/1     Pending   0          6m15s   <none>   <none>   <none>           <none>
php-apache-7674886bb6-p27jj   0/1     Pending   0          6m15s   <none>   <none>   <none>           <none>

The outcome from step 3 above is a new behavior that the enableFullEviction config produces.

jackfrancis commented 3 months ago

@damemi thanks for the detailed feedback. I actually agree that this would be ideal in a more general area; I'll scaffold that up to see how it looks.

Here's my use-case: in a multi-cluster environment, I'd like to be able to leverage descheduler as a trigger to indicate when a workload no longer has any suitable node to run on according to its requirements (e.g., nodeSelector, taints). More specifically:

  1. node attributes change over time
  2. descheduler notices that pod scheduling criteria are no longer met on already scheduled pods running on nodes whose attributes are changing
  3. pods move to other nodes that do meet those criteria
  4. when no nodes on the cluster meet the criteria at all due to continued cluster node entropy, currently Running pods continue Running indefinitely as descheduler is no longer allowed to do anything

And so, the current default descheduler behavior can prevent the above for a simple scenario where there are a small number of pod replicas able to fit onto a single node.

So from a high level, I'd like to be able to leverage descheduler to definitively signal (via all pods in a non-Running state) that a cluster no longer has any suitable nodes to run my workload, so that I can move those workloads to another cluster.
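
As a rough illustration of that signal, an external controller could inspect the output of `kubectl get pods -l run=php-apache -o json` and check whether every replica is Pending and unscheduled. The sketch below is an assumption about how such a check might look, not part of this PR; `no_suitable_nodes` is a hypothetical helper, and the sample data mirrors the three Pending pods shown earlier.

```python
def no_suitable_nodes(pod_list: dict) -> bool:
    """True when every pod in the list is Pending and bound to no node.

    `pod_list` is the parsed JSON of a Kubernetes pod List object
    (items[].status.phase, items[].spec.nodeName).
    """
    items = pod_list.get("items", [])
    if not items:
        return False  # no replicas at all is not treated as a signal here
    return all(
        pod.get("status", {}).get("phase") == "Pending"
        and not pod.get("spec", {}).get("nodeName")
        for pod in items
    )


# Sample shaped like the final `k get pods` output in this thread.
sample = {
    "items": [
        {"metadata": {"name": "php-apache-7674886bb6-cpzpz"},
         "spec": {}, "status": {"phase": "Pending"}},
        {"metadata": {"name": "php-apache-7674886bb6-h274v"},
         "spec": {}, "status": {"phase": "Pending"}},
        {"metadata": {"name": "php-apache-7674886bb6-p27jj"},
         "spec": {}, "status": {"phase": "Pending"}},
    ]
}

print(no_suitable_nodes(sample))  # True
```

Pods are also briefly Pending during an ordinary reschedule (as in the watch stream above), so a real check would probably need to require that the condition holds for some time before acting on it.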

ingvagabund commented 3 months ago

+1 for making the configuration part of NodeFit, e.g., to disable the check. Disabling the check might be translated into "refresh as many pods as you can, ignoring whether they get re-scheduled to any node". We have various limits on the number of evictions that can be tuned to increase the impact.

Another option is to turn NodeFit into a plugin and disable/enable the plugin as needed, providing the requested functionality for free. Anyone can then build a custom NodeFit plugin and define use-case-specific policies.
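
To make the two options concrete, here is a small sketch of a node-fit check that can be swapped out or disabled. It is purely illustrative: `can_evict`, `label_fit`, and the dict shapes are hypothetical and do not reflect descheduler's plugin framework.

```python
from typing import Callable, Optional

# A node-fit check takes (pod, node) and reports whether the pod could run there.
NodeFit = Callable[[dict, dict], bool]


def label_fit(pod: dict, node: dict) -> bool:
    """Example check: the node must carry every label the pod requires."""
    return all(node["labels"].get(k) == v for k, v in pod["required"].items())


def can_evict(pod: dict, current: str, nodes: list,
              node_fit: Optional[NodeFit] = None) -> bool:
    """Allow eviction if the check is disabled or another node fits."""
    if node_fit is None:
        return True  # check disabled: evict regardless of rescheduling odds
    return any(node_fit(pod, n) for n in nodes if n["name"] != current)


pod = {"required": {"foo": "bar"}}
nodes = [
    {"name": "a", "labels": {}},
    {"name": "b", "labels": {"foo": "bar"}},
]

print(can_evict(pod, "a", nodes, label_fit))  # True: node b fits
print(can_evict(pod, "b", nodes, label_fit))  # False: no other node fits
print(can_evict(pod, "b", nodes, None))       # True: check disabled
```

Passing a different callable in place of `label_fit` corresponds to the custom-plugin idea, while passing `None` corresponds to disabling the check.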

k8s-ci-robot commented 3 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 1 month ago

@jackfrancis: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-descheduler-test-e2e-k8s-master-1-30 | 3a750b8dd9cc85b87c579f06ad65b30919052894 | link | true | `/test pull-descheduler-test-e2e-k8s-master-1-30` |
