kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0
4.23k stars 645 forks source link

Improve PodEvictor observability through EvictOptions #1349

Closed tioxy closed 4 months ago

tioxy commented 4 months ago

What type of PR is this?

/kind feature

What this PR does / why we need it:

Supports profile name set under the label profile for the metric pods_evicted.

This allows the possibility of alerting different severities for different profiles.

Also, some refactors related to EvictOptions to make it more standard.

Which issue(s) this PR fixes:

NONE

Special notes for your reviewer:

I really do not think ProfileName as a public function was the best idea for PodEvictor, but adding to NewPodEvictor would require changes over many lines of code. Any suggestions on how to improve it?

Already discussed in the comments.

Results for v1alpha1
Config
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "PodLifeTime":
     enabled: true
     params:
       podLifeTime:
         maxPodLifeTimeSeconds: 5
         states:
         - "Running"
         - "Pending"
         - "PodInitializing"
  "RemoveFailedPods":
     enabled: true
     params:
       failedPods:
         reasons:
         - "NodeAffinity"
         includingInitContainers: true
         excludeOwnerKinds:
         - "Job"
         minPodLifetimeSeconds: 5
Metric output
// + profile="strategy-RemoveFailedPods-profile"
descheduler_pods_evicted{namespace="local-path-storage",node="kind-control-plane",profile="strategy-RemoveFailedPods-profile",result="success",strategy="PodLifeTime"} 2
Results for v1alpha2
Config
apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
  - name: profile-01
    pluginConfig:
    - name: PodLifeTime
      args:
        maxPodLifeTimeSeconds: 5
        states:
        - Running
        - Pending
        - PodInitializing
    - name: RemoveFailedPods
      args:
        reasons:
        - OutOfcpu
        - CreateContainerConfigError
        includingInitContainers: true
        excludeOwnerKinds:
        - Job
        minPodLifetimeSeconds: 5
    plugins:
      deschedule:
        enabled:
          - PodLifeTime
          - RemoveFailedPods
Metric output
// + profile="profile-01"
descheduler_pods_evicted{namespace="local-path-storage",node="kind-control-plane",profile="profile-01",result="success",strategy="PodLifeTime"} 2

Does this PR introduce a user-facing change?

Metric `pods_evicted` gained `profile` label

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE
k8s-ci-robot commented 4 months ago

Hi @tioxy. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
jklaw90 commented 4 months ago

/ok-to-test

tioxy commented 4 months ago

@a7i Tried following the suggestion of using EvictOptions. Also added support for StrategyName and deprecated context references for it.

Thank you for the patience and attention! 😀 👍

a7i commented 4 months ago

really good changes and clean-up 🏆

a7i commented 4 months ago

/label tide/merge-method-squash /approve

k8s-ci-robot commented 4 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: a7i

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/descheduler/blob/master/OWNERS)~~ [a7i] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
a7i commented 4 months ago

/lgtm

Thank you for your contribution and patience! 🏆

tioxy commented 4 months ago

Just realized another change was missing here.

https://github.com/kubernetes-sigs/descheduler/pull/1358