kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

Descheduler not evicting anything when deployed with Helm chart as a deployment #1402

Closed · sarg3nt closed this 1 month ago

sarg3nt commented 1 month ago

What version of descheduler are you using?

descheduler version: 0.29.0

Does this issue reproduce with the latest release?

This is the latest release

Which descheduler CLI options are you using?

The ones provided by the Helm chart:

   containers:  
   - args:
     - --policy-config-file=/policy-dir/policy.yaml 
     - --descheduling-interval=2m 
     - --v=3 
     - --leader-elect=true
     command: 
     - /bin/descheduler     
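
For context, the chart mounts that policy ConfigMap into the pod at /policy-dir. The rendered wiring should look roughly like the sketch below; the volume name is assumed, so check your rendered manifest:

     volumeMounts:
     - mountPath: /policy-dir       # matches --policy-config-file above
       name: policy-volume          # volume name assumed
   volumes:
   - name: policy-volume            # volume name assumed
     configMap:
       name: descheduler            # the ConfigMap shown in the next section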

Please provide a copy of your descheduler policy config file

apiVersion: v1
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          nodeResourceUtilizationThresholds:
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
      RemoveDuplicates:
        enabled: true
      RemovePodsHavingTooManyRestarts:
        enabled: true
        params:
          podsHavingTooManyRestarts:
            includingInitContainers: true
            podRestartThreshold: 100
      RemovePodsViolatingInterPodAntiAffinity:
        enabled: true
      RemovePodsViolatingNodeAffinity:
        enabled: true
        params:
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
      RemovePodsViolatingNodeTaints:
        enabled: true
      RemovePodsViolatingTopologySpreadConstraint:
        enabled: true                             
        params:
          includeSoftConstraints: true
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: descheduler
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-05-16T20:10:08Z"
  labels:
    app.kubernetes.io/instance: descheduler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: descheduler
    app.kubernetes.io/version: 0.29.0
    helm.sh/chart: descheduler-0.29.0
  name: descheduler
  namespace: kube-system
  resourceVersion: "488582"
  uid: 30e7d9ad-8c4a-4db4-b72e-afc754749d0a

What k8s version are you using (kubectl version)?

v1.28.9+rke2r1

kubectl version Output
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.13", GitCommit:"96b450c75ae3c48037f651b4777646dcca855ed0", GitTreeState:"clean", BuildDate:"2024-04-16T15:03:38Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.9+rke2r1", GitCommit:"587f5fe8a69b0d15b578eaf478f009247d1c5d47", GitTreeState:"clean", BuildDate:"2024-04-16T22:21:01Z", GoVersion:"go1.21.9 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

values.yaml

# cspell:ignore deschedule descheduling otel
# https://github.com/kubernetes-sigs/descheduler/tree/master/charts/descheduler

# CronJob or Deployment
kind: Deployment

image:
  repository: k8s-registry-io.artifactory.metro.ad.selinc.com/descheduler/descheduler
  pullPolicy: Always

# Required when running as a Deployment
deschedulingInterval: 2m

# Specifies the replica count for Deployment
# Set leaderElection if you want to use more than 1 replica
# Set affinity.podAntiAffinity rule if you want to schedule onto a node
# only if that node is in the same zone as at least one already-running descheduler
replicas: ${replicas}

resources:
  requests:
    cpu: 25m
    memory: 32Mi
  limits:
    cpu: 250m
    memory: 128Mi
# Specifies whether Leader Election resources should be created
# Required when running as a Deployment
# NOTE: Leader election can't be activated if DryRun enabled
leaderElection:
  enabled: true

# Recommended to use the latest Policy API version supported by the Descheduler app version
deschedulerPolicyAPIVersion: "descheduler/v1alpha2"

deschedulerPolicy:
  # nodeSelector: "key1=value1,key2=value2"
  # maxNoOfPodsToEvictPerNode: 10
  # maxNoOfPodsToEvictPerNamespace: 10
  # ignorePvcPods: true
  # evictLocalStoragePods: true
  # tracing:
  #   collectorEndpoint: otel-collector.observability.svc.cluster.local:4317
  #   transportCert: ""
  #   serviceName: ""
  #   serviceNamespace: ""
  #   sampleRate: 1.0
  #   fallbackToNoOpProviderOnError: true
  strategies:
    RemoveDuplicates:
      enabled: true
    RemovePodsHavingTooManyRestarts:
      enabled: true
      params:
        podsHavingTooManyRestarts:
          podRestartThreshold: 100
          includingInitContainers: true
    RemovePodsViolatingNodeTaints:
      enabled: true
    RemovePodsViolatingNodeAffinity:
      enabled: true
      params:
        nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: true
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: true
      params:
        includeSoftConstraints: true
    LowNodeUtilization:
      enabled: true
      params:
        nodeResourceUtilizationThresholds:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
          targetThresholds:
            cpu: 50
            memory: 50
            pods: 50

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: descheduler

serviceMonitor:
  enabled: false
  # The namespace where Prometheus expects to find service monitors.
  # namespace: ""
  # Add custom labels to the ServiceMonitor resource
  additionalLabels:
    {}
    # prometheus: kube-prometheus-stack
  interval: ""
  # honorLabels: true
  insecureSkipVerify: true
  serverName: null
  metricRelabelings:
    []
    # - action: keep
    #   regex: 'descheduler_(build_info|pods_evicted)'
    #   sourceLabels: [__name__]
  relabelings:
    []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace

What did you expect to see?

It should be evicting the pods.

What did you see instead?

It did not evict the pods:

descheduler-7866496868-f7zgr I0516 20:23:23.978408       1 descheduler.go:155] Building a pod evictor 
descheduler-7866496868-f7zgr I0516 20:23:23.978472       1 descheduler.go:169] "Number of evicted pods" totalEvicted=0
fbuchmeier-abi commented 1 month ago

Hi @sarg3nt,

could you check whether the descheduler works for you with the new v1alpha2 configuration format? I also had problems getting it to work, and it turns out that v0.29.0 already ignores any v1alpha1 config like the one you have in your Helm values. See the readme for reference: https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#policy-default-evictor-and-strategy-plugins

An example that is running fine for me looks like this:

# Recommended to use the latest Policy API version supported by the Descheduler app version
deschedulerPolicyAPIVersion: "descheduler/v1alpha2"

deschedulerPolicy:
  profiles:
    - name: default
      pluginConfig:
        - name: DefaultEvictor
          args:
            ignorePvcPods: true
            evictLocalStoragePods: true
        - name: RemoveDuplicates
        - name: RemovePodsHavingTooManyRestarts
          args:
            podRestartThreshold: 5
            includingInitContainers: true
        - name: LowNodeUtilization
          args:
            thresholds:
              cpu: 30
              memory: 30
              pods: 30
            targetThresholds:
              cpu: 70
              memory: 70
              pods: 70
      plugins:
        balance:
          enabled:
            - RemoveDuplicates
            - LowNodeUtilization
        deschedule:
          enabled:
            - RemovePodsHavingTooManyRestarts

You can also increase the verbosity of the logging output like this (see e.g. https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#pod-evictions):

cmdOptions:
  v: 5
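
If I understand the chart correctly, each cmdOptions key gets rendered as a CLI flag on the container (worth double-checking in your rendered Deployment), so with the option above the args should end up roughly as:

   containers:
   - args:
     - --policy-config-file=/policy-dir/policy.yaml
     - --descheduling-interval=2m
     - --v=5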

However, if no balance or eviction plugin is configured, this will not change much.

Once your config is correct it should output something along these lines:

descheduler-55b8cb6f58-kpfn4 descheduler I0521 15:33:32.238954       1 defaultevictor.go:202] "Pod fails the following checks" pod="kube-monitoring/prometheus-operator-kube-monitoring-prometheus-node-exportslk6q" checks="pod is a DaemonSet pod"
descheduler-55b8cb6f58-kpfn4 descheduler I0521 15:33:32.239001       1 defaultevictor.go:202] "Pod fails the following checks" pod="kube-system/descheduler-55b8cb6f58-kpfn4" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
descheduler-55b8cb6f58-kpfn4 descheduler I0521 15:33:32.239138       1 profile.go:356] "Total number of pods evicted" extension point="Balance" evictedPods=0

Best regards, Florian.

sarg3nt commented 1 month ago

Hi @fbuchmeier-abi, thanks for pointing that out. I think the "bug" then is that the default values in the 0.29.0 chart are wrong, since those are what I was using. I saw that 0.30.0 was out, so I upgraded to it and used your values, and it worked. I then took a look at the values.yaml for chart 0.30.0, which lives on the release-1.30 branch (why???), tried those defaults, and got more errors. Eventually I figured out that two values in the default deschedulerPolicy are not supported.

The commented-out lines below are the culprits: nodeAffinityType and includeSoftConstraints (see the sketch after the config for where nodeAffinityType seems to belong in v1alpha2).

deschedulerPolicy:
  profiles:
    - name: default
      pluginConfig:
        - name: DefaultEvictor
          args:
            ignorePvcPods: true
            evictLocalStoragePods: true
        - name: RemoveDuplicates
        - name: RemovePodsHavingTooManyRestarts
          args:
            podRestartThreshold: 100
            includingInitContainers: true
        - name: RemovePodsViolatingNodeTaints
          # args:
          #   nodeAffinityType:
          #     - requiredDuringSchedulingIgnoredDuringExecution
        - name: RemovePodsViolatingInterPodAntiAffinity
        - name: RemovePodsViolatingTopologySpreadConstraint
          #args:
          #  includeSoftConstraints: false
        - name: LowNodeUtilization
          args:
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
      plugins:
        balance:
          enabled:
            - RemoveDuplicates
            - RemovePodsViolatingNodeAffinity
            - RemovePodsViolatingTopologySpreadConstraint
            - LowNodeUtilization
        deschedule:
          enabled:
            - RemovePodsHavingTooManyRestarts
            - RemovePodsViolatingNodeTaints
            - RemovePodsViolatingNodeAffinity
            - RemovePodsViolatingInterPodAntiAffinity
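
If I end up needing the node-affinity args again, the readme's v1alpha2 examples appear to put nodeAffinityType under the RemovePodsViolatingNodeAffinity plugin's args rather than under RemovePodsViolatingNodeTaints, so something like the sketch below might work (not tested with this chart):

        - name: RemovePodsViolatingNodeAffinity
          args:
            nodeAffinityType:
              - requiredDuringSchedulingIgnoredDuringExecution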

So maybe someone should update the values.yaml for release-1.30. Thank you again, all is working now and I'll close with this comment.