kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

priority threshold misconfigured, only one of priorityThreshold fields can be set #1217

Closed: mstefany closed this issue 6 months ago

mstefany commented 1 year ago

What version of descheduler are you using?

descheduler version: v0.27.1

Does this issue reproduce with the latest release?

Yes.

Which descheduler CLI options are you using?

Helm Chart defaults:

args
  - args:
    - --policy-config-file
    - /policy-dir/policy.yaml
    - --descheduling-interval
    - 10m
    - --v
    - "4"

Please provide a copy of your descheduler policy config file

policy ConfigMap
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
    - name: GenericProfile
      pluginConfig:
      - args:
          evictFailedBarePods: true
          evictLocalStoragePods: true
          nodeFit: true
          priorityThreshold:
            name: exclude-descheduler
        name: DefaultEvictor
      - args:
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
        name: HighNodeUtilization
      - args:
          targetThresholds:
            cpu: 70
            memory: 70
            pods: 70
          thresholds:
            cpu: 20
            memory: 20
            pods: 20
        name: LowNodeUtilization
      - args:
          maxPodLifeTimeSeconds: 7200
          states:
          - ContainerCreating
          - Pending
          - PodInitializing
        name: PodLifeTime
      - args:
          excludeOwnerKinds:
          - ReplicaSet
        name: RemoveDuplicates
      - args:
          excludeOwnerKinds:
          - Job
          includingInitContainers: true
          minPodLifetimeSeconds: 3600
        name: RemoveFailedPods
      - args:
          includingInitContainers: true
          podRestartThreshold: 10
        name: RemovePodsHavingTooManyRestarts
      - name: RemovePodsViolatingInterPodAntiAffinity
      - args:
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
        name: RemovePodsViolatingNodeAffinity
      - name: RemovePodsViolatingNodeTaints
      - name: RemovePodsViolatingTopologySpreadConstraint
      plugins:
        balance:
          enabled:
          - HighNodeUtilization
          - LowNodeUtilization
          - RemoveDuplicates
          - RemovePodsViolatingTopologySpreadConstraint
        deschedule:
          enabled:
          - PodLifeTime
          - RemoveFailedPods
          - RemovePodsHavingTooManyRestarts
          - RemovePodsViolatingInterPodAntiAffinity
          - RemovePodsViolatingNodeAffinity
          - RemovePodsViolatingNodeTaints
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: descheduler
    meta.helm.sh/release-namespace: kube-system
    reloader.stakater.com/match: "true"
  creationTimestamp: "2023-08-16T09:22:39Z"
  labels:
    app.kubernetes.io/instance: descheduler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: descheduler
    app.kubernetes.io/version: 0.27.1
    helm.sh/chart: descheduler-0.27.1
    helm.toolkit.fluxcd.io/name: descheduler
    helm.toolkit.fluxcd.io/namespace: kube-system
  name: descheduler
  namespace: kube-system
  resourceVersion: "467902404"
  uid: 600f131d-1515-4ae7-a9ef-1dc0963a247d

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Client Version: v1.25.7
Kustomize Version: v4.5.7
Server Version: v1.25.11-eks-a5565ad

What did you do?

With the policy ConfigMap above, the descheduler fails to start:

NAME                           READY   STATUS             RESTARTS      AGE
descheduler-644697d794-g5z47   0/1     CrashLoopBackOff   6 (63s ago)   7m14s

The logs show a problem with priorityThreshold:

I0816 11:09:33.663355       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1692184173\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1692184171\" (2023-08-16 10:09:29 +0000 UTC to 2024-08-15 10:09:29 +0000 UTC (now=2023-08-16 11:09:33.663315485 +0000 UTC))"
I0816 11:09:33.663419       1 secure_serving.go:210] Serving securely on [::]:10258
I0816 11:09:33.663508       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
E0816 11:09:33.664577       1 server.go:99] "descheduler server" err="in profile GenericProfile: priority threshold misconfigured, only one of priorityThreshold fields can be set, got &TypeMeta{Kind:,APIVersion:,}"
I0816 11:09:33.664705       1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I0816 11:09:33.664777       1 secure_serving.go:255] Stopped listening on [::]:10258

There is no priorityThreshold.value specified in the policy/ConfigMap, yet the error claims that only one of the fields can be set. If I remove the name key and replace it with value set to some reasonable number (see the sketch after the log output below), the descheduler starts successfully:

I0816 09:33:54.749414       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1692178434\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1692178432\" (2023-08-16 08:33:50 +0000 UTC to 2024-08-15 08:33:50 +0000 UTC (now=2023-08-16 09:33:54.749372596 +0000 UTC))"
I0816 09:33:54.749494       1 secure_serving.go:210] Serving securely on [::]:10258
I0816 09:33:54.749578       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0816 09:33:54.760784       1 descheduler.go:123] Warning: Convert Kubernetes server minor version to float fail
W0816 09:33:54.760796       1 descheduler.go:127] Warning: Descheduler minor version 27 is not supported on your version of Kubernetes 1.25+. See compatibility docs for more info: https://github.com/kubernetes-sigs/descheduler#compatibility-matrix
I0816 09:33:54.768692       1 reflector.go:287] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768708       1 reflector.go:323] Listing and watching *v1.Pod from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768925       1 reflector.go:287] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.768941       1 reflector.go:323] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769088       1 reflector.go:287] Starting reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769102       1 reflector.go:323] Listing and watching *v1.Namespace from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769240       1 reflector.go:287] Starting reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:150
I0816 09:33:54.769252       1 reflector.go:323] Listing and watching *v1.PriorityClass from k8s.io/client-go/informers/factory.go:150
I0816 09:33:55.347709       1 shared_informer.go:341] caches populated
I0816 09:33:55.347786       1 shared_informer.go:341] caches populated
I0816 09:33:55.347800       1 shared_informer.go:341] caches populated
I0816 09:33:56.748637       1 shared_informer.go:341] caches populated
I0816 09:33:56.751015       1 descheduler.go:292] Building a pod evictor
I0816 09:33:56.751071       1 defaultevictor.go:76] "Warning: EvictFailedBarePods is set to True. This could cause eviction of pods without ownerReferences."
I0816 09:33:56.751118       1 pod_lifetime.go:109] "Processing node" node="ip-10-254-48-230.ec2.internal"
[...]
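
For reference, a minimal sketch of what the working DefaultEvictor config looks like with value in place of name; the actual number isn't shown above, so 1000000 here is only an illustrative priority cut-off:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: GenericProfile
  pluginConfig:
  - args:
      evictFailedBarePods: true
      evictLocalStoragePods: true
      nodeFit: true
      priorityThreshold:
        value: 1000000   # illustrative number, not the exact value used in my cluster
    name: DefaultEvictor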

What did you expect to see?

A running descheduler pod.

What did you see instead?

The descheduler fails to start with the error shown above.

a7i commented 1 year ago

Hi @mstefany, thank you for all the details!

However, I am unable to reproduce this issue. Is it possible that you have multiple profiles defined in the policy?
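
For context, "multiple profiles" would look roughly like the sketch below, where each profile carries its own DefaultEvictor configuration; the profile names and the 1000000 value here are purely illustrative:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: ProfileA   # illustrative name
  pluginConfig:
  - args:
      priorityThreshold:
        name: exclude-descheduler
    name: DefaultEvictor
- name: ProfileB   # illustrative name, with its own evictor settings
  pluginConfig:
  - args:
      priorityThreshold:
        value: 1000000   # illustrative number
    name: DefaultEvictor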

gj409237405 commented 1 year ago

I am also hitting the same situation as @mstefany.

mstefany commented 1 year ago

> Hi @mstefany, thank you for all the details!
>
> However, I am unable to reproduce this issue. Is it possible that you have multiple profiles defined in the policy?

Nope, there shouldn't be anything beyond what I posted. No multiple profiles, etc. One thing, however: I think I don't use the "default" profile name.

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 6 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/descheduler/issues/1217#issuecomment-2025768222):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.