kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

RemoveDuplicates and nodeAffinity #1384

Open Mattie112 opened 2 months ago

Mattie112 commented 2 months ago

What version of descheduler are you using?

descheduler version: 0.26.1

Does this issue reproduce with the latest release?

Not sure; the latest release is not compatible with my k8s version.

Which descheduler CLI options are you using?

I am using the helm chart with no changes: https://artifacthub.io/packages/helm/descheduler/descheduler

Please provide a copy of your descheduler policy config file

I am using the helm chart with no changes: https://artifacthub.io/packages/helm/descheduler/descheduler
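For reference, if I did want to override the chart defaults, a v1alpha1 policy for this version would look roughly like the sketch below (not my actual config — I run the chart defaults; the `nodeFit` parameter is taken from the v0.26 docs and asks the evictor to check whether an evicted pod could actually fit on another node first):

```yaml
# Sketch of a v1alpha1 DeschedulerPolicy for descheduler ~0.26.
# Not my actual config (I use the chart defaults unchanged).
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
    params:
      # Ask the evictor to verify a pod fits on some other node
      # before evicting it.
      nodeFit: true
```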

What k8s version are you using (kubectl version)?

kubectl version Output
Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.14", GitCommit:"6db79806d788bfb9cfc996deb7e2e178402e8b50", GitTreeState:"clean", BuildDate:"2024-02-14T10:42:41Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.14-eks-b9c9ed7", GitCommit:"7c3f2be51edd9fa5727b6ecc2c3fc3c578aa02ca", GitTreeState:"clean", BuildDate:"2024-03-02T03:46:35Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/amd64"}

What did you do?

I have a ReplicaSet that should create 2 replicas. On that ReplicaSet (and thus on its pods) I have the following nodeAffinity:

nodeAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
          - key: eks.amazonaws.com/capacityType
            operator: NotIn
            values:
              - SPOT

So: do not run on any AWS spot instances.
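(For comparison, the hard form of this rule would look like the sketch below. Note that `preferredDuringSchedulingIgnoredDuringExecution`, as used above, is only a scoring hint to the scheduler — every node remains a valid scheduling target — whereas the `required` form actually makes non-matching nodes infeasible:)

```yaml
# Hard variant: requiredDuringSchedulingIgnoredDuringExecution excludes
# non-matching nodes entirely, instead of merely de-prioritising them.
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: eks.amazonaws.com/capacityType
            operator: NotIn
            values:
              - SPOT
```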

Currently I have only a SINGLE node that matches this preference (and 3 that are spot instances).

What did you expect to see?

I expect the descheduler NOT to evict pods of this deployment from that single node. The other nodes should not be counted as valid placement options.

What did you see instead?

The pods keep getting evicted.

I0501 11:42:12.872410       1 removeduplicates.go:103] "Processing node" node="ip-10-0-103-104.eu-west-1.compute.internal"
I0501 11:42:12.872649       1 removeduplicates.go:162] "Duplicate found" pod="some-namespace/some-service-69fbd8d5b6-8s8t6"
I0501 11:42:12.872692       1 removeduplicates.go:162] "Duplicate found" pod="other-namespace/other-service-7fb4d95db7-hqzt6"
I0501 11:42:12.872711       1 removeduplicates.go:103] "Processing node" node="ip-10-0-103-151.eu-west-1.compute.internal"
I0501 11:42:12.872854       1 removeduplicates.go:103] "Processing node" node="ip-10-0-101-90.eu-west-1.compute.internal"
I0501 11:42:12.873060       1 removeduplicates.go:103] "Processing node" node="ip-10-0-102-81.eu-west-1.compute.internal"
I0501 11:42:12.873244       1 removeduplicates.go:194] "Adjusting feasible nodes" owner={namespace:some-namespace kind:ReplicaSet name:some-service-69fbd8d5b6 imagesHash:xx.dkr.ecr.eu-west-1.amazonaws.com/xx/xx@sha256:xx#nginx:stable} from=4 to=4
I0501 11:42:12.873286       1 removeduplicates.go:203] "Average occurrence per node" node="ip-10-0-103-104.eu-west-1.compute.internal" ownerKey={namespace:some-namespace kind:ReplicaSet name:some-service-69fbd8d5b6 imagesHash:xx.dkr.ecr.eu-west-1.amazonaws.com/xx/xx@sha256:xx#nginx:stable} avg=1
I0501 11:42:12.891045       1 evictions.go:162] "Evicted pod" pod="some-namespace/some-service-69fbd8d5b6-8s8t6" reason="" strategy="RemoveDuplicates" node="ip-10-0-103-104.eu-west-1.compute.internal"
I0501 11:42:12.891195       1 removeduplicates.go:194] "Adjusting feasible nodes" owner={namespace:other-namespace kind:ReplicaSet name:other-service-7fb4d95db7 imagesHash:xx.dkr.ecr.eu-west-1.amazonaws.com/xx/xx@sha256:xx} from=4 to=4
I0501 11:42:12.891343       1 removeduplicates.go:203] "Average occurrence per node" node="ip-10-0-103-104.eu-west-1.compute.internal" ownerKey={namespace:other-namespace kind:ReplicaSet name:other-service-7fb4d95db7 imagesHash:xx.dkr.ecr.eu-west-1.amazonaws.com/xx/xx@sha256:xx} avg=1
I0501 11:42:12.943196       1 evictions.go:162] "Evicted pod" pod="other-namespace/other-service-7fb4d95db7-hqzt6" reason="" strategy="RemoveDuplicates" node="ip-10-0-103-104.eu-west-1.compute.internal"
I0501 11:42:12.943274       1 descheduler.go:408] "Number of evicted pods" totalEvicted=2
I0501 11:42:12.943621       1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"

And this repeats, again and again.

Why does the descheduler treat all 4 nodes as feasible ("from=4 to=4") when only 1 matches the nodeAffinity rules?