Strategy RemovePodsViolatingNodeAffinity does not remove pod when affinity disappears

bprieur commented 5 months ago

What version of descheduler are you using?

descheduler version: v0.29.0

Does this issue reproduce with the latest release?

Yes.

Which descheduler CLI options are you using?

Defaults from the Helm chart release.

Please provide a copy of your descheduler policy config file

From the Helm chart release, with only RemovePodsViolatingNodeAffinity enabled.

Values applied with the Helm installation:

```yaml
deschedulerPolicy:
  strategies:
    RemoveDuplicates:
      enabled: false
    RemovePodsHavingTooManyRestarts:
      enabled: false
    RemovePodsViolatingNodeTaints:
      enabled: false
    RemovePodsViolatingNodeAffinity:
      enabled: true
      params:
        nodeAffinityType:
        - requiredDuringSchedulingIgnoredDuringExecution
        - preferredDuringSchedulingIgnoredDuringExecution
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: false
    RemovePodsViolatingTopologySpreadConstraint:
      enabled: false
    LowNodeUtilization:
      enabled: false
```

What k8s version are you using (kubectl version)?

kubectl version Output

$ kubectl version
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5+k3s1

What did you do?

Create a Deployment.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
```

Labeled one node in the cluster

`kubectl label nodes <node-name> disktype=ssd`

Unlabeled the node

`kubectl label nodes <node-name> disktype-`
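To illustrate why removing the label leaves the running pods in violation of the rule, here is a minimal Python sketch of how a required node affinity term can be evaluated against node labels. It is not the scheduler's or the descheduler's code: `expr_matches` and `node_matches` are hypothetical helper names, and only the `In`, `NotIn`, `Exists` and `DoesNotExist` operators are handled.

```python
from typing import Dict, List


def expr_matches(labels: Dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A missing key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {op}")


def node_matches(labels: Dict[str, str], terms: List[dict]) -> bool:
    """nodeSelectorTerms are ORed; expressions inside one term are ANDed."""
    for term in terms:
        exprs = term.get("matchExpressions", [])
        # A term without expressions is treated as matching nothing.
        if exprs and all(expr_matches(labels, e) for e in exprs):
            return True
    return False


# The term from the Deployment above: disktype In [ssd]
terms = [{"matchExpressions": [
    {"key": "disktype", "operator": "In", "values": ["ssd"]},
]}]

print(node_matches({"disktype": "ssd"}, terms))  # True: node is labeled
print(node_matches({}, terms))                   # False: label was removed
```

After the `disktype-` step, no node in the cluster satisfies the term, which is the state the rest of this report describes.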

What did you expect to see?

Pods are in Pending status because no node in the cluster fits the affinity.

What did you see instead?

Pods are in running status.

With verbosity 4, the descheduler logs the following line for each node: `node.go:166] "Pod does not fit on node" pod:="default/nginx-deployment-56f548b646-jmcqv" node:="pi-101" error:="pod node selector does not match the node label"`

By comparison, the RemovePodsHavingTooManyRestarts strategy deletes pods once the podRestartThreshold parameter is reached.

Maybe this kind of scenario isn't covered by the "RemovePodsViolatingNodeAffinity" strategy, or maybe it's deliberate?
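One possible explanation, based on how the descheduler README describes this strategy: a pod is evicted only when its current node no longer satisfies the affinity rule and another node is available that does. If that reading is right, a "Pod does not fit on node" message for every node would mean there is nowhere to move the pod, so it keeps running. Below is a minimal Python sketch of that decision. It is an illustration under that assumption, not the descheduler's actual code; `Node`, `should_evict`, `wants_ssd` and the node name `pi-102` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def should_evict(
    current: Node,
    nodes: List[Node],
    fits: Callable[[Dict[str, str]], bool],
) -> bool:
    """Evict only if the current node violates the rule AND some other node fits."""
    if fits(current.labels):
        return False  # affinity still satisfied, nothing to do
    return any(fits(n.labels) for n in nodes if n.name != current.name)


def wants_ssd(labels: Dict[str, str]) -> bool:
    """The rule from this report: disktype In [ssd]."""
    return labels.get("disktype") == "ssd"


cluster = [Node("pi-101"), Node("pi-102")]

# No node carries the label: the pod violates the rule but has nowhere to go.
print(should_evict(cluster[0], cluster, wants_ssd))  # False

# Label a different node: now an eviction would let the pod be rescheduled.
cluster[1].labels["disktype"] = "ssd"
print(should_evict(cluster[0], cluster, wants_ssd))  # True
```

Under this reading, labeling a second node with `disktype=ssd` before unlabeling the first one should trigger the eviction, which may be a useful way to confirm whether the behavior is deliberate.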

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 week ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/descheduler/issues/1345#issuecomment-2190223602):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue with `/reopen`
>- Mark this issue as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close not-planned
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
