Hi @sarg3nt,
could you check if the descheduler works for you when you use the new configuration format v1alpha2? I also had problems getting it to work, and it turns out that v0.29.0 already ignores any v1alpha1 config like the one in your Helm values. See the README for reference: https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#policy-default-evictor-and-strategy-plugins
An example that is running fine for me looks like this:
# Recommended to use the latest Policy API version supported by the Descheduler app version
deschedulerPolicyAPIVersion: "descheduler/v1alpha2"
deschedulerPolicy:
  profiles:
    - name: default
      pluginConfig:
        - name: DefaultEvictor
          args:
            ignorePvcPods: true
            evictLocalStoragePods: true
        - name: RemoveDuplicates
        - name: RemovePodsHavingTooManyRestarts
          args:
            podRestartThreshold: 5
            includingInitContainers: true
        - name: LowNodeUtilization
          args:
            thresholds:
              cpu: 30
              memory: 30
              pods: 30
            targetThresholds:
              cpu: 70
              memory: 70
              pods: 70
      plugins:
        balance:
          enabled:
            - RemoveDuplicates
            - LowNodeUtilization
        deschedule:
          enabled:
            - RemovePodsHavingTooManyRestarts
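For reference, a minimal way to roll such values out with Helm (a sketch only; the release name descheduler, the kube-system namespace, and the repo alias are assumptions, adjust them to your setup):

# Add the upstream chart repo and apply the values above (names are assumptions)
helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/
helm repo update
helm upgrade --install descheduler descheduler/descheduler \
  --namespace kube-system -f values.yaml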
You can also increase the verbosity of the logging output like this (see e.g. https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#pod-evictions):
cmdOptions:
  v: 5
However, if no balance or eviction plugin is configured, this will not change much.
Once your config is correct, it should output something along these lines:
descheduler-55b8cb6f58-kpfn4 descheduler I0521 15:33:32.238954 1 defaultevictor.go:202] "Pod fails the following checks" pod="kube-monitoring/prometheus-operator-kube-monitoring-prometheus-node-exportslk6q" checks="pod is a DaemonSet pod"
descheduler-55b8cb6f58-kpfn4 descheduler I0521 15:33:32.239001 1 defaultevictor.go:202] "Pod fails the following checks" pod="kube-system/descheduler-55b8cb6f58-kpfn4" checks="[pod has system critical priority, pod has higher priority than specified priority class threshold]"
descheduler-55b8cb6f58-kpfn4 descheduler I0521 15:33:32.239138 1 profile.go:356] "Total number of pods evicted" extension point="Balance" evictedPods=0
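To watch for messages like these, you can tail the descheduler logs directly; the command below is a sketch that assumes the chart runs as a Deployment named descheduler in kube-system (adjust if you deploy it as a CronJob or into another namespace):

# Tail the descheduler logs; deployment name and namespace are assumptions
kubectl -n kube-system logs deployment/descheduler -f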
Best regards, Florian.
Hi @fbuchmeier-abi, thanks for pointing that out. I think the "bug" then is that the default values in the 0.29.0 chart are wrong, as those were what I was using.
I saw that 0.30.0 was out, so I upgraded to that, used your values, and it worked.
I then took a look at the values.yaml for 0.30.0, which is release-1.30 (why???), and tried those and got more errors. Eventually I figured out that two values in the default deschedulerPolicy are not supported.
The commented-out lines below are the culprits: nodeAffinityType and includeSoftConstraints.
deschedulerPolicy:
  profiles:
    - name: default
      pluginConfig:
        - name: DefaultEvictor
          args:
            ignorePvcPods: true
            evictLocalStoragePods: true
        - name: RemoveDuplicates
        - name: RemovePodsHavingTooManyRestarts
          args:
            podRestartThreshold: 100
            includingInitContainers: true
        - name: RemovePodsViolatingNodeTaints
          # args:
          #   nodeAffinityType:
          #     - requiredDuringSchedulingIgnoredDuringExecution
        - name: RemovePodsViolatingInterPodAntiAffinity
        - name: RemovePodsViolatingTopologySpreadConstraint
          # args:
          #   includeSoftConstraints: false
        - name: LowNodeUtilization
          args:
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
      plugins:
        balance:
          enabled:
            - RemoveDuplicates
            - RemovePodsViolatingNodeAffinity
            - RemovePodsViolatingTopologySpreadConstraint
            - LowNodeUtilization
        deschedule:
          enabled:
            - RemovePodsHavingTooManyRestarts
            - RemovePodsViolatingNodeTaints
            - RemovePodsViolatingNodeAffinity
            - RemovePodsViolatingInterPodAntiAffinity
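A quick way to sanity-check what the chart actually renders from values like these (a sketch; it assumes the upstream descheduler chart and a local values.yaml) is to template it and look at the generated policy ConfigMap:

# Render the chart locally and inspect the generated policy; names are assumptions
helm template descheduler descheduler/descheduler -f values.yaml | grep -A 40 'kind: ConfigMap'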
So maybe someone should update the values.yaml for 1.30.
Thank you again. All is working now and I'll close with this comment.
What version of descheduler are you using?
descheduler version: 0.29.0
Does this issue reproduce with the latest release?
This is the latest release
Which descheduler CLI options are you using?
The ones provided by the Helm chart:
Please provide a copy of your descheduler policy config file
What k8s version are you using (kubectl version)?
v1.28.9+rke2r1
What did you do?
topologySpreadConstraints and podAntiAffinity defined in various places (cronjob, values.yaml).
What did you expect to see?
It should be evicting the pods.
What did you see instead?
It did not evict the pods: