Endless descheduling of pods with node affinity preferredDuringSchedulingIgnoredDuringExecution when enough resources are available on an untainted node but not on a tainted node #1410
=== RUN TestRespectPodsViolatingNodeAffinity/Pod_is_scheduled_on_node_without_matching_labels,_and_schedulable_node_where_pod_could_fit_is_available_but_no_having_enough_resources,_should_not_evict_[preferred_affinity]
I0521 14:51:01.007609 31700 node.go:157] "Pod does not fit on any other node" pod:="default/podWithNodeAffinity" node:="nodeWithLabels" error:="[insufficient cpu, insufficient memory]"
I0521 14:51:01.009266 31700 node.go:154] "Pod fits on node" pod="default/podWithNodeAffinity" node="nodeWithoutLabels3"
I0521 14:51:01.009266 31700 defaultevictor.go:207] "pod does fit on other node" pod="default/podWithNodeAffinity"
I0521 14:51:01.009266 31700 node.go:186] "Pod fits on node" pod="default/podWithNodeAffinity" node="nodeWithoutLabels2"
I0521 14:51:01.009266 31700 node.go:186] "Pod fits on node" pod="default/podWithNodeAffinity" node="nodeWithoutLabels3"
I0521 14:51:01.009266 31700 node.go:189] "Pod does not fit on any node" pod:="default/podWithNodeAffinity" node:="nodeWithLabels" error:="[insufficient memory, insufficient cpu]"
I0521 14:51:01.009266 31700 node.go:170] "Pod fits on node" pod="default/podWithNodeAffinity" node="nodeWithoutLabels2"
I0521 14:51:01.009266 31700 predicates.go:301] "node has weight for pod" pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" sum weight=0
I0521 14:51:01.009266 31700 node.go:325] "Pod has total weight on node " pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" best weight=0
I0521 14:51:01.009266 31700 predicates.go:301] "node has weight for pod" pod="default/podWithNodeAffinity" node="nodeWithoutLabels3" sum weight=0
I0521 14:51:01.009266 31700 node.go:325] "Pod has total weight on node " pod="default/podWithNodeAffinity" node="nodeWithoutLabels3" best weight=0
I0521 14:51:01.009266 31700 node.go:340] "Pod has weight on node " node="default/podWithNodeAffinity" best weight=0
I0521 14:51:01.009266 31700 predicates.go:301] "node has weight for pod" pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" sum weight=0
I0521 14:51:01.009266 31700 node.go:325] "Pod has total weight on node " pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" best weight=0
I0521 14:51:01.009266 31700 node_affinity.go:108] "filtering on preferredDuringSchedulingIgnoredDuringExecution " node affinity=true evict filter =true fits other node=true best node weight=0 currentNodeWeight=0
--- PASS: TestRespectPodsViolatingNodeAffinity/Pod_is_scheduled_on_node_without_matching_labels,_and_schedulable_node_where_pod_could_fit_is_available_but_no_having_enough_resources,_should_not_evict_[preferred_affinity] (0.11s)
PASS
What did you see instead?
The pod got endlessly descheduled.
=== RUN TestRemovePodsViolatingNodeAffinity/Pod_is_scheduled_on_node_without_matching_labels,_and_schedulable_node_where_pod_could_fit_is_available,_should_not_evict_[preferred_affinity]
I0521 13:35:32.301233 26416 node.go:157] "Pod does not fit on any other node" pod:="default/podWithNodeAffinity" node:="nodeWithLabels" error:="[insufficient cpu, insufficient memory]"
I0521 13:35:32.302299 26416 node.go:154] "Pod fits on node" pod="default/podWithNodeAffinity" node="nodeWithoutLabels3"
I0521 13:35:32.302299 26416 defaultevictor.go:207] "pod does fit on other node" pod="default/podWithNodeAffinity"
I0521 13:35:32.302299 26416 node.go:170] "Pod fits on node" pod="default/podWithNodeAffinity" node="nodeWithoutLabels2"
I0521 13:35:32.302299 26416 predicates.go:301] "node has weight for pod" pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" sum weight=0
I0521 13:35:32.302299 26416 node.go:308] "Pod has total weight on node " pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" best weight=0
I0521 13:35:32.302299 26416 predicates.go:301] "node has weight for pod" pod="default/podWithNodeAffinity" node="nodeWithoutLabels3" sum weight=0
I0521 13:35:32.302299 26416 node.go:308] "Pod has total weight on node " pod="default/podWithNodeAffinity" node="nodeWithoutLabels3" best weight=0
I0521 13:35:32.302299 26416 predicates.go:301] "node has weight for pod" pod="default/podWithNodeAffinity" node="nodeWithLabels" sum weight=10
I0521 13:35:32.302299 26416 node.go:308] "Pod has total weight on node " pod="default/podWithNodeAffinity" node="nodeWithLabels" best weight=10
I0521 13:35:32.302299 26416 node.go:323] "Pod has weight on node " node="default/podWithNodeAffinity" best weight=10
I0521 13:35:32.302299 26416 predicates.go:301] "node has weight for pod" pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" sum weight=0
I0521 13:35:32.302299 26416 node.go:308] "Pod has total weight on node " pod="default/podWithNodeAffinity" node="nodeWithoutLabels2" best weight=0
I0521 13:35:32.302299 26416 node_affinity.go:107] "filtering on preferredDuringSchedulingIgnoredDuringExecution " node affinity=true evict filter =true fits other node=true best node weight=10 currentNodeWeight=0
node_affinity_test.go:244: Test "Pod is scheduled on node without matching labels, and schedulable node where pod could fit is available, should not evict [preferred affinity]" failed, expected 0 pod evictions, but got 1 pod evictions
--- FAIL: TestRemovePodsViolatingNodeAffinity/Pod_is_scheduled_on_node_without_matching_labels,_and_schedulable_node_where_pod_could_fit_is_available,_should_not_evict_[preferred_affinity] (0.11s)
FAIL
Analysis
I traced this behaviour to node_affinity.go#105: the best node weight is computed over all candidate nodes, including nodes the pod cannot actually fit on:
filterFunc := func(pod *v1.Pod, node *v1.Node, nodes []*v1.Node) bool {
    return utils.PodHasNodeAffinity(pod, utils.PreferredDuringSchedulingIgnoredDuringExecution) &&
        d.handle.Evictor().Filter(pod) &&
        nodeutil.PodFitsAnyNode(d.handle.GetPodsAssignedToNodeFunc(), pod, nodes) &&
        // Here all nodes are taken into account, including nodes without enough resources.
        // If there is a tainted node that matches the affinity but lacks resources,
        // the pod is still descheduled.
        (nodeutil.GetBestNodeWeightGivenPodPreferredAffinity(pod, nodes) > nodeutil.GetNodeWeightGivenPodPreferredAffinity(pod, node))
}
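To see why this loops forever, here is a minimal standalone sketch - toy types, with the weights and node names taken from the failing test logs above, not descheduler code: the best weight over all nodes is 10 (nodeWithLabels), but the pod can only ever run on a weight-0 node, so every cycle evicts it again.

package main

import "fmt"

// Toy model of the failing scenario.
type candidate struct {
    name   string
    weight int32 // preferred-affinity weight for the pod
    fits   bool  // enough free cpu/memory for the pod
}

func main() {
    nodes := []candidate{
        {"nodeWithLabels", 10, false}, // matches the affinity, but cannot fit the pod
        {"nodeWithoutLabels2", 0, true},
        {"nodeWithoutLabels3", 0, true},
    }
    currentWeight := int32(0) // pod currently sits on a weight-0 node

    for cycle := 1; cycle <= 3; cycle++ {
        // Buggy comparison: best weight over ALL nodes, fitting or not.
        var best int32
        for _, n := range nodes {
            if n.weight > best {
                best = n.weight
            }
        }
        if best > currentWeight {
            fmt.Printf("cycle %d: best=%d > current=%d -> evict\n", cycle, best, currentWeight)
            // The scheduler can only place the pod on a weight-0 node again,
            // so currentWeight stays 0 and the next cycle evicts once more.
        }
    }
}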
As a working example for debugging purposes, I tested the following code (without claiming this is the best way to solve it):
filterFunc := func(pod *v1.Pod, node *v1.Node, nodes []*v1.Node) bool {
    fittingNodes := nodeutil.PodFittingNodes(d.handle.GetPodsAssignedToNodeFunc(), pod, nodes)
    return utils.PodHasNodeAffinity(pod, utils.PreferredDuringSchedulingIgnoredDuringExecution) &&
        d.handle.Evictor().Filter(pod) &&
        nodeutil.PodFitsAnyNode(d.handle.GetPodsAssignedToNodeFunc(), pod, nodes) &&
        (nodeutil.GetBestNodeWeightGivenPodPreferredAffinity(pod, fittingNodes) > nodeutil.GetNodeWeightGivenPodPreferredAffinity(pod, node))
}
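Side note: once fittingNodes is computed, the nodeutil.PodFitsAnyNode(...) call is probably redundant, since len(fittingNodes) > 0 should carry the same information; I left it in to keep the change minimal.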
// Imports assumed for this helper (paths as used elsewhere in the descheduler tree):
//
//     v1 "k8s.io/api/core/v1"
//     utilerrors "k8s.io/apimachinery/pkg/util/errors"
//     "k8s.io/klog/v2"
//     podutil "sigs.k8s.io/descheduler/pkg/descheduler/pod"
//
// PodFittingNodes returns the subset of nodes the pod actually fits on.
func PodFittingNodes(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, nodes []*v1.Node) []*v1.Node {
    var fittingNodes []*v1.Node
    for _, node := range nodes {
        errors := NodeFit(nodeIndexer, pod, node)
        if len(errors) == 0 {
            klog.InfoS("Pod fits on node", "pod", klog.KObj(pod), "node", klog.KObj(node))
            fittingNodes = append(fittingNodes, node)
        } else {
            klog.InfoS("Pod does not fit on node",
                "pod", klog.KObj(pod), "node", klog.KObj(node), "error", utilerrors.NewAggregate(errors).Error())
        }
    }
    return fittingNodes
}
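With this helper in place, the scenario behaves as expected: fittingNodes contains only nodeWithoutLabels2 and nodeWithoutLabels3, the best fitting weight is 0, 0 > 0 is false, and the pod is not evicted - which is exactly what the passing test run at the top of this report shows (best node weight=0, currentNodeWeight=0).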
What version of descheduler are you using?
descheduler version: 0.29.0 / 0.30.0

Does this issue reproduce with the latest release? Yes.

Which descheduler CLI options are you using?

Please provide a copy of your descheduler policy config file

What k8s version are you using (kubectl version)? v1.28.3

What did you do? Given a deployment with nodeAffinity, and not enough resources on the tainted node pool but enough on an untainted node pool, the behaviour shown in the test runs above occurs: the pod is evicted over and over.
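For illustration, a hedged sketch of the kind of preferred node affinity involved. The exact label key, values, and taints of the original deployment are not shown in this report, so those below are hypothetical; the weight of 10 matches the "sum weight=10" seen in the failing test logs.

package main

import v1 "k8s.io/api/core/v1"

// Hypothetical spec fragment: prefer (weight 10) nodes in the tainted pool;
// when those nodes lack free cpu/memory, the scheduler falls back to the
// untainted pool, and the descheduler then keeps evicting the pod.
var affinity = &v1.Affinity{
    NodeAffinity: &v1.NodeAffinity{
        PreferredDuringSchedulingIgnoredDuringExecution: []v1.PreferredSchedulingTerm{{
            Weight: 10,
            Preference: v1.NodeSelectorTerm{
                MatchExpressions: []v1.NodeSelectorRequirement{{
                    Key:      "pool", // hypothetical label key
                    Operator: v1.NodeSelectorOpIn,
                    Values:   []string{"tainted-pool"}, // hypothetical label value
                }},
            },
        }},
    },
}

func main() {}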
What did you expect to see? The following test has been created; the pod should not be descheduled - see the passing test run at the top of this report.