Is your feature request related to a problem? Please describe.
The NodeFit predicate was introduced to help the descheduler make better eviction decisions, avoiding cases where there is no feasible node for re-scheduling after a pod gets evicted. To enable the predicate, the DefaultEvictor plugin provides an optional nodeFit option that each plugin can utilize. The list of existing checks has been extended over time in good faith to improve the eviction decisions. The NodeFit predicate currently consists of the following checks:
a pod matches a node selector
a pod tolerates taints
a pod fits resource requests
a node is unschedulable
a pod matches inter pod anti-affinity
Some plugins adopted the NodeFit predicate natively by invoking the additional PodFitsAnyOtherNode, PodFitsAnyNode and PodFitsCurrentNode predicates, which are built on top of NodeFit. Nevertheless, there are cases where it is preferable to run only a subset of the existing checks or to disable the checks completely. This is a problem for those plugins, because they offer no way to fully disable the checks.
User stories
Plugins like RemovePodsViolatingNodeAffinity or RemovePodsViolatingNodeTaints
have a subset of NodeFit checks enabled natively. These checks cannot be disabled
without disabling the corresponding plugin. As an administrator,
I'd like to disable specific checks like "a pod fits resource requests" to get
as close as possible to disabling all NodeFit checks, so that I can evict pods, detect
the resulting pending pods, and allow cluster autoscalers or other tools to reconcile the situation.
As an administrator I'd like to configure the PodLifetime plugin to check that
there are nodes with sufficient resources to accept any evicted pod,
even when the pod's node selector does not match any node. Then, when there are
too many pending pods due to a node label mismatch, my automation can label
existing nodes and allocate more resources, or have the multi-cluster scheduler
reschedule my workload to a different cluster.
As an administrator I'd like to make sure the RemovePodsViolatingInterPodAntiAffinity
plugin evicts pods even when there is currently no node with sufficient resources,
while still respecting node affinities and taints, so that the cluster autoscaler
can scale up new nodes when too many Pending pods are observed.
As an administrator I'd like to run a different scheduler than the default one.
For that I might need to disable some of the existing NodeFit checks that
are no longer valid or might collide with how the non-default scheduler works.
As an AI/ML infrastructure administrator I'd like to extend the available NodeFit
predicates with GPU-oriented checks and enable them only for specific (custom)
plugins/workloads.
As a plugin developer I'd like to specify a list of NodeFit checks that need
to be disabled: checks that either produce suboptimal evictions or
are re-implemented by a given plugin.
Describe the solution you'd like
Allow enabling and disabling the individual checks that the NodeFit predicate consists of.
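To make the request more concrete, here is a minimal Go sketch of how per-check toggling could work. It is not the descheduler's implementation: the `Pod`, `Node` and `Check` types, the check names, and the `NodeFit` signature are all hypothetical and heavily simplified, and the inter-pod anti-affinity check is left out for brevity. The idea is only that each check is a named function and the caller passes the set of names to skip.

```go
package main

// Pod and Node are simplified stand-ins invented for this sketch;
// they are NOT the real Kubernetes or descheduler types.
type Pod struct {
	NodeSelector map[string]string
	Tolerations  map[string]bool // taint keys the pod tolerates
	CPURequest   int64           // millicores
}

type Node struct {
	Labels        map[string]string
	Taints        []string // taint keys present on the node
	FreeCPU       int64    // millicores
	Unschedulable bool
}

// Check is one named NodeFit check (hypothetical type).
type Check struct {
	Name string
	Fits func(p Pod, n Node) bool
}

// DefaultChecks returns checks loosely modeled on the list in this issue.
// The names are assumptions made up for the example.
func DefaultChecks() []Check {
	return []Check{
		{Name: "NodeSelector", Fits: func(p Pod, n Node) bool {
			for k, v := range p.NodeSelector {
				if got, ok := n.Labels[k]; !ok || got != v {
					return false
				}
			}
			return true
		}},
		{Name: "Taints", Fits: func(p Pod, n Node) bool {
			for _, t := range n.Taints {
				if !p.Tolerations[t] {
					return false
				}
			}
			return true
		}},
		{Name: "ResourceRequests", Fits: func(p Pod, n Node) bool {
			return p.CPURequest <= n.FreeCPU
		}},
		{Name: "Unschedulable", Fits: func(p Pod, n Node) bool {
			return !n.Unschedulable
		}},
	}
}

// NodeFit runs every check whose name is not in disabled and returns the
// names of the checks that failed. An empty result means the pod fits.
func NodeFit(p Pod, n Node, checks []Check, disabled map[string]bool) []string {
	var failed []string
	for _, c := range checks {
		if disabled[c.Name] {
			continue // check switched off by configuration
		}
		if !c.Fits(p, n) {
			failed = append(failed, c.Name)
		}
	}
	return failed
}
```

With this shape, the first user story would amount to calling `NodeFit` with `map[string]bool{"ResourceRequests": true}` as the disabled set, and a custom GPU-oriented check could be appended to the slice for specific plugins only. How such a set would be exposed in the plugin configuration is left to the proposal.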
Describe alternatives you've considered
TBD through a proposal.
What version of descheduler are you using?
descheduler version: 0.30.z
Additional context