a7i opened this issue 1 year ago
Originally, nodeFit was planned to eventually merge into the default state; we only made it an option because it changed how the descheduler worked at the time.
But seeing the options enumerated like this is really interesting. What if we just merged nodeFit into the default evictor or PreEvictionFilter? I think it has grown beyond what we originally envisioned, and honestly I think it makes sense to expose each of these toggles.
Adding @ingvagabund's proposal in this Issue:
At the same time someone might ask for disabling only specific checks. So instead of defining a bool argument I would like to suggest an enum.
```yaml
topologyBalanceNodeFit:
- MatchSelector
- TaintsAndTolerations
- InsufficientResources
- SchedulableNodes
```
```go
type NodeFitConditions string

var (
	MatchSelectorNodeFitConditions         NodeFitConditions = "MatchSelector"
	TaintsAndTolerationsNodeFitConditions  NodeFitConditions = "TaintsAndTolerations"
	InsufficientResourcesNodeFitConditions NodeFitConditions = "InsufficientResources"
	SchedulableNodesNodeFitConditions      NodeFitConditions = "SchedulableNodes"
)

type RemovePodsViolatingTopologySpreadConstraintArgs struct {
	// ...
	TopologyBalanceNodeFit *[]NodeFitConditions `json:"topologyBalanceNodeFit,omitempty"`
}
```
The same would apply to DefaultEvictor.
If there is a way, I would like to avoid using pointers in the API definitions. At the moment, TopologyBalanceNodeFit == nil means "use any reasonable defaults" (which can change over time), whereas *TopologyBalanceNodeFit == []NodeFitConditions{} means "do not use any node fit conditions", i.e. the nodeFit feature is disabled.
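The nil-vs-empty semantics described above can be sketched as follows. This is a self-contained illustration, not the descheduler's actual code; the function name `effectiveConditions` and the default set are assumptions for the example.

```go
package main

import "fmt"

// NodeFitConditions mirrors the proposed enum type.
type NodeFitConditions string

// defaultConditions is a hypothetical default set, subject to change over time.
var defaultConditions = []NodeFitConditions{
	"MatchSelector", "TaintsAndTolerations", "InsufficientResources", "SchedulableNodes",
}

// effectiveConditions shows the pointer semantics under discussion:
// a nil pointer means "use the current defaults", while a non-nil
// pointer to an empty slice disables the nodeFit feature entirely.
func effectiveConditions(arg *[]NodeFitConditions) []NodeFitConditions {
	if arg == nil {
		return defaultConditions
	}
	return *arg
}

func main() {
	fmt.Println(len(effectiveConditions(nil))) // defaults: 4 conditions

	disabled := []NodeFitConditions{}
	fmt.Println(len(effectiveConditions(&disabled))) // 0: nodeFit disabled
}
```

This demonstrates why the pointer is needed in that variant: without it, an unset field and an explicitly empty list would be indistinguishable after JSON decoding.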
Is there a way to make this "future proof" in case we decide to pull in Scheduler's Filter extension point?
The goal is definitely to make it extendable. Unfortunately, I am out of time at the moment to help extend the design.
To avoid using a pointer to a type:
```go
type NodeFitConditions string

var (
	NoCondition                            NodeFitConditions = ""
	MatchSelectorNodeFitConditions         NodeFitConditions = "MatchSelector"
	TaintsAndTolerationsNodeFitConditions  NodeFitConditions = "TaintsAndTolerations"
	InsufficientResourcesNodeFitConditions NodeFitConditions = "InsufficientResources"
	SchedulableNodesNodeFitConditions      NodeFitConditions = "SchedulableNodes"
)

type RemovePodsViolatingTopologySpreadConstraintArgs struct {
	// ...
	TopologyBalanceNodeFit []NodeFitConditions `json:"topologyBalanceNodeFit,omitempty"`
}
```
```yaml
RemovePodsViolatingTopologySpreadConstraintArgs:
- TopologyBalanceNodeFit: []                       # empty field -> fall back to defaults, which are subject to change
- TopologyBalanceNodeFit: [""]                     # empty string -> disable the node fit checks
- TopologyBalanceNodeFit: ["Cond1", ..., "CondN"]  # arbitrary list of enabled conditions
```
Alternatively, "" can be replaced with "NoCheck" or a similar string which, if specified, will be the only valid item in the list.
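The three cases above for the pointer-free variant could be resolved along these lines. This is a sketch under the assumption that a helper like `resolveConditions` would sit in the argument-validation or defaulting path; the function name and default set are hypothetical.

```go
package main

import "fmt"

type NodeFitConditions string

// NoCondition is the sentinel ("" in the proposal) that disables nodeFit.
const NoCondition NodeFitConditions = ""

// defaultConditions is a hypothetical default set, subject to change.
var defaultConditions = []NodeFitConditions{
	"MatchSelector", "TaintsAndTolerations", "InsufficientResources", "SchedulableNodes",
}

// resolveConditions implements the three cases from the example above:
// empty field -> defaults, a single sentinel entry -> disabled,
// anything else -> the explicit list of enabled conditions.
func resolveConditions(conds []NodeFitConditions) []NodeFitConditions {
	if len(conds) == 0 {
		return defaultConditions
	}
	if len(conds) == 1 && conds[0] == NoCondition {
		return nil // nodeFit disabled
	}
	return conds
}

func main() {
	fmt.Println(len(resolveConditions(nil)))                              // 4: defaults
	fmt.Println(len(resolveConditions([]NodeFitConditions{NoCondition}))) // 0: disabled
	fmt.Println(len(resolveConditions([]NodeFitConditions{"MatchSelector"})))
}
```

A validation step would additionally reject lists that mix the sentinel with real conditions, since the sentinel is only valid on its own.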
Hello, we have been running into a situation which was somewhat surprising until I dug into the code and learned that NodeFit does not take into account all scheduler predicates, notably pod (anti)affinity. At our organization, we've had multiple occasions where using the descheduler's HighNodeUtilization profile together with GKE's optimize-utilization autoscaling profile has bitten us, because we assumed that NodeFit and the kube-scheduler would agree.
The basic situation is that NodeFit considers an evicted pod to fit on a node, while the kube-scheduler's pod (anti)affinity predicates then refuse to place it there.
I suggest that we look into implementing pod (anti)affinity checks in the NodeFit calculation, and I would be happy to contribute it. It probably adds a decent amount of complexity, but it would likely make the descheduler's decisions more closely resemble those of the kube-scheduler. What do you think?
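To make the suggestion concrete, here is a deliberately simplified, self-contained sketch of the kind of check that could be folded into NodeFit. It is not the descheduler's or kube-scheduler's actual implementation: the `pod` struct is a stand-in for v1.Pod, anti-affinity is reduced to plain label selectors, and topology domains and namespaces are ignored.

```go
package main

import "fmt"

// pod is a simplified stand-in for v1.Pod: its labels plus its required
// pod anti-affinity, expressed as a list of label selectors.
type pod struct {
	labels       map[string]string
	antiAffinity []map[string]string
}

// matchesSelector reports whether labels satisfy every key/value pair
// in selector (an exact-match subset check).
func matchesSelector(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// fitsAntiAffinity rejects a candidate node if any pod already running
// on it matches one of the candidate pod's required anti-affinity
// selectors, mirroring (very roughly) what the scheduler would check.
func fitsAntiAffinity(candidate pod, podsOnNode []pod) bool {
	for _, sel := range candidate.antiAffinity {
		for _, p := range podsOnNode {
			if matchesSelector(sel, p.labels) {
				return false
			}
		}
	}
	return true
}

func main() {
	db := pod{antiAffinity: []map[string]string{{"app": "db"}}}
	fmt.Println(fitsAntiAffinity(db, []pod{{labels: map[string]string{"app": "web"}}})) // true
	fmt.Println(fitsAntiAffinity(db, []pod{{labels: map[string]string{"app": "db"}}}))  // false
}
```

The real predicate also has to honor topologyKey (per-zone/per-node domains), namespace selectors, and symmetry with other pods' anti-affinity terms, which is where most of the complexity mentioned above would come from.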
Note: I wrote this as a comment rather than opening a new issue, but if you feel like this would be more appropriate as a new issue, I can happily convert it.
@logyball would you please create a new Issue? We can implement this as a patch without needing to re-design NodeFit.
Ok, converted to issue here, thanks!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After lifecycle/stale was applied, lifecycle/rotten is applied
- After lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After lifecycle/stale was applied, lifecycle/rotten is applied
- After lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Is your feature request related to a problem? Please describe.
Currently, nodeFit is a boolean flag, and the user has no control over which individual nodeFit checks run.
Describe the solution you'd like
Full control over which nodeFit checks are enabled.
Describe alternatives you've considered
Overwriting the Default Evictor
What version of descheduler are you using?
descheduler version: v0.27.0
Additional context