kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

NodeFit Specification #1149

Open a7i opened 1 year ago

a7i commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently, nodeFit is a single boolean flag, so the user has no control over which individual nodeFit checks run.

Describe the solution you'd like

Full control over which nodeFit checks are enabled:

nodeFit:
  requests: false
  tolerateTaints: true
  affinity: true
  nodeSelector: true
  schedulable: true
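
For illustration, a minimal sketch of how these toggles could surface in the plugin arguments. The struct and field names below are hypothetical, not the current descheduler API:

// Hypothetical sketch only: NodeFitArgs does not exist in the descheduler
// API today; it shows how the single nodeFit boolean could be split into
// one toggle per check.
type NodeFitArgs struct {
    // Requests toggles the "requested resources fit on another node" check.
    Requests bool `json:"requests"`
    // TolerateTaints toggles the taints/tolerations check.
    TolerateTaints bool `json:"tolerateTaints"`
    // Affinity toggles the node affinity check.
    Affinity bool `json:"affinity"`
    // NodeSelector toggles the nodeSelector matching check.
    NodeSelector bool `json:"nodeSelector"`
    // Schedulable toggles the "some other node is schedulable" check.
    Schedulable bool `json:"schedulable"`
}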

Describe alternatives you've considered

Overwriting the Default Evictor

What version of descheduler are you using?

descheduler version: v0.27.0

Additional context

damemi commented 1 year ago

Originally, nodeFit was planned to eventually merge into the default behavior; we only made it an option because it changed how descheduler worked at the time.

But seeing the options enumerated like this is really interesting. What if we just merged nodeFit into the default evictor or PreEvictionFilter? I think it has grown beyond what we originally envisioned, and honestly I think it makes sense to expose each of these toggles.
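
As a rough sketch of what merging nodeFit into a PreEvictionFilter could look like, assuming the framework's PreEvictionFilter hook that returns true when eviction may proceed. The check names and wiring here are made up for illustration:

import v1 "k8s.io/api/core/v1"

// Illustrative only: a nodeFit-style filter that runs just the checks
// the user enabled. The map keys and wiring are assumptions.
type nodeFitFilter struct {
    enabled map[string]bool               // per-check toggles from args
    checks  map[string]func(*v1.Pod) bool // named nodeFit checks
}

// PreEvictionFilter returns true only if every enabled check passes,
// i.e. the pod is still expected to fit somewhere after eviction.
func (f *nodeFitFilter) PreEvictionFilter(pod *v1.Pod) bool {
    for name, check := range f.checks {
        if f.enabled[name] && !check(pod) {
            return false
        }
    }
    return true
}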

a7i commented 1 year ago

Adding @ingvagabund's proposal in this Issue:

At the same time someone might ask for disabling only specific checks. So instead of defining a bool argument, I would like to suggest an enum.

topologyBalanceNodeFit:
- MatchSelector
- TaintsAndTolerations
- InsufficientResources
- SchedulableNodes

type NodeFitConditions string

var (
    MatchSelectorNodeFitConditions NodeFitConditions = "MatchSelector"
    TaintsAndTolerationsNodeFitConditions NodeFitConditions = "TaintsAndTolerations"
    InsufficientResourcesNodeFitConditions NodeFitConditions = "InsufficientResources"
    SchedulableNodesNodeFitConditions NodeFitConditions = "SchedulableNodes"
)

type RemovePodsViolatingTopologySpreadConstraintArgs struct {
    ...
    TopologyBalanceNodeFit *[]NodeFitConditions `json:"topologyBalanceNodeFit,omitempty"`
}

Also on DefaultEvictor
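
For the DefaultEvictor, the same enum list could be exposed along these lines (the field name is hypothetical):

type DefaultEvictorArgs struct {
    ...
    // NodeFit lists the node fit conditions to evaluate before eviction.
    NodeFit []NodeFitConditions `json:"nodeFit,omitempty"`
}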

ingvagabund commented 1 year ago

If there's a way, I would like to avoid using pointers in the API definitions. At the moment, TopologyBalanceNodeFit == nil means use reasonable defaults (which can change over time), whereas *TopologyBalanceNodeFit == []NodeFitConditions{} means do not apply any node fit conditions, i.e. the nodeFit feature is disabled.
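
To make the three states concrete, here is a runnable sketch of the pointer-based semantics under discussion, with the types reduced to the minimum:

package main

import "fmt"

type NodeFitConditions string

type Args struct {
    TopologyBalanceNodeFit *[]NodeFitConditions
}

// describe spells out the three states a pointer-to-slice field encodes.
func describe(a Args) string {
    switch {
    case a.TopologyBalanceNodeFit == nil:
        return "unset: apply the current defaults (subject to change)"
    case len(*a.TopologyBalanceNodeFit) == 0:
        return "explicitly empty: node fit checks disabled"
    default:
        return fmt.Sprintf("run only: %v", *a.TopologyBalanceNodeFit)
    }
}

func main() {
    fmt.Println(describe(Args{}))                                               // unset
    fmt.Println(describe(Args{TopologyBalanceNodeFit: &[]NodeFitConditions{}})) // disabled
    conds := []NodeFitConditions{"MatchSelector", "SchedulableNodes"}
    fmt.Println(describe(Args{TopologyBalanceNodeFit: &conds}))                 // subset
}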

a7i commented 1 year ago

Is there a way to make this "future-proof" in case we decide to pull in Scheduler's Filter extension point?

ingvagabund commented 1 year ago

The goal is definitely to make it extendable. Unfortunately, I am out of time at the moment to help extend the design.

ingvagabund commented 1 year ago

To avoid using a pointer to a type:

type NodeFitConditions string

var (
    NoCondition NodeFitConditions = ""
    MatchSelectorNodeFitConditions NodeFitConditions = "MatchSelector"
    TaintsAndTolerationsNodeFitConditions NodeFitConditions = "TaintsAndTolerations"
    InsufficientResourcesNodeFitConditions NodeFitConditions = "InsufficientResources"
    SchedulableNodesNodeFitConditions NodeFitConditions = "SchedulableNodes"
)

type RemovePodsViolatingTopologySpreadConstraintArgs struct {
    ...
    TopologyBalanceNodeFit []NodeFitConditions `json:"topologyBalanceNodeFit,omitempty"`
}

RemovePodsViolatingTopologySpreadConstraintArgs:
- TopologyBalanceNodeFit: [] // empty field -> fall back to defaults, which are subject to change
- TopologyBalanceNodeFit: [""] // empty string -> disable the node fit checks
- TopologyBalanceNodeFit: ["Cond1", ..., "CondN"] // arbitrary list of enabled conditions

Alternatively, "" can be replaced with "NoCheck" or a similar string which, if specified, must be the only item in the list.
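
A sketch of how defaulting could interpret that sentinel; the function and the default list are assumptions, reusing the NodeFitConditions values defined above:

// resolveNodeFitConditions is an illustrative defaulting helper, not
// descheduler code.
func resolveNodeFitConditions(configured []NodeFitConditions) []NodeFitConditions {
    switch {
    case len(configured) == 0:
        // Empty field: fall back to the current defaults, which may change.
        return []NodeFitConditions{
            MatchSelectorNodeFitConditions,
            TaintsAndTolerationsNodeFitConditions,
            InsufficientResourcesNodeFitConditions,
            SchedulableNodesNodeFitConditions,
        }
    case len(configured) == 1 && configured[0] == NoCondition:
        // [""] (or "NoCheck"): node fit checks explicitly disabled.
        return nil
    default:
        // Arbitrary subset: run exactly the listed checks.
        return configured
    }
}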

logyball commented 11 months ago

Hello, we have been running into a situation that was somewhat surprising until I dug into the code and learned that NodeFit does not take into account all scheduler predicates, notably pod (anti)affinity. At our organization, we've had multiple occasions where using the descheduler HighNodeUtilization profile together with GKE's optimize-utilization autoscaling profile has bitten us, because we assumed that NodeFit and the kube-scheduler would agree.

The basic situation is:

I suggest that we look into adding pod (anti)affinity checks to the NodeFit calculation, and I would be happy to look into contributing it. It probably adds a decent amount of complexity, but it would likely make the descheduler's decisions more closely resemble those of the kube-scheduler. What do you think?
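
For a sense of what such a check might involve, here is a greatly simplified sketch of a required pod anti-affinity test. It is illustrative only: it treats each node as its own topology domain and only considers the evicted pod's namespace, whereas the kube-scheduler's InterPodAffinity plugin also handles namespace selectors, multi-node topology domains, and symmetry:

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
)

// violatesAntiAffinity reports whether placing pod onto node would
// violate one of pod's required anti-affinity terms, given the pods
// already running on that node.
func violatesAntiAffinity(pod *v1.Pod, node *v1.Node, podsOnNode []*v1.Pod) bool {
    if pod.Spec.Affinity == nil || pod.Spec.Affinity.PodAntiAffinity == nil {
        return false
    }
    for _, term := range pod.Spec.Affinity.PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution {
        sel, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
        if err != nil {
            continue // an invalid selector matches nothing
        }
        // Simplification: the "domain" is assumed to be the node itself,
        // so we only require the node to carry the topology key at all.
        if _, ok := node.Labels[term.TopologyKey]; !ok {
            continue
        }
        for _, existing := range podsOnNode {
            // Simplification: default namespace semantics only (the term
            // applies to the incoming pod's own namespace).
            if existing.Namespace == pod.Namespace && sel.Matches(labels.Set(existing.Labels)) {
                return true
            }
        }
    }
    return false
}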

Note: I wrote this as a comment rather than opening a new issue, but if you feel like this would be more appropriate as a new issue, I can happily convert it.

a7i commented 11 months ago

@logyball would you please create a new Issue? We can implement this as a patch without the need to re-design NodeFit.

logyball commented 11 months ago

Ok, converted to issue here, thanks!

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

nikimanoledaki commented 7 months ago

/remove-lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten