kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0

Descheduling framework: define framework types #837

Closed · ingvagabund closed this issue 1 year ago

ingvagabund commented 2 years ago

Based on comments from https://github.com/kubernetes-sigs/descheduler/issues/753:

Use cases to keep in mind:

Overall, the task here is to define the data model of the framework. The functionality itself is part of the next issue(s).

Prerequisites:

Strategies for migration:

ingvagabund commented 2 years ago

/help

k8s-ci-robot commented 2 years ago

@ingvagabund: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/descheduler/issues/837):

> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
ingvagabund commented 2 years ago

Why are we solving this issue?

To have the descheduling framework designed and implemented through a community effort.

To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?

Completely new code is to be implemented, including the directory structure of the framework. The details are to be discussed as part of the resolution.

Does this issue have zero to low barrier of entry?

Not really. The discussion is the primary source of the changes.

How can the assignee reach out to you for help?

CC'ing @ingvagabund

binacs commented 2 years ago

/cc

harshanarayana commented 2 years ago

/cc @ingvagabund Would love to help

eminaktas commented 2 years ago

/cc

JaneLiuL commented 2 years ago

Could it be possible to separate this into several tasks or steps, so we can cooperate together? ~

pravarag commented 2 years ago

@ingvagabund can we refer to https://github.com/kubernetes-sigs/descheduler/pull/781 for approaching this? I'm guessing it has some similar work. /cc @ingvagabund

ingvagabund commented 2 years ago

@ingvagabund can we refer to https://github.com/kubernetes-sigs/descheduler/pull/781 for approaching this? I'm guessing it has some similar work.

I am using #781 as a reference for suggesting what we need. Nevertheless, it is still a proof of concept. Please treat it as such.

damemi commented 2 years ago

I think the 2 big things are configuration and plugins. So, to start I would propose two main subtasks:

Both of these should be seamless (in terms of not breaking functionality mid-move or delaying our regular release cadence if we hit trouble), so we can migrate plugins one-by-one if we choose. We can use wrapper functions around the existing strategy functions to shim the migration until it's ready to shift over if necessary, which is how the scheduling framework was implemented.
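
To make the wrapper idea concrete, here is a minimal sketch of such a shim, assuming a hypothetical plugin interface; every name below is illustrative and not the final API:

```go
package framework

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// DeschedulePlugin is a hypothetical stand-in for the plugin interface
// the framework would define.
type DeschedulePlugin interface {
	Name() string
	Deschedule(ctx context.Context, nodes []*v1.Node) error
}

// legacyStrategyFn stands in for an existing strategy entry point
// (the current strategy functions have a richer signature; it is
// trimmed here for brevity).
type legacyStrategyFn func(ctx context.Context, nodes []*v1.Node)

// strategyShim wraps an unmodified strategy function so it satisfies
// the plugin interface until the strategy is properly migrated.
type strategyShim struct {
	name string
	run  legacyStrategyFn
}

func (s *strategyShim) Name() string { return s.name }

func (s *strategyShim) Deschedule(ctx context.Context, nodes []*v1.Node) error {
	s.run(ctx, nodes) // delegate to the existing strategy logic unchanged
	return nil
}
```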

The first PR here then would be v1alpha2 of the DeschedulerPolicy API. @ingvagabund would you like to open that up for discussion based on the outline in your design doc?

That PR can come with a matching migration for the first strategy we want to tackle. This can be any strategy, but preferably a separate PR.

At that point, we can assign out the work to transition the remaining strategies to plugins. As we make progress on that, we can follow back up with the remaining tasks here and wire everything together. Does that sound reasonable?

knelasevero commented 2 years ago

Hey, I'm new here (and to some aspects of k8s development). I might have a few stupid questions, I wanna get them out of the way here:

The descheduling framework configuration can be converted into an internal representation (e.g. v1alpha2 -> internal)

Here we are assuming changes from v1alpha1 to v1alpha2 that will keep the same behavior, right? (and it makes sense, since it is stable)

But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)

With that in mind, shouldn't the conversion happen with a conversion webhook so any call [get,update,create,etc] to those resources also see the conversion? (not only the descheduler)

The descheduling framework needs to provide basic handle for plugins

Are these runtime plugins that users can point to with workloads running (shared-lib style)? Or plugins that users can register in code and build into their own images? EDIT: talked with @ingvagabund, and also checked the WIP PR; these are plugins following a similar plugin-registry approach to the one in the scheduler framework (which I was not familiar with).
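
For anyone else unfamiliar with that approach, the scheduler-framework-style registry boils down to a map from plugin name to a factory function. A rough sketch under assumed (not final) type names:

```go
package framework

import "fmt"

// Plugin is the assumed minimal interface every plugin implements.
type Plugin interface {
	Name() string
}

// Handle is assumed to give plugins access to shared facilities
// (client, shared informer factory, evictor, ...).
type Handle interface{}

// PluginFactory builds a plugin from its decoded args and the handle.
type PluginFactory func(args interface{}, handle Handle) (Plugin, error)

// Registry maps plugin names to factories; in-tree plugins are
// registered at startup, mirroring the kube-scheduler framework.
type Registry map[string]PluginFactory

// Register adds a factory under a unique plugin name.
func (r Registry) Register(name string, factory PluginFactory) error {
	if _, ok := r[name]; ok {
		return fmt.Errorf("a plugin named %q is already registered", name)
	}
	r[name] = factory
	return nil
}
```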


I see the proposed v1alpha2 of DeschedulerPolicy here:

apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
- name: ProfileName
  pluginConfig:
  - name: DefaultEvictor
    args:
      evictSystemCriticalPods: true # default false
      evictFailedBarePods: true # default false
      evictLocalStoragePods: true # default false
      nodeFit: true # default false
  - name: PodsHavingTooManyRestarts
    args:
      podRestartThreshold: 100
      includingInitContainers: true
  - name: LowNodeUtilization
    args:
      lowThreshold:
        cpu : 20
        memory: 20
        pods: 20
      highThreshold:
        cpu : 50
        memory: 50
        pods: 50
  - name: TopologySpreadConstraint
    args:
      includeSoftConstraints: true
  plugins:
    filter:
      # default filters: DefaultEvictor
      disabled:
      enabled:
    sort:
      # default sorters: Priority
      disabled:
      - Priority
      enabled:
      - Utilization
    deschedule:
      # plugins descheduling pods without checking other pods
      enabled:
      - PodsHavingTooManyRestarts
      - InterPodAntiAffinity
    rebalance:
      # plugins descheduling pods while balancing distribution
      enabled:
      - Duplicates
      - LowNodeUtilization
      - TopologySpreadConstraint
    evict:
      # default evict plugins: DefaultEvictor
      disabled:
      - DefaultEvictor
      enabled:
      - CustomEvictor

It has group, version, and kind, but no spec and status. I am now not sure if the idea is for the descheduler to be a controller also watching DeschedulerPolicy resources. Is it still the idea for this to be a config file? (this also may render my 1st question invalid :sweat_smile: )
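
For concreteness, the Go types behind a policy shaped like the YAML above might look roughly as follows; this is only a sketch, and the field and type names are assumptions based on the example, not the actual v1alpha2 API:

```go
package v1alpha2

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// DeschedulerPolicy carries TypeMeta for apiVersion/kind but, since it
// is read from a file rather than served by the API server, it has no
// ObjectMeta, spec, or status.
type DeschedulerPolicy struct {
	metav1.TypeMeta `json:",inline"`

	Profiles []Profile `json:"profiles"`
}

// Profile groups the plugin configuration and the enabled/disabled
// plugins for one descheduling profile.
type Profile struct {
	Name         string         `json:"name"`
	PluginConfig []PluginConfig `json:"pluginConfig"`
	Plugins      Plugins        `json:"plugins"`
}

// PluginConfig holds per-plugin args as a raw extension to be decoded
// by the plugin's factory.
type PluginConfig struct {
	Name string               `json:"name"`
	Args runtime.RawExtension `json:"args"`
}

// Plugins lists enabled/disabled plugins per extension point.
type Plugins struct {
	Filter     PluginSet `json:"filter"`
	Sort       PluginSet `json:"sort"`
	Deschedule PluginSet `json:"deschedule"`
	Rebalance  PluginSet `json:"rebalance"`
	Evict      PluginSet `json:"evict"`
}

// PluginSet enables or disables plugins by name at one extension point.
type PluginSet struct {
	Enabled  []string `json:"enabled"`
	Disabled []string `json:"disabled"`
}
```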

damemi commented 2 years ago

But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)

There's no plan to make the config a CRD, you might have seen some mention of informers as a way to trigger descheduling based on cluster events (for example, a node being added). But right now there's no need for a CRD

knelasevero commented 2 years ago

@damemi Great, now it's clear, thanks!

ingvagabund commented 2 years ago

Here we are assuming changes from v1alpha1 to v1alpha2 that will keep the same behavior, right? (and it makes sense, since it is stable)

In v1alpha2 we will be introducing new functionality in the strategies, e.g. sorting pods, so the expressive power of the configuration will go up a bit. However, the original decision logic of all the strategies will stay unchanged in the core, although we can discuss specific changes on a per-PR basis as we migrate the strategies into plugins.

But the idea is also to make this project stop reading from files and be a controller reading CRDs from your cluster, am I correct? (with informers and all that gist)

In the past there was an idea of exposing the configuration of all controllers (including kubelets) in the cluster as objects, so the configuration could be read/updated on the fly by higher-level tooling. I am not sure what the current state of that effort is. In the case of the scheduler and the descheduler, though, the configuration is stored in the cluster through a ConfigMap, which contains a YAML representation of a well-defined type with kind and apiVersion. Yet both the scheduler and the descheduler still read the configuration as a file whose location is passed in through flags.
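
To illustrate the "configuration read as a file passed in through flags" part, here is a minimal, hypothetical loader sketch; the flag name and the trimmed-down policy type are illustrative, not the descheduler's actual code:

```go
package main

import (
	"flag"
	"fmt"
	"os"

	"sigs.k8s.io/yaml"
)

// DeschedulerPolicy is a trimmed-down local mirror of the hypothetical
// v1alpha2 policy type sketched in an earlier comment.
type DeschedulerPolicy struct {
	Profiles []struct {
		Name string `json:"name"`
	} `json:"profiles"`
}

func main() {
	// The descheduler takes the policy location as a CLI flag.
	policyFile := flag.String("policy-config-file", "", "path to the DeschedulerPolicy YAML")
	flag.Parse()

	data, err := os.ReadFile(*policyFile)
	if err != nil {
		panic(err)
	}

	var policy DeschedulerPolicy
	if err := yaml.Unmarshal(data, &policy); err != nil {
		panic(err)
	}
	fmt.Printf("loaded %d profile(s)\n", len(policy.Profiles))
}
```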

ingvagabund commented 2 years ago

Finalize plugin API
Convert existing strategies to plugins

+1 for this as the starting point. I will go ahead and prepare a PR where we can take a closer look at both the API and one of the strategies migrated into a plugin as an example for the remaining strategies.

knelasevero commented 2 years ago

https://github.com/kubernetes-sigs/descheduler/issues/845

These proposals for changes in DeschedulerPolicy, regarding fields that don't make sense in certain strategies, will have to be considered when creating v1alpha2, I assume.

ingvagabund commented 2 years ago

@eminaktas @harshanarayana @JaneLiuL @pravarag @damemi any review help in https://github.com/kubernetes-sigs/descheduler/pull/847 appreciated

a7i commented 2 years ago

At that point, we can assign out the work to transition the remaining strategies to plugins.

I'm late to the party but count me in! 🕺🏼

knelasevero commented 2 years ago

I'm in as well 🙌

knelasevero commented 2 years ago

I assume the v1alpha2 from the proposal is already good enough since we did not get any more comments on it?

damemi commented 2 years ago

I assume the v1alpha2 from the proposal is already good enough since we did not get any more comments on it?

Yup, we're going ahead with the proposed design as our outline for the work. If we need to change anything (like if something we didn't think about comes up), we can be flexible with it along the way.

pravarag commented 2 years ago

@damemi @ingvagabund if I want to work on any of the above-mentioned strategies here: https://github.com/kubernetes-sigs/descheduler/issues/837#issue-1261828383, shall I just claim them directly, or are these to be approached one after the other only?

damemi commented 2 years ago

@pravarag feel free to claim some and start opening PRs to move them. I think @ingvagabund is on vacation right now, so we won't be able to update the list with who has which ones, but we can keep track here.

The first few will probably take some review, but once we have the pattern established the rest should go easily.
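
For anyone claiming a strategy, the migration pattern might look roughly like the sketch below; the interface, args fields, and factory signature are illustrative assumptions, and the merged PRs (e.g. #861) are the authoritative reference:

```go
package removefailedpods

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// RemoveFailedPodsArgs is the plugin's typed configuration, decoded
// from the matching pluginConfig args block in the policy.
type RemoveFailedPodsArgs struct {
	MinPodLifetimeSeconds *uint `json:"minPodLifetimeSeconds,omitempty"`
}

// RemoveFailedPods is the strategy migrated into a plugin.
type RemoveFailedPods struct {
	args *RemoveFailedPodsArgs
}

// Name returns the plugin name referenced in the policy.
func (d *RemoveFailedPods) Name() string { return "RemoveFailedPods" }

// Deschedule would carry over the existing strategy's decision logic
// unchanged, evicting failed pods via the framework handle; the body
// is elided here.
func (d *RemoveFailedPods) Deschedule(ctx context.Context, nodes []*v1.Node) error {
	return nil
}

// New is the factory that gets registered in the plugin registry.
func New(args interface{}, handle interface{}) (*RemoveFailedPods, error) {
	typedArgs, ok := args.(*RemoveFailedPodsArgs)
	if !ok {
		return nil, fmt.Errorf("args of type %T are not RemoveFailedPodsArgs", args)
	}
	return &RemoveFailedPods{args: typedArgs}, nil
}
```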

knelasevero commented 2 years ago

Hey, I'll be working on migrating the nodeAffinity strategy. Should have a PR soon 🙂

EDIT: PR here: https://github.com/kubernetes-sigs/descheduler/pull/860

a7i commented 2 years ago

I can take FailedPods and PodLifeTime. Should have PRs soon

RemoveFailedPods PR #861

pravarag commented 2 years ago

I'll be happy to take LowNodeUtilization and HighNodeUtilization if they are not already being worked on.

jklaw90 commented 2 years ago

I can take RemovePodsViolatingTopologySpreadConstraint. Should be able to get a pr open next week.

JaneLiuL commented 2 years ago

I can take RemoveDuplicates and RemovePodsHavingTooManyRestarts. Will get a PR up soon.

knelasevero commented 2 years ago

I am grabbing and working on RemovePodsViolatingInterPodAntiAffinity next

JaneLiuL commented 2 years ago

It seems PodLifeTime is still unclaimed; please ping me if you need my help on this.

pravarag commented 2 years ago

It seems PodLifeTime is still unclaimed; please ping me if you need my help on this.

I will work on PodLifeTime, as NodeUtilization strategies were already submitted as part of this PR: https://github.com/kubernetes-sigs/descheduler/pull/910

a7i commented 2 years ago

@pravarag and @JaneLiuL I already have this mostly implemented. Please hold as I will have a PR soon.

JaneLiuL commented 2 years ago

Great~~ thanks @a7i, I will skip PodLifeTime.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

binacs commented 1 year ago

/remove-lifecycle stale

jiangxiaobin96 commented 1 year ago

Have all tasks been done?

knelasevero commented 1 year ago

@damemi @ingvagabund can we close this one?

knelasevero commented 1 year ago

Easy-to-read examples on how to build a plugin: provide a few examples of plugin implementations

I think this will kinda come with https://github.com/kubernetes-sigs/descheduler/issues/1089 when we figure out how we want to have our out-of-tree plugins and example repos for that

knelasevero commented 1 year ago

Closing this one.

Feel free to re-open.

/close

k8s-ci-robot commented 1 year ago

@knelasevero: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/descheduler/issues/837#issuecomment-1549717856):

> Closing this one.
>
> Feel free to re-open.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.