kubernetes-sigs / descheduler

Descheduler for Kubernetes
https://sigs.k8s.io/descheduler
Apache License 2.0
4.23k stars · 645 forks

return node fit error in advance #1436

Closed fanhaouu closed 3 days ago

fanhaouu commented 3 weeks ago

The current node fit runs many filter policies, similar to the filter plugins in the scheduler. The correct logic should be: if any check fails, terminate early. The current check logic is hard to follow, and it actually runs every filter check, which is not only time-consuming but also unnecessary. (https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/descheduler/node/node.go#L107)

The default descheduling loop interval is 5 minutes, and it is configurable. The larger the cluster, the longer nodeFit takes, so a full loop can easily exceed 5 minutes. What if the user sets the interval to 1 minute? Checking every filter one by one is unnecessary and far too time-consuming; we should behave the same way as the scheduler's filter plugins.

The pod check at https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/defaultevictor/defaultevictor.go#L67 should also return early.

k8s-ci-robot commented 3 weeks ago

Hi @fanhaouu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
a7i commented 2 weeks ago

/ok-to-test

/assign @ingvagabund given the recent discussions on future of NodeFit

ingvagabund commented 2 weeks ago

Or, we could run the checks in parallel. The question is whether it's useful to notify about all individual checks errors. If the goal here is to reduce the computation time, then parallelization works better.

fanhaouu commented 2 weeks ago

> Or, we could run the checks in parallel. The question is whether it's useful to notify about all individual checks errors. If the goal here is to reduce the computation time, then parallelization works better.

Thank you, master. I just added the parallel logic; I forgot to push the code for parallel processing earlier.

ingvagabund commented 1 week ago

> I just added the parallel logic.

This is another parallelization. Originally, I meant to parallelize internals of NodeFit itself. E.g. the individual checks. If we parallelize on the level of PodFitsAnyNode/PodFitsAnyOtherNode we need a different discussion.

I need to include the parallelization in my proposal for https://github.com/kubernetes-sigs/descheduler/issues/1421. Let's pause it for now unless there are convincing numbers/measurements showing the delay wrt. number of nodes/checks.

ingvagabund commented 1 week ago

@fanhaouu in my proposal I'd like to allow users to extend available nodeFit checks through custom plugins. For which we need to make sure none of the individual checks modifies pod/node arguments. Though, I can make a note in the proposal to revisit the parallelization later so I don't block this PR.

fanhaouu commented 1 week ago

> With the new parallelization, would you please create new test cases for both PodFitsAnyOtherNode and PodFitsAnyNode to make sure each node is processed properly? E.g. run PodFitsAnyOtherNode with a list of e.g. 4 nodes:
>
>   • 1st and 3rd nodes do not fit a pod
>   • 2nd and 4th nodes fit a pod
>
> To make sure the parallelization works as expected.

ok, master

fanhaouu commented 5 days ago

> Given there's a podFitsNodes function, would you please create a new test specifically for testing the parallelization? Something like this:
>
> ```go
> nodesTraversed := make(map[string]struct{})
> ...
> podFitsNodes(
>     nodeIndexer,
>     pod,
>     nodes,
>     func(pod *v1.Pod, node *v1.Node) bool {
>         nodesTraversed[node.Name] = struct{}{}
>         return false
>     },
> )
> ...
> // check that all four nodes have been processed/read
> for _, nodeName := range nodeNames {
>     if _, exists := nodesTraversed[nodeName]; !exists {
>         t.Errorf("node %v was not processed", nodeName)
>     }
> }
> ```
>
> To make sure every node is checked. With this the new tests are no longer needed.

ok, thank you, master

ok, thank you, master

ingvagabund commented 3 days ago

/approve

/lgtm

k8s-ci-robot commented 3 days ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-sigs/descheduler/blob/master/OWNERS)~~ [ingvagabund]

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.