kubewharf / kubeadmiral

Multi-Cluster Kubernetes Orchestration
Apache License 2.0
658 stars 89 forks source link

add preference binpack replicas plugin #295

Closed mrlihanbo closed 7 months ago

codecov[bot] commented 7 months ago

Codecov Report

Attention: 126 lines in your changes are missing coverage. Please review.

Comparison is base (e57d64a) 30.42% compared to head (863b6ef) 31.15%.

Files Patch % Lines
...k/plugins/preferencebinpack/preference_bin_pack.go 0.00% 89 Missing :warning:
...ler/framework/plugins/preferencebinpack/planner.go 92.57% 9 Missing and 6 partials :warning:
pkg/controllers/automigration/controller.go 0.00% 13 Missing :warning:
pkg/controllers/scheduler/profile.go 25.00% 2 Missing and 1 partial :warning:
pkg/controllers/scheduler/scheduler.go 40.00% 2 Missing and 1 partial :warning:
pkg/controllers/scheduler/schedulingunit.go 66.66% 2 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #295 +/- ## ========================================== + Coverage 30.42% 31.15% +0.73% ========================================== Files 120 122 +2 Lines 13944 14266 +322 ========================================== + Hits 4242 4445 +203 - Misses 9296 9406 +110 - Partials 406 415 +9 ``` | [Flag](https://app.codecov.io/gh/kubewharf/kubeadmiral/pull/295/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=kubewharf) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/kubewharf/kubeadmiral/pull/295/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=kubewharf) | `31.15% <62.83%> (+0.73%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=kubewharf#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

gary-lgy commented 7 months ago

Related to https://github.com/kubewharf/kubeadmiral/pull/289

The new PropagationPolicy field ReplicasStrategy decides which replicas plugin to use - the traditional spread, or the new capacity-probing binpack.

Scheduling profile was designed to configure plugins based on the scheduling requirements. Adding ReplicasStrategy defeats the purpose of scheduling profiles, and opens us up to the slippery slope of blindly adding new fields for every single plugin combination.

This is bad design imo.

limhawjia commented 7 months ago

Adding on to @gary-lgy 's point, the scheduling profile used to dictate which fields in PropagationPolicy are respected and what semantics they had. In a sense, the scheduling profile governed PropagationPolicy. There used to be a clean one-way relation between profiles and policies.

This PR introduces the inverse of this, creating a two-way relation between profiles and policies where they can affect one another in elaborate and unexpected ways. This is bad design and a slippery slope in my opinion as well.

limhawjia commented 7 months ago

I propose an alternative solution. @mrlihanbo @gary-lgy

Instead of introducing a new ReplicasStrategy field that modifies the plugins used in the scheduling profile, how about we provide multiple built-in profiles? One of which is spread (the default), and one of which is binpack.

spread will consist of the current default list of plugins, while binpack will contain plugins configured to provide bin-packing capabilities. Doing so will retain ease-of-use for the user. They do not have to configure their own scheduling profile, and enabling bin-packing is still a matter of setting a single field (just schedulingProfile instead of replicasStrategy).

Poor12 commented 7 months ago

Agreed all. But there is one thing, schedulingProfile is indeed more complicated than adding replicaStrategy to policy. I think if preferenceBinpack is a basic feature, it would be better to add replicaStrategy. Otherwise, if it's a extended feature, it would be better to add it in schedulingProfile. During the former discussion, seems we hope it to be a basic feature?

gary-lgy commented 7 months ago

To those without context, there was an offline discussion among @manmanhappy @gary-lgy @mrlihanbo @JackZxj and @Poor12 on this topic. @gary-lgy proposed the 2 built-in profiles solution mentioned by @limhawjia above. However, @manmanhappy rejected it and proposed the current solution as implemented in #289 and this PR (and did not leave much room for further discussion).

Replying to @Poor12's comment:

Agreed all. But there is one thing, schedulingProfile is indeed more complicated than adding replicaStrategy to policy.

From whose perspective? The profiles are built-in (either in code or created by the platform). From the users' perspective, they need to configure a single field in the policy spec, be it pp.spec.schedulingProfile = binpack or pp.spec.replicasStrategy = binpack. I do not see how one is "more complicated" than the other. To the developers, profile is an existing mechanism and the current solution involves API changes, so the latter is more complicated. Most importantly, the current solution pollutes the scheduler architecture for reasons stated in previous comments.

I think if preferenceBinpack is a basic feature, it would be better to add replicaStrategy. Otherwise, if it's as extended feature, it would be better to add it in schedulingProfile.

Basic or extended is subjective, and I don't think it's the right yardstick to determine whether a configuration option should become a policy field. There are in fact many considerations for preferring the profile approach:

  1. We already have too many fields in the policy spec. We can't expect the users to keep all of them in their head. Adding more fields further worsens the cognitive load of using a policy. Certainly, we can use defaulting to reduce some of its complexity, but managing the fields for us is also a burden as the interactions between them become ever harder to reason with.
  2. This new replica scheduling behaviour only works well for serverless/on-demand Kubernetes clusters and will spell disaster if used on a normal cluster backed by reversed nodes. How many percent of our user space would need to use this niche feature? Does it justify adding a new policy field when we already have another solution (profile) which can satisfy the requirements?

To be absolutely clear, I maintain my position (which I made clear during the initial offline discussion) that we shouldn't go down this slippery slope of continuously adding new fields.