Open Vacant2333 opened 5 months ago
A little confused. I understand this is similar to a pod being scheduling gated. But unlike pods and jobs, if Karmada scheduling is not required, you simply don't create a `PropagationPolicy`. Why do you need to introduce this feature?
It's a different case with `PropagationPolicy`. A `PropagationPolicy` carries scheduling info such as affinity; in other words, the workload and its `PropagationPolicy` are bound together, and only after the `PropagationPolicy` is created can Karmada know how to schedule the workload. So what this issue cares about is the point at which the scheduling policy already exists, but the workload must still be admitted before Karmada actually schedules it.
@Monokaix thanks, but that's not what I want.
In k8s, Pod Scheduling Readiness is introduced mainly because some key information must be obtained after the pod is created.
For example, users want to select nodes based on the image architecture (arm, x86, etc.) used by the pods. This can be achieved with the following steps:
- add a scheduling gate
- parse the architecture supported by the image
- add it to the nodeAffinity
- remove the scheduling gate.
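
The four steps above can be sketched roughly as follows. This is only an illustration on plain dictionaries shaped like a Pod manifest, not a real controller: the gate name `example.com/arch-detection`, the `registry` lookup table, and the helper names are all assumptions made for the example, while `schedulingGates`, `nodeAffinity` and the `kubernetes.io/arch` label are the standard Kubernetes names.

```python
import copy

# Hypothetical gate name, chosen only for this sketch.
GATE_NAME = "example.com/arch-detection"


def add_gate(pod):
    """Step 1: add a scheduling gate so the scheduler leaves the pod alone."""
    pod = copy.deepcopy(pod)
    gates = pod["spec"].setdefault("schedulingGates", [])
    if not any(g["name"] == GATE_NAME for g in gates):
        gates.append({"name": GATE_NAME})
    return pod


def supported_archs(pod, registry):
    """Step 2: architectures supported by every image in the pod.

    `registry` is a stand-in for inspecting image manifests: a dict that
    maps an image name to the set of architectures it is built for.
    """
    per_image = [set(registry[c["image"]]) for c in pod["spec"]["containers"]]
    return sorted(set.intersection(*per_image))


def pin_arch_and_release(pod, registry):
    """Steps 3 and 4: add the architectures to nodeAffinity, then remove the gate."""
    pod = copy.deepcopy(pod)
    expr = {
        "key": "kubernetes.io/arch",
        "operator": "In",
        "values": supported_archs(pod, registry),
    }
    node_affinity = pod["spec"].setdefault("affinity", {}).setdefault("nodeAffinity", {})
    required = node_affinity.setdefault("requiredDuringSchedulingIgnoredDuringExecution", {})
    terms = required.setdefault("nodeSelectorTerms", [])
    if terms:
        # Terms are ORed, expressions inside a term are ANDed, so the new
        # requirement is added to every existing term to only tighten it.
        for term in terms:
            term.setdefault("matchExpressions", []).append(copy.deepcopy(expr))
    else:
        terms.append({"matchExpressions": [expr]})
    pod["spec"]["schedulingGates"] = [
        g for g in pod["spec"]["schedulingGates"] if g["name"] != GATE_NAME
    ]
    return pod
```

The point of the gate is ordering: the affinity is only known after the pod exists, so the pod has to be held back until that information is filled in.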
Now back to this use case, is there a similar situation here?
With this feature, we can do more things to suspend rb scheduling, like queue capacity management, and your use case can also be resolved:)
A: The focus of this issue is not to react to new changes. My focus is on pausing the distribution of the entire PropagationPolicy resource. Another point is that the Work resource is the result of Karmada scheduling. We need to ensure that the desired queue capability guarantees that resources have not yet reached the scheduling stage (considering resource quota usage).
@Monokaix The implementation allows suspend on `PropagationPolicy` and `ResourceBinding`: https://github.com/karmada-io/karmada/pull/4838
Seems great! So what's the effect of suspending `PropagationPolicy` or `ResourceBinding`? We want to suspend at the scheduling stage; does this conflict with your case?
@whitewindmills hi, looks like we can close this issue since the proposal has been merged now!
Yeah, I think so. Please let us know if the feature #5118 does not fit your case. /close
@RainbowMango: Closing this issue.
/reopen The rb suspend to schedule is not done yet.
@Monokaix: You can't reopen an issue/PR unless you authored it or you are a collaborator.
The rb suspend to schedule is not done yet.
/reopen
@liangyuanpeng: Reopened this issue.
@RainbowMango hi, it looks like the proposal can't suspend scheduling yet. We need to suspend scheduling of `ResourceBinding`; right now it can only suspend dispatching.
Is the #5218 tracking this?
No, it's another one. Should I create a new issue to track this?
What would you like to be added: I hope to add the ability to suspend `ResourceBinding`. If set to `True`, `karmada-scheduler` will ignore this `ResourceBinding` until `suspend=false` before starting to schedule it.

Why is this needed: `Karmada` can have a queueing capability similar to the combination of `Kueue` and `kube-scheduler`. This would enable distributing workloads/jobs to Worker Clusters based on priority when there are a large number of loads/jobs.

`PropagationPolicy` instances that I deploy, until I set `PropagationPolicy.spec.suspend` to `false` (and propagate it to `ResourceBinding`) in specific circumstances.

Question And Answers:

Q: How does [Kueue](https://kubernetes.io/blog/2022/10/04/introducing-kueue/) provide queue capabilities to `kube-scheduler`?
A: Taking `Job` as an example, `Job` contains a `suspend` field. If this field is set to `True`, `kube-scheduler` will ignore the `Job` until it becomes False. `Kueue`, on the other hand, provides an additional controller. It includes queue capabilities along with sorting algorithms. `Kueue` considers multiple conditions (such as task priority, resource availability) before updating `suspend` to false and handing it over to `kube-scheduler` for normal scheduling thereafter.

Q: Does `PropagationPolicy` also need to have a `suspend` field?
A: Yes, as I understand it, `PropagationPolicy` is a user-facing resource. Users only need to set `suspend` here, and it will automatically propagate to associated resources (`ResourceBinding`, `Work`).

Q: What's the difference with issue Ability to suspend work #4688?
A: The focus of this issue is not to react to new changes. My focus is on pausing the distribution of the entire `PropagationPolicy` resource. Another point is that the `Work` resource is the result of `Karmada` scheduling. We need to ensure that the desired queue capability guarantees that resources have not yet reached the scheduling stage (considering resource quota usage).

Additional Considerations: We actually don't need to add the `suspend` field to `PropagationPolicy`; we only need to add the suspend field to `ResourceBinding`. Since our requirement is to perform certain actions before `karmada-scheduler` schedules, we just need to automatically set `suspend` to `true` when `ResourceBinding` is created (for example, through a webhook). This way, users don't need to set `suspend` when creating `PropagationPolicy`, which also avoids the issue of users attempting to modify the `suspend` field after scheduling. In fact, users don't need to be aware of the `suspend` field; it is only provided for internal use by controllers.

Of course, this point is still open for discussion.
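
To make the proposed behaviour concrete, here is a minimal sketch of the two pieces discussed above: a webhook-style default that marks a newly created `ResourceBinding` as suspended, and a scheduler-side filter that skips suspended bindings. The `ResourceBinding` class is a toy stand-in, and `suspend`, `default_suspend_on_create` and `schedulable` are names assumed for this example; none of this is Karmada's actual API or implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ResourceBinding:
    """Toy stand-in for a ResourceBinding; only the fields the sketch needs."""
    name: str
    # None means the creator did not set the field at all.
    suspend: Optional[bool] = None


def default_suspend_on_create(rb: ResourceBinding) -> ResourceBinding:
    """Webhook-style defaulting: a new binding starts out suspended."""
    if rb.suspend is None:
        rb.suspend = True
    return rb


def schedulable(bindings: Iterable[ResourceBinding]) -> List[ResourceBinding]:
    """Scheduler-side filter: ignore every binding that is still suspended."""
    return [rb for rb in bindings if not rb.suspend]


# Usage: the binding is held back until some external controller admits it.
rb = default_suspend_on_create(ResourceBinding(name="demo"))
held_back = schedulable([rb])      # empty list while suspended
rb.suspend = False                 # e.g. a queue controller admits the job
admitted = schedulable([rb])       # now contains the binding
```

With this split, users never touch `suspend` themselves; only the defaulting step and the admitting controller do.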
Additional: This feature allows for more flexible load distribution and the ability to have multiple queues. Users can have some custom controls outside of Karmada, such as capacity management for tenants (queues) and support for multi-tenancy (multiple queues).
For example, we have three queues, `Q1`, `Q2`, and `Q3`, with priorities of 100, 80, and 20 respectively. Under multiple queues, when several jobs are submitted, to ensure fairness among tenants, we will cyclically select the highest priority job from each queue in priority order. For instance, we first select the highest priority job from `Q1`, then from `Q2` and `Q3`, and set the corresponding job's suspend flag to false, handing it over to the Karmada scheduler for scheduling.

Optional Controller: `Karmada` can also provide an optional controller to implement relatively simple queue and capacity management functionality. Additionally, the sorting of queues can provide an extension point for custom plugins.

This optional controller mainly possesses the following capabilities: maintaining the priority of queues and jobs, managing the capacity of queues, modifying suspend to false after jobs are schedulable, and having a webhook that defaults to setting suspend to true when `ResourceBinding`/`PropagationPolicy` is created. In other words, once this controller is used, it automatically takes over and blocks all downstream operations.

If `Karmada` is willing to support this feature, I can provide an additional controller to `Karmada` in my free time.
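
As an illustration of the cyclic selection across `Q1`, `Q2` and `Q3` described above, the following sketch yields jobs in the order in which such a controller might set their suspend flag to false. The `Queue` class and `admission_order` function are hypothetical names for this example, and capacity management is deliberately left out.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Queue:
    """A tenant queue with its own priority and a heap of pending jobs."""
    name: str
    priority: int
    _heap: List[Tuple[int, int, str]] = field(default_factory=list)
    _seq: int = 0

    def submit(self, job: str, priority: int) -> None:
        # Negate the priority so the highest-priority job pops first;
        # the sequence number keeps submission order among equal priorities.
        heapq.heappush(self._heap, (-priority, self._seq, job))
        self._seq += 1

    def pop_highest(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def admission_order(queues: Iterable[Queue]) -> Iterator[str]:
    """Visit queues in descending priority, taking one job per queue per round."""
    ordered = sorted(queues, key=lambda q: q.priority, reverse=True)
    while any(len(q) for q in ordered):
        for q in ordered:
            if len(q):
                yield q.pop_highest()


q1, q2, q3 = Queue("Q1", 100), Queue("Q2", 80), Queue("Q3", 20)
q1.submit("q1-low", priority=1)
q1.submit("q1-high", priority=9)
q2.submit("q2-job", priority=5)
q3.submit("q3-job", priority=5)

order = list(admission_order([q3, q1, q2]))
# order == ["q1-high", "q2-job", "q3-job", "q1-low"]
```

Each yielded job corresponds to one binding whose suspend flag would be flipped to false; taking one job per queue per round is what keeps a busy high-priority tenant from starving the others.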