Closed: dddddai closed this issue 2 years ago.
Thanks for opening this; we can track the progress and discuss the solution here.
Any idea?
My thought would be to re-schedule the bindings. Two scenarios should be considered:
- unjoined
- joined
@RainbowMango Thanks for your reply
- the cluster should be cleaned up from the RB after the cluster is unjoined
- the cluster should be added to the RB after the cluster is joined
Yes, this makes sense, but there might be a new problem. Let's consider such a case: updating `binding.spec.clusters` could be recognized as `AvoidSchedule`, which doesn't make sense:
https://github.com/karmada-io/karmada/blob/8e1e16e95081d7431852c6a46ea372ff79a50b5e/pkg/scheduler/scheduler.go#L834-L836

I have another question: does the scheduler perform `Failover` after unjoining a target cluster?
I don't know because I haven't tried it.
> I have another question: does the scheduler perform `Failover` after unjoining a target cluster? I don't know because I haven't tried it.
No.
@XiShanYongYe-Chang Thanks for your reply
> No.
Is it expected behavior? Shouldn't `Failover` care about the cluster delete event?
When there is a cluster delete event, a rescheduling may need to be triggered.
What does @RainbowMango think?
Echoing https://github.com/karmada-io/karmada/issues/829#issuecomment-945744121.
Agree. But no idea how to solve that issue now.
/priority important-soon @dddddai I added this issue to v0.10 milestone, let's fix this in this release.
> Agree. But no idea how to solve that issue now.
How about removing the applied placement annotation of the binding on cluster add/delete? It would make the scheduler reschedule the binding as `ReconcileSchedule`:
https://github.com/karmada-io/karmada/blob/8e1e16e95081d7431852c6a46ea372ff79a50b5e/pkg/scheduler/scheduler.go#L824-L828
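For illustration, here is a minimal sketch of what "removing the applied placement annotation" amounts to. The annotation key and the helper function are assumptions for the sake of the example, not Karmada's actual code:

```go
package main

import "fmt"

// appliedPlacementKey is the annotation the scheduler compares with the
// current policy placement to decide whether to reschedule.
// NOTE: the exact key name is an assumption in this sketch.
const appliedPlacementKey = "policy.karmada.io/applied-placement"

// dropAppliedPlacement deletes the applied-placement annotation in place, so
// the next reconcile sees a mismatch and re-runs scheduling.
func dropAppliedPlacement(annotations map[string]string) {
	delete(annotations, appliedPlacementKey)
}

func main() {
	anno := map[string]string{appliedPlacementKey: `{"clusterAffinity":null}`}
	dropAppliedPlacement(anno)
	fmt.Println(len(anno)) // 0
}
```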
Too tricky I guess.
IMHO, the scheduler should reschedule the bindings on cluster change (e.g. cluster joined, cluster unjoined, cluster label changed...)
I added a cluster queue in the scheduler for handling cluster events to fix this issue, and it works fine; please see https://github.com/dddddai/karmada/commit/568b8700e5d2c7de5904a421c2f1a7bcc0ff1648
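As a rough, stdlib-only illustration of the cluster-queue idea (the names and structure here are hypothetical and much simpler than the linked commit, which builds on client-go's workqueue):

```go
package main

import "fmt"

// ClusterEvent is a simplified cluster add/update/delete notification.
type ClusterEvent struct {
	Name    string
	Deleted bool
}

// Scheduler drains cluster events and marks the affected bindings for
// rescheduling. bindingsByCluster maps a cluster name to the keys of the
// bindings currently scheduled to it.
type Scheduler struct {
	clusterQueue      chan ClusterEvent
	bindingQueue      []string
	bindingsByCluster map[string][]string
}

// drain pops every pending cluster event and enqueues the bindings that
// need a rescheduling pass.
func (s *Scheduler) drain() {
	for {
		select {
		case ev := <-s.clusterQueue:
			s.bindingQueue = append(s.bindingQueue, s.bindingsByCluster[ev.Name]...)
		default:
			return
		}
	}
}

func main() {
	s := &Scheduler{
		clusterQueue:      make(chan ClusterEvent, 8),
		bindingsByCluster: map[string][]string{"member1": {"default/nginx"}},
	}
	s.clusterQueue <- ClusterEvent{Name: "member1", Deleted: true}
	s.drain()
	fmt.Println(s.bindingQueue) // [default/nginx]
}
```

The design point is simply that cluster events feed a dedicated queue, decoupled from the binding queue, so a single cluster change can fan out to every binding it affects.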
> IMHO, the scheduler should reschedule the bindings on cluster change (e.g. cluster joined, cluster unjoined, cluster label changed...)
+1, but I'm not sure about "cluster label changed"; being too sensitive isn't a good thing.
I'll take a look and comment on your commit.
> but not sure for cluster label changed, too sensitive isn't a good thing.
For example, suppose there is a propagation policy whose cluster affinity label selector is `foo: bar`. Do you mean we should keep the binding scheduled to the cluster even though the label `foo: bar` is removed from that cluster?
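To make the concern concrete, here is a minimal sketch of equality-based selector matching (a simplification of `metav1.LabelSelector` matching; `matches` is a hypothetical helper, not Karmada's code). Once `foo: bar` is removed from the cluster, the cluster no longer satisfies the selector, and the question is whether the scheduler should react:

```go
package main

import "fmt"

// matches reports whether the cluster's labels satisfy the policy's label
// selector. Only equality terms are handled here, unlike the real
// metav1.LabelSelector, which also supports matchExpressions.
func matches(selector, clusterLabels map[string]string) bool {
	for k, v := range selector {
		if clusterLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"foo": "bar"}
	fmt.Println(matches(selector, map[string]string{"foo": "bar"})) // true
	// After the label is removed, the cluster no longer fits the policy.
	fmt.Println(matches(selector, map[string]string{})) // false
}
```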
I think the scenario might be handled by descheduler.
Well, I'm not familiar with the descheduler, but from this picture it seems that the descheduler does not watch cluster events: it just watches workloads and might reschedule them to other clusters when the original cluster has insufficient resources. It focuses on `ScaleSchedule` rather than `Reschedule`. Please correct me if I'm wrong.
We might discuss the descheduler more at the community meeting. Hope you can come; looking forward to meeting you there.
OK, I'll be there :)
To be clear, I have 4 questions:
1. Shall we reschedule bindings when a cluster field/label changed? (because the updated cluster might (not) fit the propagation policy)
2. Should `FailoverSchedule` work when unjoining a cluster?
3. What does `SpreadConstraint.MinGroups` mean? Does it mean we should not propagate the resource unless group count >= `MinGroups`? If yes, shall we delete all `binding.spec.clusters` when unjoining a cluster which causes group count < `MinGroups`?
4. The key is: shall we always keep the consistency between the propagation policy and `binding.spec.clusters`?
Looking forward to seeing the progress of this issue
@mrlihanbo The #967 is waiting for your review.
> @mrlihanbo The #967 is waiting for your review.
I will review the PR now.
> Looking forward to seeing the progress of this issue
Hi @mrlihanbo, before implementing this we have to answer the 4 questions above
Hello @RainbowMango, any ideas about these questions?
> Shall we reschedule bindings when cluster field/label changed? (because the updated cluster might (not) fit the propagation policy)
I think we should treat this scenario very carefully; it might bring drastic changes. Take Kubernetes as an example: after a pod has been scheduled to a node, it will not get re-scheduled even if the node's labels change.
> Should `FailoverSchedule` work when unjoining a cluster?
I think we should re-schedule the workload after one of the bound clusters is unjoined. It doesn't have to be `FailoverSchedule` (not sure).
> What does `SpreadConstraint.MinGroups` mean? Does it mean we should not propagate the resource unless group count >= `MinGroups`?
Yes, you are right. `SpreadConstraint.MinGroups` restricts the minimum number of cluster groups; if the scheduler cannot find enough cluster groups, the scheduling should fail.
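A minimal sketch of that `MinGroups` rule (the type and function here are illustrative, not Karmada's actual implementation):

```go
package main

import "fmt"

// SpreadConstraint is a simplified stand-in for the placement field of the
// same name: scheduling must select at least MinGroups cluster groups.
type SpreadConstraint struct {
	MinGroups int
}

// checkMinGroups returns an error when fewer groups are available than
// MinGroups, i.e. the schedule should fail rather than propagate.
func checkMinGroups(groups int, c SpreadConstraint) error {
	if groups < c.MinGroups {
		return fmt.Errorf("insufficient cluster groups: got %d, need at least %d", groups, c.MinGroups)
	}
	return nil
}

func main() {
	c := SpreadConstraint{MinGroups: 2}
	fmt.Println(checkMinGroups(3, c) == nil) // true: enough groups
	fmt.Println(checkMinGroups(1, c) == nil) // false: schedule should fail
}
```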
> If yes, shall we delete all `binding.spec.clusters` when unjoining a cluster which causes group count < `MinGroups`?
I don't think so, just like the answer above, it's too dangerous, at least for now.
I see, so I guess we should reschedule workloads only when cluster joined/unjoined, right?
Let's focus on the scenario of cluster-unjoin for now. The workload should be re-scheduled after one of the bound clusters is unjoined. If no more clusters fit the propagation policy, just remove the unjoined
cluster from the binding object.
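A sketch of that fallback step, removing only the unjoined cluster from the scheduling result. `TargetCluster` here is a simplified stand-in for the real binding entry type, and `removeUnjoined` is a hypothetical helper:

```go
package main

import "fmt"

// TargetCluster is a simplified binding.spec.clusters entry.
type TargetCluster struct {
	Name     string
	Replicas int32
}

// removeUnjoined returns spec.clusters without the unjoined cluster,
// leaving the rest of the scheduling result untouched.
func removeUnjoined(clusters []TargetCluster, unjoined string) []TargetCluster {
	out := make([]TargetCluster, 0, len(clusters))
	for _, c := range clusters {
		if c.Name != unjoined {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	clusters := []TargetCluster{
		{Name: "member1", Replicas: 2},
		{Name: "member2", Replicas: 1},
	}
	fmt.Println(removeUnjoined(clusters, "member1")) // [{member2 1}]
}
```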
> The workload should be re-scheduled after one of the bound clusters is unjoined. If no more clusters fit the propagation policy, just remove the unjoined cluster from the binding object.
There exists a scenario where the workload should be re-scheduled after one of the bound clusters is unjoined; we should make sure that the behavior is what we expect.
> I see, so I guess we should reschedule workloads only when cluster joined/unjoined, right?
@dddddai I took a glance at the `FailoverSchedule` func. Maybe we can do it in this func. Just a suggestion.
> @dddddai I took a glance at the `FailoverSchedule` func. Maybe we can do it in this func. Just a suggestion.
Thanks for digging into it, that's exactly what I did in #1049; the behavior of `Reschedule` is the same as `FailoverSchedule`.
> Thanks for digging into it, that's exactly what I did in #1049; the behavior of `Reschedule` is the same as `FailoverSchedule`.
Good job, I will review the pr ASAP.
/assign @RainbowMango @huone1
@RainbowMango: GitHub didn't allow me to assign the following users: huone1.
Note that only karmada-io members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
let me work it with you @dddddai
@huone1 Thank you!
Just as issue #1644 mentioned: in the scenario of a new cluster joining, should the RB be rescheduled? @dddddai
I was thinking so. Would ask @RainbowMango for comments.
I think it's similar to Kubernetes. Just as the descheduler (https://github.com/kubernetes-sigs/descheduler) suggests, the descheduler should be responsible for handling rescheduling on cluster status changes, cluster join, and cluster unjoin.
What happened: Unjoined clusters still remain in `binding.spec.clusters`

What you expected to happen: Unjoined clusters should be deleted from `binding.spec.clusters`

How to reproduce it (as minimally and precisely as possible):
1. Set up the environment (script v0.8)
2. Unjoin member1
3. Check `binding.spec.clusters`

Anything else we need to know?: Is it expected behavior? If not, who is supposed to take the responsibility to delete unjoined clusters from the binding: the scheduler or other controllers (like the cluster controller)?
Environment: