karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

Reschedule bindings when unjoining a target cluster #1049

Closed · dddddai closed this 2 years ago

dddddai commented 2 years ago

What type of PR is this? /kind feature

What this PR does / why we need it: Please refer to https://github.com/karmada-io/karmada/issues/829#issuecomment-982235580

Which issue(s) this PR fixes: Part of #829

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

karmada-scheduler: reschedule bindings when unjoining a target cluster.
karmada-bot commented 2 years ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign rainbowmango after the PR has been reviewed. You can assign the PR to them by writing `/assign @rainbowmango` in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/karmada-io/karmada/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
mrlihanbo commented 2 years ago

We need to refactor the code here, as it currently fails for the Duplicated policy:

    // TODO: should schedule as much as possible?
    deltaLen := len(spec.Clusters) - len(reservedClusters)
    if len(candidateClusters) < deltaLen {
        // for ReplicaSchedulingTypeDivided, we will try to migrate replicas to the other health clusters
        if placement.ReplicaScheduling == nil || placement.ReplicaScheduling.ReplicaSchedulingType == policyv1alpha1.ReplicaSchedulingTypeDuplicated {
            klog.Warningf("ignore reschedule binding as insufficient available cluster")
            return ScheduleResult{}, nil
        }
    }
dddddai commented 2 years ago

We need to refactor the code here, as it currently fails for the Duplicated policy:

What do you mean by "fail for the Duplicated policy"? If the number of available clusters is less than deltaLen, it will NOT perform a reschedule, and I'm not sure whether that is expected.

For example, if a propagation policy propagates a workload to all clusters, should a reschedule be performed when unjoining a cluster? I guess the answer is YES, so is this the difference between Reschedule and FailoverSchedule?

mrlihanbo commented 2 years ago

We need to refactor the code here, as it currently fails for the Duplicated policy:

What do you mean by "fail for the Duplicated policy"? If the number of available clusters is less than deltaLen, it will NOT perform a reschedule, and I'm not sure whether that is expected.

For example, if a propagation policy propagates a workload to all clusters, should a reschedule be performed when unjoining a cluster? I guess the answer is YES, so is this the difference between Reschedule and FailoverSchedule?

Let me explain this in detail. FailoverSchedule was originally designed to be used together with spread constraints, so its logic is:

  1. When some clusters in the original scheduling result fail, the same number of clusters are selected from the healthy clusters as replacements. For the Divided scenario, this actually also triggers a scale schedule. For the Duplicated scenario, if no spread constraint is used, no new cluster can be selected, so the original result is kept.
  2. The problem this creates is that we need to remove the unjoined cluster from the scheduling result, but once candidateClusters is smaller than the number of failed clusters, it is not removed.

Original scenario: the propagation policy specifies three clusters [A, B, C] and the spread constraint is set to 2, so the scheduling result is [A, B]. If A fails, C is selected as a replacement and the new result is [B, C].

Problem scenario: the propagation policy specifies three clusters [A, B, C] without a spread constraint, so the scheduling result is [A, B, C]. If A fails, no new cluster can be selected and the scheduling result [A, B, C] is kept, while what we expect is that the unjoined cluster is removed from the scheduling result.
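
To make the two scenarios concrete, here is a minimal, self-contained sketch of the failover behavior described above. The names failoverSketch and intersect are hypothetical; this is not the actual scheduler code, just an illustration of the selection logic under these assumptions.

    package main

    import "fmt"

    // failoverSketch keeps the healthy part of the previous result and tries to
    // back-fill as many clusters as failed. When there are not enough candidates
    // (the Duplicated-without-spread-constraint case), it keeps the original
    // result, which is exactly the problem: the unjoined cluster is never removed.
    func failoverSketch(previous, healthy, candidates []string) []string {
        reserved := intersect(previous, healthy)
        delta := len(previous) - len(reserved)
        if len(candidates) < delta {
            return previous // not enough replacements: [A, B, C] stays [A, B, C]
        }
        return append(reserved, candidates[:delta]...)
    }

    // intersect returns the elements of a that also appear in b.
    func intersect(a, b []string) []string {
        set := make(map[string]struct{}, len(b))
        for _, c := range b {
            set[c] = struct{}{}
        }
        var out []string
        for _, c := range a {
            if _, ok := set[c]; ok {
                out = append(out, c)
            }
        }
        return out
    }

    func main() {
        // Original scenario: spread constraint 2, result [A, B]; A fails, C backs it up.
        fmt.Println(failoverSketch([]string{"A", "B"}, []string{"B", "C"}, []string{"C"})) // [B C]
        // Problem scenario: no spread constraint, result [A, B, C]; A fails, no candidate left.
        fmt.Println(failoverSketch([]string{"A", "B", "C"}, []string{"B", "C"}, nil)) // [A B C]
    }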

dddddai commented 2 years ago

If A fails, no new cluster can be selected and the scheduling result [A, B, C] is kept, while what we expect is that the unjoined cluster is removed from the scheduling result.

Got it. So for Reschedule we don't consider "selecting the same number of clusters from the healthy clusters as replacements", right? And is that the only difference between Reschedule and FailoverSchedule?

mrlihanbo commented 2 years ago

If A fails, no new cluster can be selected and the scheduling result [A, B, C] is kept, while what we expect is that the unjoined cluster is removed from the scheduling result.

Got it. So for Reschedule we don't consider "selecting the same number of clusters from the healthy clusters as replacements", right? And is that the only difference between Reschedule and FailoverSchedule?

It's a bit more complicated; it depends on whether a spread constraint is configured. The existing implementation here is also incomplete: simply selecting the same number of clusters can only satisfy spread constraints of the SpreadByFieldCluster type.

  1. If no spread constraint is used, Reschedule should select all eligible clusters and remove the unjoined cluster from the scheduling result.
  2. If a spread constraint is used, the rescheduling result still has to satisfy the spread constraint.
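
As a rough illustration of these two points, the sketch below (hypothetical names, not the real scheduler code) selects every feasible cluster, drops the unjoined one, and trims the result only when a SpreadByFieldCluster-style constraint with a maximum group count is assumed to be configured.

    package main

    import "fmt"

    // rescheduleSketch drops the unjoined cluster from the feasible set and, if a
    // spread constraint is configured (maxGroups > 0), trims the result so that it
    // still satisfies the constraint. With maxGroups == 0 every feasible cluster
    // is selected.
    func rescheduleSketch(feasible []string, unjoined string, maxGroups int) []string {
        var result []string
        for _, c := range feasible {
            if c != unjoined {
                result = append(result, c)
            }
        }
        if maxGroups > 0 && len(result) > maxGroups {
            result = result[:maxGroups]
        }
        return result
    }

    func main() {
        fmt.Println(rescheduleSketch([]string{"A", "B", "C"}, "A", 0)) // no constraint: [B C]
        fmt.Println(rescheduleSketch([]string{"A", "B", "C"}, "A", 2)) // constraint of 2: [B C]
    }
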
Garrybest commented 2 years ago

What about removing the unhealthy clusters from the scheduling result, keeping the result for the healthy clusters, and then letting the code go straight through the normal scheduling logic so that the scheduler performs a second scheduling pass? In #1051 I have already merged scale scheduling and normal scheduling, and I'm wondering whether the failover logic can be merged as well.
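
A minimal sketch of this idea, assuming hypothetical names rather than the actual #1051 code: strip the failed clusters from the previous result and hand the binding back to the normal scheduling path, which then fills any gap.

    package main

    import "fmt"

    type bindingSpec struct {
        Clusters []string
    }

    // pruneFailedClusters keeps only the healthy clusters from the previous
    // scheduling result. The pruned binding is then scheduled again through the
    // normal path, so spread constraints and free resources are handled by the
    // same code as a first-time schedule.
    func pruneFailedClusters(spec *bindingSpec, healthy map[string]bool) {
        kept := spec.Clusters[:0]
        for _, c := range spec.Clusters {
            if healthy[c] {
                kept = append(kept, c)
            }
        }
        spec.Clusters = kept
    }

    func main() {
        spec := &bindingSpec{Clusters: []string{"A", "B", "C"}}
        pruneFailedClusters(spec, map[string]bool{"B": true, "C": true})
        fmt.Println(spec.Clusters) // [B C]; normal scheduling then runs on the pruned binding
    }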

Garrybest commented 2 years ago

When all types share the same scheduling logic, spread constraints will be taken into account automatically. Failover currently has some defects, e.g., it does not take idle resources into consideration when rescheduling. I think we could merge all types into one scheduling process.

dddddai commented 2 years ago

When all types share the same scheduling logic, spread constraints will be taken into account automatically. Failover currently has some defects, e.g., it does not take idle resources into consideration when rescheduling. I think we could merge all types into one scheduling process.

Agree, I think this is the best way to solve the problem

mrlihanbo commented 2 years ago

What about removing the unhealthy clusters from the scheduling result, keeping the result for the healthy clusters, and then letting the code go straight through the normal scheduling logic so that the scheduler performs a second scheduling pass? In #1051 I have already merged scale scheduling and normal scheduling, and I'm wondering whether the failover logic can be merged as well.

Good idea, I'll take a look at the merging PR. Failover indeed needs a major overhaul.

RainbowMango commented 2 years ago

Please cc me after you have reached an agreement. :)

dddddai commented 2 years ago

Hi @Garrybest, I committed a new patch to merge the failover schedule with the normal schedule.

Garrybest commented 2 years ago

Thanks @dddddai, I will check it once #1051 gets merged.

Garrybest commented 2 years ago

/assign

RainbowMango commented 2 years ago

Let's get back to this PR after #1051, is that the plan? Thanks @dddddai, I guess it will be merged soon. Sorry for letting this sit for so long.

dddddai commented 2 years ago

Let's get back to this PR after #1051, is that the plan? Thanks @dddddai, I guess it will be merged soon. Sorry for letting this sit for so long.

Hi @RainbowMango, I don't think there is any logical conflict between this PR and #1051, so the order doesn't matter. Never mind, glad to help :). Once this and #1051 are merged, all kinds of scheduling will share the same process.

RainbowMango commented 2 years ago

@dddddai Can you help resolve the conflicts?

RainbowMango commented 2 years ago

We don't have to add a merge commit. I suggest rebasing.

dddddai commented 2 years ago

There is no merge commit; that's just the commit message :)

huone1 commented 2 years ago

@dddddai Let us discuss the PR in the weekly meeting tomorrow; the scenario "unjoining a target cluster" is a common user behavior.

huone1 commented 2 years ago

There is nothing that requeues the RBs related to the deleted cluster in the deleteCluster function; I think we should add the requeue logic there: https://github.com/karmada-io/karmada/blob/8ceb9df2b9672d3e50b0d52d6b4e7d8c000eff81/pkg/scheduler/scheduler.go#L639-L660

dddddai commented 2 years ago

@dddddai Let us discuss the PR in the weekly meeting tomorrow; the scenario "unjoining a target cluster" is a common user behavior.

@huone1 Sorry, I will be travelling for the next three days; how about discussing it in the Slack channel :)

dddddai commented 2 years ago

There is nothing that requeues the RBs related to the deleted cluster in the deleteCluster function; I think we should add the requeue logic there.

There's no need to requeue RBs on delete since they are already requeued when the DeletionTimestamp is set: https://github.com/karmada-io/karmada/blob/a08dd3d7675b008e5e4137f44843d76880d454b3/pkg/scheduler/scheduler.go#L620-L624
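
For illustration, here is a simplified sketch of the pattern being described (hypothetical names and a plain *time.Time instead of metav1.Time, not the linked scheduler.go): the requeue happens in the cluster update handler the moment the deletion timestamp is set, so the delete handler does not need its own requeue logic.

    package main

    import (
        "fmt"
        "time"
    )

    // cluster is a stand-in for the Cluster object in this sketch.
    type cluster struct {
        Name              string
        DeletionTimestamp *time.Time
    }

    // onClusterUpdate requeues the bindings targeting a cluster as soon as its
    // deletion timestamp appears, i.e. when the cluster starts unjoining.
    func onClusterUpdate(oldObj, newObj *cluster, requeueBindingsFor func(clusterName string)) {
        if oldObj.DeletionTimestamp == nil && newObj.DeletionTimestamp != nil {
            requeueBindingsFor(newObj.Name)
        }
    }

    func main() {
        now := time.Now()
        onClusterUpdate(
            &cluster{Name: "member1"},
            &cluster{Name: "member1", DeletionTimestamp: &now},
            func(name string) { fmt.Printf("requeue bindings targeting %s\n", name) },
        )
    }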

huone1 commented 2 years ago

@dddddai Let us discuss the PR in the weekly meeting tomorrow; the scenario "unjoining a target cluster" is a common user behavior.

@huone1 Sorry, I will be travelling for the next three days; how about discussing it in the Slack channel :)

OK, have fun!