huone1 closed this pull request 2 years ago
/cc @dddddai
/cc @mrlihanbo
notes: This PR does not consider the spread constraints. /lgtm
Please solve the conflict. @huone1
issue #1411 is also fixed in the PR
> Please solve the conflict. @huone1
OK, it is done.
cc @dddddai @Garrybest Can you take a look?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
The full list of commands accepted by this bot can be found here.
The pull request process is described here
/assign
/lgtm
I think there is a bug here. Imagine Failover is disabled. We should not erase the scheduled replicas of the not-ready cluster in the RB, because Failover is disabled. However, if scaling up happens, e.g. the user scales up the desired replicas, this PR will delete all replicas in this cluster, which is dangerous. PTAL, @huone1
When Failover is disabled, rescheduling for a not-ready cluster is not triggered. If cluster A is not ready and unhealthy, I think it is normal to delete all replicas in this cluster.
I'm afraid not. When scaling up, rescheduling will be triggered.
> I'm afraid not. When scaling up, rescheduling will be triggered.

What is wrong with it?
> However, if scaling up happens, e.g. the user scales up the desired replicas, this PR will delete all replicas in this cluster, which is dangerous.
Why is it dangerous?
What I'm thinking about is how to postpone the deletion operation until the desired replicas are all in the available state. That guarantees there are always sufficient replicas running at any time.
Failover is disabled, but under this circumstance all replicas will be removed. It does not match the expectation.
Failover is disabled because the user does not want to remove all replicas when a cluster is not ready. If the api-server of a member cluster is temporarily down and Failover is disabled, we probably do not want the replicas in this member cluster to be evicted when the user triggers scaling up. However, that is not what happens now.
Agree with @Garrybest.
We should take false-positive cluster failures seriously. Can we hold the replicas (for the unhealthy cluster) unchanged even in the case of scaling up, and decrease the replicas only in the case of scaling down?
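The behavior proposed above can be sketched as a small Go function. This is a hypothetical illustration only, not Karmada's actual scheduler code: the function name, signature, and the proportional scale-down rule are all assumptions made for this sketch.

```go
package main

import "fmt"

// replicasForUnhealthyCluster is a hypothetical sketch of the proposal:
// when Failover is disabled, hold the replicas already assigned to an
// unhealthy cluster when the user scales up, and only decrease them
// (here, proportionally) when the user scales down.
func replicasForUnhealthyCluster(current, oldDesired, newDesired int32, failoverEnabled bool) int32 {
	if failoverEnabled {
		// With Failover enabled, the scheduler may evict the unhealthy
		// cluster's replicas and migrate them to other clusters.
		return 0
	}
	if newDesired >= oldDesired {
		// Scaling up (or no change): keep the unhealthy cluster's
		// replicas unchanged instead of erasing them.
		return current
	}
	// Scaling down: shrink proportionally, never below zero.
	scaled := current * newDesired / oldDesired
	if scaled < 0 {
		scaled = 0
	}
	return scaled
}

func main() {
	// Cluster A holds 5 of 10 desired replicas and turns unhealthy.
	fmt.Println(replicasForUnhealthyCluster(5, 10, 15, false)) // scale up: hold 5
	fmt.Println(replicasForUnhealthyCluster(5, 10, 6, false))  // scale down: 3
	fmt.Println(replicasForUnhealthyCluster(5, 10, 15, true))  // Failover on: 0
}
```

Deleting a cluster would still follow the existing migration path; the guard above only applies to the healthy-to-unhealthy transition discussed here.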
Deleting a cluster is different from a cluster changing from healthy to unhealthy. It is reasonable to migrate the replicas to other clusters when deleting a cluster.
Yeah, deleting a cluster is another story.
Let me describe an example. Suppose member cluster A runs 5 replicas and then becomes not ready while Failover is disabled. Scaling up triggers the scheduling procedure, and the karmada-scheduler will remove the replicas in cluster A, which may be dangerous: it is possible that there is nothing wrong with the 5 replicas in member cluster A.

Thanks @Garrybest for the details. Can you help file an issue to track this?
Sure.
Signed-off-by: huone1 <huwanxing@huawei.com>
What type of PR is this?
/kind feature
What this PR does / why we need it: support rescheduling when deleting a cluster
Which issue(s) this PR fixes: Fixes #829, Fixes #1411
Special notes for your reviewer: This PR does not consider the spread constraints; they will be addressed in the scheduler refactoring.
Does this PR introduce a user-facing change?: