karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0

How to define Resource Models for a Cluster with Multiple Taints in Karmada #3869

Open kangteng525 opened 1 year ago

kangteng525 commented 1 year ago

In a typical cluster, multiple taints can exist across different nodes. Here's an example scenario:

A cluster consisting of 5000 nodes where:

In this scenario, for any given taint (for example, "taint1"), the effective node count is 1000, not 5000, meaning the available resources are effectively 1/5 of the total.

Does Karmada support the definition of resource models in such scenarios where a cluster has multiple taints? If so, could you provide guidance on how we can define the spec for this scenario?

zishen commented 1 year ago

You mean that each cluster has several distinct labels. And pod distribution depends on these labels, right?

jwcesign commented 1 year ago

Hi @kangteng525, let's consider an example to illustrate the point. We have two clusters, cluster1 and cluster2. Cluster1 has 100 CPUs usable under a given taint toleration (200 CPUs in reality), while cluster2 has 200 CPUs usable under the same taint toleration (200 CPUs in reality). Therefore, the replica division ratio between these two clusters should be 1:2 instead of 1:1.

Do I understand correctly?

kangteng525 commented 1 year ago

Hi @jwcesign ,

Yes, you are correct. Cluster1 has 100CPU based on taint 1 and 100CPU based on taint 2, while cluster2 has 200CPU only on taint 1.

So when scheduling workloads that tolerate taint 1, the ratio between these two clusters should be 1:2, and when scheduling workloads that tolerate taint 2, it should be 1:0.
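The arithmetic behind these ratios can be checked with a small sketch. The data layout (a mapping from cluster name to per-taint CPU capacity) and the function name `division_ratio` are hypothetical and are not part of Karmada; the numbers are the ones from this thread.

```python
from functools import reduce
from math import gcd

# Hypothetical per-taint CPU capacity, using the numbers from this thread.
capacity = {
    "cluster1": {"taint1": 100, "taint2": 100},
    "cluster2": {"taint1": 200},
}


def division_ratio(capacity, tolerated_taint):
    """Return the reduced replica ratio across clusters for one tolerated taint."""
    # CPU that a workload tolerating only `tolerated_taint` can use per cluster.
    usable = {name: taints.get(tolerated_taint, 0) for name, taints in capacity.items()}
    # Reduce the ratio by the greatest common divisor (fall back to 1 if all are 0).
    divisor = reduce(gcd, usable.values()) or 1
    return {name: cpu // divisor for name, cpu in usable.items()}


print(division_ratio(capacity, "taint1"))
print(division_ratio(capacity, "taint2"))
```

For taint 1 this yields 1:2, and for taint 2 it yields 1:0, matching the expectation stated above.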

jwcesign commented 1 year ago

Hi @kangteng525, when scheduling a workload with taints, you can install karmada-estimator to ensure that karmada-scheduler takes the taints into consideration. The estimator calculates the resource ratio across the clusters while considering the taints, which gives the behavior you described: when scheduling workloads that tolerate taint 1, the ratio between these two clusters is 1:2, and when scheduling workloads that tolerate taint 2, it is 1:0.

The related code is here: https://github.com/karmada-io/karmada/blob/09259b1f10cadb612115a5e2769760790dde6647/pkg/estimator/server/estimate.go#L42
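As a rough illustration of the idea (this is not the code linked above), an estimator could count how many replicas fit on the nodes whose taints the workload tolerates. The `Node` type and `max_available_replicas` function below are hypothetical, and the sketch simplifies taints to plain strings, ignoring taint effects and toleration operators.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: int  # free CPU cores on the node
    taints: frozenset = field(default_factory=frozenset)


def max_available_replicas(nodes, tolerations, cpu_per_replica):
    """Sum the replicas that fit on nodes whose taints are all tolerated."""
    tolerated = set(tolerations)
    total = 0
    for node in nodes:
        # A node is usable only if the workload tolerates every taint on it.
        if node.taints <= tolerated:
            total += node.free_cpu // cpu_per_replica
    return total


# The two clusters from this thread, modeled as one node per taint (an assumption).
cluster1 = [
    Node("c1-a", 100, frozenset({"taint1"})),
    Node("c1-b", 100, frozenset({"taint2"})),
]
cluster2 = [Node("c2-a", 200, frozenset({"taint1"}))]

for taint in ("taint1", "taint2"):
    print(
        taint,
        max_available_replicas(cluster1, {taint}, 1),
        max_available_replicas(cluster2, {taint}, 1),
    )
```

With one CPU per replica, a workload tolerating taint 1 can place 100 replicas in cluster1 and 200 in cluster2, while a workload tolerating taint 2 can place 100 in cluster1 and none in cluster2.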

kangteng525 commented 1 year ago

Hi @jwcesign ,

Thanks a lot! So if karmada-estimator is installed, will karmada-scheduler always call the estimator before binding the nodes to the target cluster? And if the estimator returns not enough replicas, will the cluster not be chosen?

One more question: if multiple propagation policies (for example A, B, and C) run at once, it seems karmada-estimator takes a snapshot before estimating. What if the capacity changes after A and B bind to this cluster, so that C is unable to bind even though its estimate passed?

Thanks, Kevin

jwcesign commented 1 year ago

Hi, @kangteng525

> karmada-scheduler will always call estimator before binding the nodes to the target cluster?

Yes

> And if estimator returns not enough replicas, then the cluster will not be chosen?

Yes

> if multiple propagation policies (for example A, B, C) run at once, it seems karmada-estimator will make a snapshot before estimating, what if capacity changed after A and B bind to this cluster, and C is unable to bind although estimating is passed?

Yes, that's possible. When some RB (ResourceBinding) has been scheduled but the workers have not yet been synced to the member clusters, the scheduler may choose the same cluster again, even though the resources may no longer be sufficient once the workers are synced.

But we have application-failover, which could reschedule the pending workload to other clusters.
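The race described above can be sketched with a toy model. Everything here is hypothetical and unrelated to Karmada's actual implementation: three requests A, B, and C are estimated against the same stale snapshot, then bound one after another against the real capacity.

```python
def estimate_all(snapshot_free, requests):
    """Estimate every request against the same (possibly stale) snapshot."""
    return {name: need <= snapshot_free for name, need in requests.items()}


def bind_all(actual_free, requests):
    """Bind requests one by one against the real, shrinking capacity."""
    pending = []
    for name, need in requests.items():
        if need <= actual_free:
            actual_free -= need
        else:
            # The estimate passed, but the capacity is gone by now.
            pending.append(name)
    return actual_free, pending


requests = {"A": 4, "B": 4, "C": 4}  # replicas requested per policy (made-up numbers)
estimates = estimate_all(10, requests)
remaining, pending = bind_all(10, requests)

print(estimates)
print(remaining, pending)
```

All three estimates pass because each sees 10 free replicas, but only A and B actually fit. C ends up pending, which is the situation where rescheduling to another cluster would be needed.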