loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0

Allow to set tolerations for vcluster pods #330

Closed mixitgit closed 2 years ago

mixitgit commented 2 years ago

Why?

Currently, a node selector can be set to ensure that all pods created in a vcluster are scheduled on a subset of nodes. But when there are multiple vclusters, some of which use fake nodes and some of which should use dedicated nodes, the node selector alone doesn't guarantee that pods from the vclusters using fake nodes won't be scheduled on the dedicated nodes.

How?

This could be solved by setting tolerations on all pods in vclusters that use dedicated nodes, and corresponding taints on those nodes. Combined with the node selector, this guarantees that a pod is scheduled on a given node if and only if it belongs to a particular vcluster. It could be done either through a separate flag for tolerations, or just via --enforce-node-selector, because for dedicated nodes there is no case where the tolerations and the node selector should differ.

FabianKramm commented 2 years ago

@mixitgit thanks for creating this issue! Not sure if I understand this correctly, but you want vcluster to automatically set tolerations on pods that are synced between the vcluster and the host cluster? Currently you can already set tolerations on pods within the vcluster, and they are then synced to the host cluster. Since scheduling happens in the host cluster, those tolerations will be considered during scheduling.

mixitgit commented 2 years ago

I want to set tolerations automatically; this would allow creating dedicated nodes for vclusters. For example, if we set --enforce-node-selector --node-selector=foo=bar, it currently adds a node selector to all pods automatically:

nodeSelector:
  foo: bar

If we would also add a toleration:

tolerations:
- key: "foo"
  operator: "Equal"
  value: "bar"
  effect: "NoSchedule"

We could then create a node with the label foo=bar and the taint foo=bar:NoSchedule, and this node would be dedicated to this vcluster (no other vclusters would schedule their pods there). This is extremely helpful when multiple teams share the same host clusters and some of them should have their own dedicated nodes. This toleration won't affect other use cases (unless someone wants to label a node and at the same time keep the same taint on it so that pods cannot schedule there, but such a case seems a bit odd).
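As a rough sketch of the host-cluster side of this setup, the dedicated node would carry both the label and the taint. The node name below is hypothetical, and the manifest shows only the relevant fields rather than a complete Node object:

```yaml
# Sketch of a dedicated node: label for the node selector, taint to repel other pods.
apiVersion: v1
kind: Node
metadata:
  name: dedicated-node-1   # hypothetical node name
  labels:
    foo: bar               # matched by the enforced node selector foo=bar
spec:
  taints:
  - key: foo
    value: bar
    effect: NoSchedule     # pods without a matching toleration are not scheduled here
```

With this in place, the node selector keeps the vcluster's pods on the dedicated node, while the taint keeps pods without the toleration off it.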

FabianKramm commented 2 years ago

@mixitgit thanks for the explanation! Ah I see, so you want to dedicate the node completely to a vcluster, that makes sense. Yes I guess we could add a flag for this.

mixitgit commented 2 years ago

Maybe it would be more convenient to build this functionality into the enforce-node-selector flag? After all, it effectively enforces that these nodes are bound to the cluster. Otherwise it would be a little hard to set through a flag, because a toleration has four fields (key, operator, value, effect), so the flag would either have to be opinionated (i.e. always set operator="Equal", effect="NoSchedule") or might be hard to understand. I am not sure, though.

FabianKramm commented 2 years ago

@mixitgit there is an official notation for tolerations, like key1=value1:NoSchedule, so I don't think this will be a problem (see the Kubernetes docs). In general, I guess we shouldn't mix those with enforce-node-selector and should rather add a new flag to avoid confusion.
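For illustration, here is how the compact notation key1=value1:NoSchedule would plausibly map onto the four toleration fields. This mapping is an assumption based on the standard Kubernetes toleration schema; how a vcluster flag would actually parse the string is not specified in this thread:

```yaml
# Assumed expansion of "key1=value1:NoSchedule" into a pod toleration.
tolerations:
- key: "key1"            # text before "="
  operator: "Equal"      # implied by the presence of a value
  value: "value1"        # text between "=" and ":"
  effect: "NoSchedule"   # text after ":"
```

A value-less form such as key1:NoSchedule could presumably map to operator "Exists" with no value, which the toleration schema also allows.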

mixitgit commented 2 years ago

@FabianKramm ok, sure, with this notation it seems perfect

FabianKramm commented 2 years ago

Closing as #347 was merged