robscott / kube-capacity

A simple CLI that provides an overview of the resource requests, limits, and utilization in a Kubernetes cluster
Apache License 2.0

Add support for node taints #91

Closed OperationalDev closed 7 months ago

OperationalDev commented 1 year ago

It would be nice if we could add support for node taints. For example, if a node has been cordoned and is tainted NoSchedule or NoExecute, it should be possible to exclude it from the output.

e.g.

kubectl get nodes
NAME              STATUS                     ROLES           AGE    VERSION
example-node-1    Ready,SchedulingDisabled   <none>          425d   v1.24.6
example-node-2    Ready                      <none>          227d   v1.24.6

kube-capacity

NODE              CPU REQUESTS    CPU LIMITS    MEMORY REQUESTS    MEMORY LIMITS
*                 560m (28%)      130m (7%)     572Mi (9%)         770Mi (13%)
example-node-1    220m (22%)      10m (1%)      192Mi (6%)         360Mi (12%)
example-node-2    340m (34%)      120m (12%)    380Mi (13%)        410Mi (14%)

Now if we exclude cordoned nodes:

kube-capacity --exclude-noschedule-nodes

NODE              CPU REQUESTS    CPU LIMITS    MEMORY REQUESTS    MEMORY LIMITS
*                 340m (34%)      120m (12%)    380Mi (13%)        410Mi (14%)
example-node-2    340m (34%)      120m (12%)    380Mi (13%)        410Mi (14%)

We can see that we have less capacity available than we thought.
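As a rough illustration of the filter requested above, the following Python sketch decides whether a node should be counted. It assumes node objects shaped like the Kubernetes Node API (`spec.unschedulable`, `spec.taints[].effect`); the helper name `is_schedulable` and the sample data are hypothetical, and this is not how kube-capacity itself is implemented.

```python
# Taint effects that keep new pods (without a matching toleration) off a node.
BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}


def is_schedulable(node):
    """Return True if the node is neither cordoned nor tainted with a blocking effect."""
    spec = node.get("spec", {})
    # `kubectl cordon` sets spec.unschedulable to true.
    if spec.get("unschedulable", False):
        return False
    taints = spec.get("taints") or []
    return not any(t.get("effect") in BLOCKING_EFFECTS for t in taints)


# Hypothetical sample data mirroring the two nodes in the example above.
nodes = [
    {
        "metadata": {"name": "example-node-1"},
        "spec": {
            "unschedulable": True,
            "taints": [
                {"key": "node.kubernetes.io/unschedulable", "effect": "NoSchedule"}
            ],
        },
    },
    {"metadata": {"name": "example-node-2"}, "spec": {}},
]

schedulable_names = [n["metadata"]["name"] for n in nodes if is_schedulable(n)]
print(schedulable_names)
```

Note that this treats every NoSchedule or NoExecute taint as blocking. Pods with a matching toleration can still land on such nodes, so a real implementation would probably want to let the user choose which taints to exclude.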

KR411-prog commented 1 year ago

@OperationalDev I have a question about your issue. Cordoned nodes do not accept new pods. But how does removing a cordoned node from the kube-capacity report help show a change in capacity? How do the requests and limits on a node change when new pods are not scheduled on it?

OperationalDev commented 1 year ago

If you cannot schedule pods on a node, then you cannot use that capacity. If I have 10 worker nodes, each with 2 CPU available, then my capacity is 20 CPU. But if 5 of the nodes are cordoned and unschedulable, then my capacity is only 10 CPU, because I cannot schedule pods on the 5 cordoned nodes. In this case kube-capacity will still show me 20 CPU of available capacity.

In a perfect world, nodes should not be cordoned for an extended period of time, but in my scenario, nodes can be cordoned for a couple of days and this can lead to running out of capacity.
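The 10-node arithmetic above can be written out as a small Python sketch. The node names and the `cordoned` field are made up for illustration; only the numbers (10 workers, 2 CPU each, 5 cordoned) come from the comment.

```python
def schedulable_cpu(nodes):
    """Sum CPU (in cores) over the nodes that still accept new pods."""
    return sum(n["cpu"] for n in nodes if not n["cordoned"])


# 10 workers with 2 CPU each; the first 5 are cordoned.
nodes = [{"name": f"worker-{i}", "cpu": 2, "cordoned": i < 5} for i in range(10)]

total_cpu = sum(n["cpu"] for n in nodes)  # what a report over all nodes adds up to
usable_cpu = schedulable_cpu(nodes)       # what can actually be scheduled onto

print(total_cpu, usable_cpu)  # 20 10
```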

barrykp commented 1 year ago

I would like this feature as well: to be able to exclude tainted nodes.