wbuchwalter / Kubernetes-acs-engine-autoscaler

[Deprecated] Node-level autoscaler for Kubernetes clusters created with acs-engine.

state: under-utilized-undrainable #54

Closed skuda closed 7 years ago

skuda commented 7 years ago

Hi,

I found this state while testing today:

[calling-eagle-acs-engine-autoscaler-2908112416-p9vg5] 2017-09-24 14:54:23,480 - autoscaler.engine_scaler - INFO - ++++ Maintaining Nodes ++++++
[calling-eagle-acs-engine-autoscaler-2908112416-p9vg5] 2017-09-24 14:54:23,481 - autoscaler.engine_scaler - INFO - node: k8s-agentpool1-42737137-0 state: busy
[calling-eagle-acs-engine-autoscaler-2908112416-p9vg5] 2017-09-24 14:54:23,484 - autoscaler.engine_scaler - INFO - node: k8s-agentpool1-42737137-1 state: busy
[calling-eagle-acs-engine-autoscaler-2908112416-p9vg5] 2017-09-24 14:54:23,484 - autoscaler.engine_scaler - INFO - node: k8s-agentpool1-42737137-2 state: under-utilized-undrainable
[calling-eagle-acs-engine-autoscaler-2908112416-p9vg5] 2017-09-24 14:54:23,485 - autoscaler.engine_scaler - INFO - node: k8s-agentpool1-42737137-3 state: under-utilized-undrainable
[calling-eagle-acs-engine-autoscaler-2908112416-p9vg5] 2017-09-24 14:54:23,485 - autoscaler.engine_scaler - INFO - node: k8s-agentpool1-42737137-4 state: under-utilized-undrainable
[calling-eagle-acs-engine-autoscaler-2908112416-p9vg5] 2017-09-24 14:54:23,485 - autoscaler.cluster - INFO - ++++ Maintenance Ends ++++++

Right now all the load could be handled by 2 nodes, as the autoscaler correctly detects, but it doesn't cordon and drain the unneeded nodes. I just tried to do it manually myself:

$ kubectl cordon k8s-agentpool1-42737137-4
node "k8s-agentpool1-42737137-4" cordoned

$ kubectl drain k8s-agentpool1-42737137-4
node "k8s-agentpool1-42737137-4" already cordoned
error: DaemonSet-managed pods (use --ignore-daemonsets to ignore): kube-proxy-hcfkz

$ kubectl drain --ignore-daemonsets k8s-agentpool1-42737137-4
node "k8s-agentpool1-42737137-4" already cordoned
WARNING: Ignoring DaemonSet-managed pods: kube-proxy-hcfkz
pod "calling-eagle-acs-engine-autoscaler-2908112416-p9vg5" evicted
node "k8s-agentpool1-42737137-4" drained

Is the reason for that "under-utilized-undrainable" state that the nodes were running the kube-proxy DaemonSet? Thanks!

Miguel.

wbuchwalter commented 7 years ago

It seems the autoscaler itself was running on this node: a pod named calling-eagle-acs-engine-autoscaler-2908112416-p9vg5 got evicted when you drained it. That's why the node was marked as undrainable. Any user/system pod that is not replicated (except kube-proxy) will cause the node it's running on to be marked undrainable, to avoid disruptions in your cluster.
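The rule described above could be sketched roughly like this (a minimal illustration, not the project's actual code; the pod representation and the `replicas` field are hypothetical assumptions):

```python
# Hypothetical sketch of the drainability rule described above:
# a node is drainable only if every pod on it is either kube-proxy
# or backed by more than one replica elsewhere in the cluster.

def is_drainable(pods):
    """pods: list of dicts with 'name' and 'replicas' keys (assumed shape)."""
    for pod in pods:
        if pod["name"].startswith("kube-proxy"):
            continue  # kube-proxy DaemonSet pods never block a drain
        if pod.get("replicas", 1) < 2:
            return False  # a non-replicated pod pins the node
    return True

# A node running only kube-proxy plus replicated pods can be drained;
# a node hosting a single-replica pod (like the autoscaler itself) cannot.
print(is_drainable([{"name": "kube-proxy-hcfkz", "replicas": 1},
                    {"name": "web-frontend-x1", "replicas": 3}]))
print(is_drainable([{"name": "autoscaler-pod", "replicas": 1}]))
```

Under this rule, draining a node that hosts the (single-replica) autoscaler pod would cause an outage of the autoscaler itself, which is why such nodes are left alone.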

skuda commented 7 years ago

Ahh ok, that explains why the other two nodes were blocked as well: they were also running non-replicated pods. Thanks!