kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Scale up windows nodegroups from 0 with AWS EKS Cluster #2888

Closed: sppwf closed this issue 4 years ago

sppwf commented 4 years ago

Hi Guys,

I am running a Kubernetes cluster on the AWS EKS service, version 1.14 (the latest AWS offers). I am using a Windows nodegroup with the AWS-supported VPC resource controller and webhook, which add the required requests/limits to pod specs. I am trying to scale the Windows ASG up from 0 to 2. It works for a couple of days, and after that it does not. The workaround to get it working again is to set the ASG desired capacity to 1; a node comes up, and after that the Cluster Autoscaler scales down (back to 0) and up (from 0 to 2) correctly for another couple of days.
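For context, the workaround amounts to manually bumping the desired capacity on the ASG, for example with the AWS CLI (the ASG name below is just a placeholder):

# set the Windows ASG desired capacity to 1 so that one node comes up
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name <windows-asg-name> \
  --desired-capacity 1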

Here is the Pod output

$ kubectl describe pods win64-f0hql
Name:               win64-f0hql
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             jenkins/label=windows
                    os=windows
Annotations:        kubernetes.io/psp: eks.privileged
Status:             Pending
IP:
Containers:
  jnlp:
    Image:      artifactory/win64:12.0
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:                                   7
      memory:                                16Gi
      vpc.amazonaws.com/PrivateIPv4Address:  1
    Requests:
      cpu:                                   7
      memory:                                13Gi
      vpc.amazonaws.com/PrivateIPv4Address:  1
    Mounts:
      /home/jenkins/agent from workspace-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5qm8x (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  workspace-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=windows
                 os=windows
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Warning  FailedScheduling   61s (x2 over 2m15s)  default-scheduler   0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory, 2 Insufficient vpc.amazonaws.com/PrivateIPv4Address, 2 node(s) didn't match node selector.
  Normal   NotTriggerScaleUp  25s                  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max limit reached, 1 Insufficient vpc.amazonaws.com/PrivateIPv4Address
  Normal   NotTriggerScaleUp  5s (x12 over 2m7s)   cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 Insufficient vpc.amazonaws.com/PrivateIPv4Address, 1 max limit reached

Also Log from Autoscaler:

I0304 07:46:52.486607       1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"win64-f0hql", UID:"9a1e3e25-5deb-11ea-9e5a-02cbc925418a", APIVersion:"v1", ResourceVersion:"20945875", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 Insufficient vpc.amazonaws.com/PrivateIPv4Address, 1 max limit reached
I0304 07:47:02.573665       1 scale_up.go:263] Pod default/win64-f0hql is unschedulable
I0304 07:47:02.573741       1 utils.go:254] Pod win64-f0hqlcan't be scheduled on cpu-2-windows-az3-2019122009321753430000000c, predicate failed: PodFitsResources predicate mismatch, reason: Insufficient vpc.amazonaws.com/PrivateIPv4Address

The ASG has the proper tags, and CA auto-discovers it and tries to scale the correct ASG. I even added some of the additional tags; it looks like the "resources" ones do not work :(. The tags are listed below, followed by an example of how one can be attached with the AWS CLI:

k8s.io/cluster-autoscaler/enabled true
k8s.io/cluster-autoscaler/node-template/label/beta.kubernetes.io/os windows
k8s.io/cluster-autoscaler/node-template/label/os windows
k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/ENI 1
k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/PrivateIPv4Address 14
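Such a tag can be attached with the AWS CLI, roughly like this (the ASG name is a placeholder; CA auto-discovery reads these tags from the ASG itself):

# attach the node-template resource tag to the Windows ASG
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<windows-asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/PrivateIPv4Address,Value=14,PropagateAtLaunch=false"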

Thanks Sergiu Plotnicu

Jeffwan commented 4 years ago

/assign @Jeffwan

Jeffwan commented 4 years ago

Hmm, this looks like a bug in 1.14. Let me file a PR to address this issue; it probably doesn't take the Windows resources into account.

sppwf commented 4 years ago

Thank you @Jeffwan, it will help me a lot.

sppwf commented 4 years ago

Hi @Jeffwan,

Do you think you can give an estimate of when you will be able to put up the PR?

Thanks

Jeffwan commented 4 years ago

I will make it by the end of the week. @sppwf, please help with testing later.

Jeffwan commented 4 years ago

@sppwf I filed the PR; please help review it. It will take some time to get it merged and backported to 1.14. I built a test image for 1.14 with the patch if you would like to give it a try: seedjeffwan/cluster-autoscaler:1.14.8-dev
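Swapping the image in a running deployment can be done roughly like this (the deployment and container names are assumptions; adjust them to your setup):

# point the cluster-autoscaler deployment at the test image
kubectl -n kube-system set image deployment/cluster-autoscaler \
  cluster-autoscaler=seedjeffwan/cluster-autoscaler:1.14.8-dev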

sppwf commented 4 years ago

Hi,

I can test the image; give me a couple of hours.

Thanks Sergiu Plotnicu

sppwf commented 4 years ago

Hi @Jeffwan ,

It works much better now. Also, do you think the fix will be applied to the Kubernetes 1.15 compatible CA versions in the future? I plan to move EKS to 1.15.10 in the coming months.

I have another issue with the resource check on Windows nodes; I will open a separate issue for it. CA thinks an empty node has 0.87 utilization (87%) even though there are no pods on it. It might be that the Windows host has spikes in resource usage; maybe that is the issue.

Thanks Sergiu Plotnicu

Jeffwan commented 4 years ago

> Also, do you think the fix will be applied to the Kubernetes 1.15 compatible CA versions in the future? I plan to move EKS to 1.15.10 in the coming months.

I will backport it to 1.15 once it's merged into master.

> CA thinks an empty node has 0.87 utilization (87%) even though there are no pods on it.

I am not sure what this means. Do you mean that in every scale-down reconcile loop the empty node shows 87% utilization, which is too high for it to be marked as a scale-down candidate? Feel free to open a new issue for this case and share the logs and node status.

rimaulana commented 4 years ago

Hi @Jeffwan

I have also tested the image with the patch seedjeffwan/cluster-autoscaler:1.14.8-dev and it works on my use-case

My use case is a node that has less allocatable resources due to the --kube-reserved and --system-reserved kubelet flags. I found this problematic when the ASG can scale from 0 with the least-waste expander: CA scales out the ASG, but the node ends up with less allocatable resources than CA assumed (because the k8s.io/cluster-autoscaler/node-template/resources/cpu tag was not respected).
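For illustration only (all values are placeholders, not from my cluster), the mismatch looks like this:

# kubelet flags that reduce the node's allocatable resources
--kube-reserved=cpu=250m,memory=1Gi
--system-reserved=cpu=100m,memory=256Mi

# ASG tag that CA would need to honor for its simulated node to match the real allocatable
k8s.io/cluster-autoscaler/node-template/resources/cpu 3650m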

Jeffwan commented 4 years ago

@rimaulana Check my comments in https://github.com/kubernetes/autoscaler/issues/2809#issuecomment-598487660

Please vote on the solution you like

Jeffwan commented 4 years ago

Please try the following version with the fix. This can be closed; let's track the reserved-resources problem in a separate issue.

Jeffwan commented 4 years ago

/close

k8s-ci-robot commented 4 years ago

@Jeffwan: Closing this issue.

In response to [this](https://github.com/kubernetes/autoscaler/issues/2888#issuecomment-606784496):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
vlinevych commented 1 year ago

For those who are getting Insufficient vpc.amazonaws.com/PrivateIPv4Address for Windows ASGs with 0 nodes, adding the following tags to the ASG fixed the issue for me:

Explicitly specify the amount of allocatable resources:

k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/ENI 1
k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/PrivateIPv4Address 5

Tested with cluster-autoscaler v1.23