Closed sppwf closed 4 years ago
/assign @Jeffwan
Hmm, this looks like a bug in 1.14. Let me file a PR to address this issue. It probably didn't consider Windows resources.
Thank you @Jeffwan It will help me a lot
Hi @Jeffwan,
Do you think you can give an estimate of when you'll be able to put up the PR?
Thanks
I will make it by the end of the week. @sppwf Please help with testing later.
@sppwf I filed the PR; please help review it. It will take some time to get it merged and then backported to 1.14. I built a test image for 1.14 with the patch, if you'd like to give it a try: seedjeffwan/cluster-autoscaler:1.14.8-dev
Hi,
I can test the image; give me a couple of hours.
Thanks Sergiu Plotnicu
Hi @Jeffwan ,
It works much better now. Also, do you think the fix will be applied to the 1.15-compatible CA versions in the future? I plan to move EKS to 1.15.10 in the next few months.
I have another issue with the resource check on Windows nodes; I will open a separate issue for it. CA thinks an empty node has 0.87 utilization (87%) even with no pods on it. The Windows host may have spikes in resource usage; maybe that is the cause.
Thanks Sergiu Plotnicu
Also, do you think the fix will be applied to the 1.15-compatible CA versions in the future? I plan to move EKS to 1.15.10 in the next few months.
I will backport it to 1.15 once it's merged into master.
CA thinks an empty node has 0.87 utilization (87%) even with no pods on it.
I am not sure what this means. Do you mean this happens on every scale-down reconcile loop: an empty node reports 87% utilization, which is too high for it to be marked as a scale-down candidate? Feel free to open a new issue for this case and share logs and the node status.
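For context on the 87% question: as far as I understand, CA's scale-down check computes, per resource, the sum of pod requests on the node divided by the node's allocatable, and takes the maximum across CPU and memory. Below is a minimal illustrative sketch (the function name and data shapes are mine, not CA's actual API). Note that, if I recall correctly, DaemonSet pod requests are counted by default unless --ignore-daemonsets-utilization is set, which can explain non-zero utilization on a seemingly empty node.

```python
def node_utilization(pod_requests, allocatable):
    """Return max over resources of (sum of pod requests) / allocatable.

    pod_requests: list of dicts, e.g. [{"cpu": 500, "memory": 1 << 30}]
    allocatable:  dict of the node's allocatable amounts per resource.
    Units just need to be consistent per resource (e.g. millicores, bytes).
    """
    util = {}
    for resource, alloc in allocatable.items():
        requested = sum(p.get(resource, 0) for p in pod_requests)
        util[resource] = requested / alloc
    return max(util.values())

# A truly empty node should report 0.0:
print(node_utilization([], {"cpu": 2000, "memory": 8 * 1024**3}))  # 0.0

# 1740m requested on a 2000m-CPU node gives the reported 0.87:
print(node_utilization([{"cpu": 1740}], {"cpu": 2000, "memory": 8 * 1024**3}))
```

So a node reporting 0.87 with "no pods" on it likely still has something requesting resources (e.g. DaemonSet pods), or the allocatable CA sees is much smaller than expected.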
Hi @Jeffwan
I have also tested the image with the patch (seedjeffwan/cluster-autoscaler:1.14.8-dev) and it works for my use case.
My use case was a node with reduced allocatable resources due to the --kube-reserved and --system-reserved flags. This was problematic when an ASG can scale to 0 with the least-waste expander: CA scales out the ASG, but the node comes up with fewer allocatable resources than CA expects (because the k8s.io/cluster-autoscaler/node-template/resources/cpu tag was not respected).
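To illustrate why the reservation flags matter here: the kubelet derives a node's allocatable resources by subtracting the reserved amounts (and the eviction threshold) from raw capacity, so a node always offers less than its instance size suggests. A quick sketch with illustrative numbers (the values below are examples, not recommendations):

```python
def allocatable(capacity_mib, kube_reserved_mib, system_reserved_mib, eviction_mib):
    """Allocatable = capacity - kube-reserved - system-reserved - eviction threshold."""
    return capacity_mib - kube_reserved_mib - system_reserved_mib - eviction_mib

# An 8 GiB node with example reservations ends up well under 8192 MiB:
print(allocatable(8192, 1024, 256, 100))  # 6812
```

If CA predicts scheduling against the full 8192 MiB while the real node only offers 6812 MiB allocatable, it can scale out a group whose nodes then cannot actually fit the pending pods.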
@rimaulana Check my comments in https://github.com/kubernetes/autoscaler/issues/2809#issuecomment-598487660
Please vote on the solution you like
Please try the following version with the fix. This issue can be closed; let's track reserved resources in a separate issue.
/close
@Jeffwan: Closing this issue.
For those who are getting Insufficient vpc.amazonaws.com/PrivateIPv4Address for Windows ASGs with 0 nodes, adding the following node tags fixed the issue for me.
Explicitly specify the amount of allocatable resources:
k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/ENI 1
k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/PrivateIPv4Address 5
Tested with cluster-autoscaler v1.23
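As a concrete way to apply those tags, here is a hypothetical eksctl ClusterConfig fragment (cluster name, region, and nodegroup details are placeholders; only the tags map reflects the workaround above):

```yaml
# Hypothetical eksctl config sketch; metadata and sizing values are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster        # placeholder
  region: us-west-2       # placeholder
nodeGroups:
  - name: windows-ng      # placeholder
    minSize: 0
    desiredCapacity: 0
    tags:
      "k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/ENI": "1"
      "k8s.io/cluster-autoscaler/node-template/resources/vpc.amazonaws.com/PrivateIPv4Address": "5"
```

The same key/value pairs can equally be added directly on the ASG in the AWS console or via infrastructure-as-code of your choice; what matters is that they are present on the Auto Scaling group so CA can build the node template when the group is at 0.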
Hi Guys,
I am running a Kubernetes cluster on the AWS EKS service, cluster version 1.14 (the latest AWS offers). I am using a Windows nodegroup with the AWS-supported VPC controller and the webhook that adds the required requests/limits to pod specs. I am trying to scale the Windows ASG up from 0 to 2. It works for a couple of days, and after that it does not. The workaround to get it working again is to change the ASG desired setting to 1; a node comes up, and after that Cluster Autoscaler scales down (back to 0) and up (from 0 to 2) again for a couple of days.
Here is the Pod output
Also Log from Autoscaler:
The ASG has the proper tags, and CA auto-discovers the ASG and tries to scale the correct one (I even added some of the additional tags; it seems the "resources" ones do not work :():
Thanks Sergiu Plotnicu