/assign
Can you run `kubectl get pods ${your_ca_pod} -o yaml` and check the value of dnsPolicy?
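For example, something along these lines pulls just that field (the kube-system namespace is an assumption; adjust to wherever your CA runs):

```sh
# Print only the dnsPolicy of the cluster-autoscaler pod
kubectl -n kube-system get pod ${your_ca_pod} -o jsonpath='{.spec.dnsPolicy}'
```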
Is your cluster running in us-east-2? AWS_REGION is not required in v1.12. The samples got some cleanup last week, but I didn't see any problems there.
kubectl get pods ${your_ca_pod} -o yaml
dnsPolicy: ClusterFirst
Yes, the cluster is running in us-east-2. I discovered that the pod somehow uses the default resolv.conf file:
; generated by /usr/sbin/dhclient-script
search us-east-2.compute.internal
nameserver 10.0.0.2
When I added nameserver 8.8.8.8 to it on the master and worker nodes, the CA started to work. I'm not sure if it's a solution or just a workaround (I don't think the CA should use this file, because kubespray should write its own resolv.conf, so maybe it is a kubespray problem), but now I can google some similar cases and figure it out.
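If adding an upstream nameserver really is what fixes it, roughly the same workaround can be applied at the pod level instead of editing files on every node. A minimal sketch, assuming the CA runs as a Deployment named cluster-autoscaler in kube-system (with dnsPolicy ClusterFirst, extra dnsConfig nameservers are appended to the pod's generated resolv.conf):

```sh
# Append 8.8.8.8 to the CA pod's resolv.conf via dnsConfig
# (the deployment name and namespace are assumptions)
kubectl -n kube-system patch deployment cluster-autoscaler --type merge \
  -p '{"spec":{"template":{"spec":{"dnsConfig":{"nameservers":["8.8.8.8"]}}}}}'
```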
The problem is still relevant. Sometimes it works (in about 10% of cases), but most of the time it crashes with a timeout error. DNS settings and services look OK and work perfectly for other pods.
Hi @AESwrite, sorry for the late response. Anything special about your VPC settings? If you can consistently reproduce this issue, there's probably a bug somewhere. I'd like to try to reproduce and fix it.
I am also getting the same error. CA version: 1.12.3, AWS EKS version: 1.12.7.
I0529 12:05:53.849655 1 leaderelection.go:227] successfully renewed lease kube-system/cluster-autoscaler
I0529 12:05:55.942323 1 leaderelection.go:227] successfully renewed lease kube-system/cluster-autoscaler
E0529 12:05:56.036033 1 aws_manager.go:153] Failed to regenerate ASG cache: RequestError: send request failed
caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp 72.21.206.37:443: i/o timeout
F0529 12:05:56.036064 1 cloud_provider_builder.go:149] Failed to create AWS Manager: RequestError: send request failed
caused by: Post https://autoscaling.us-east-1.amazonaws.com/: dial tcp 72.21.206.37:443: i/o timeout
@AESwrite @Jeffwan Can someone help here? It was working fine on EKS v1.11.5.
Check your CA pod's public network accessibility.
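For example, one way to check that is to hit the autoscaling endpoint from a throwaway pod in the same cluster; a rough sketch, assuming the curlimages/curl image can be pulled and substituting the region from your logs:

```sh
# Any HTTP status code proves egress works; a timeout reproduces the CA error
kubectl -n kube-system run ca-net-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -sS -m 10 -o /dev/null -w '%{http_code}\n' https://autoscaling.us-east-1.amazonaws.com/
```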
/sig aws
I was running into this problem in a cluster created by kops using a pre-existing VPC.
The route table for my subnets was the default one created by AWS. Setting the route table created by kops as the main one and deleting the one created by AWS solved my problem.
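For anyone checking the same thing, the routes actually attached to a worker subnet can be listed with something like this (the subnet ID is a placeholder); a private subnet needs a 0.0.0.0/0 route to a NAT gateway for the CA to reach the AWS API:

```sh
# Show the routes associated with a given worker subnet
# (an empty result usually means the subnet falls back to the VPC's main route table)
aws ec2 describe-route-tables \
  --filters Name=association.subnet-id,Values=subnet-0123456789abcdef0 \
  --query 'RouteTables[].Routes[]'
```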
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
I am having the same issue. We have installed the CA in 4 different VPCs: the 3 us-east-1 instances all work fine, while the 3 in us-east-2 fail.
{"log":"E0109 14:11:31.735207 1 aws_manager.go:153] Failed to regenerate ASG cache: RequestError: send request failed\n","stream":"stderr","time":"2020-01-09T14:11:31.736567592Z"}
{"log":"caused by: Post https://autoscaling.us-east-2.amazonaws.com/: dial tcp: i/o timeout\n","stream":"stderr","time":"2020-01-09T14:11:31.736595585Z"}
{"log":"F0109 14:11:31.735237 1 cloud_provider_builder.go:149] Failed to create AWS Manager: RequestError: send request failed\n","stream":"stderr","time":"2020-01-09T14:11:31.736600881Z"}
{"log":"caused by: Post https://autoscaling.us-east-2.amazonaws.com/: dial tcp: i/o timeout\n","stream":"stderr","time":"2020-01-09T14:11:31.736605602Z"}
We have used both the Helm version and the standalone multi-ASG example.
The EKS clusters are built with Terraform, so each one uses the same settings apart from region and VPC.
We have used 2 different accounts: the working VPCs are in one account, while the failing one is in another. We are adding an EKS cluster in us-east-1 to the account that is currently failing, in order to test whether the region matters. I will report back our findings.
Summary:
| account-vpc | region | status |
|---|---|---|
| data-qa2 | us-east-2 | inconsistent |
| data-qa1 | us-east-1 | inconsistent |
| nonprod-qa1 | us-east-1 | success |
| nonprod-test | us-east-1 | success |
| nonprod-stage | us-east-1 | success |
I will report if we find anything new.
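Something that might help the comparison between the working and failing VPCs is checking whether each one actually has a NAT gateway (or some other egress path) for its private subnets; a sketch, with placeholder VPC ID and region:

```sh
# List NAT gateways in the suspect VPC
aws ec2 describe-nat-gateways \
  --filter Name=vpc-id,Values=vpc-0123456789abcdef0 \
  --region us-east-2
```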
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
@Jeffwan From the above discussion I was not able to conclude what the solution to this issue was. Would it be possible for you to help me with a similar issue, shown below:
E0323 15:19:16.485010 1 aws_manager.go:259] Failed to regenerate ASG cache: RequestError: send request failed
caused by: Post https://autoscaling.us-east-2.amazonaws.com/: dial tcp: i/o timeout
F0323 15:19:16.485057 1 aws_cloud_provider.go:330] Failed to create AWS Manager: RequestError: send request failed
caused by: Post https://autoscaling.us-east-2.amazonaws.com/: dial tcp: i/o timeout
I am using the multi-ASG deployment for the AWS CA. Versions: CA: k8s.gcr.io/cluster-autoscaler:v1.14.7, EKS: 1.14 (platform version eks.9), coredns: v1.6.6, aws-node: amazon-k8s-cni:v1.5.5.
@biswarup1290dass Hmm... Can you share the dnsPolicy of your pod, and check whether your CoreDNS pods are running well (check the logs, probably)?
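For example, something like this (the k8s-app=kube-dns label is the usual one for CoreDNS on EKS, but it may differ per install):

```sh
# Check that CoreDNS pods are healthy and look for errors in their logs
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50
```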
Same issue:
E1210 03:15:12.303121 1 aws_manager.go:265] Failed to regenerate ASG cache: cannot autodiscover ASGs: RequestError: send request failed caused by: Post "https://autoscaling.cn-northwest-1.amazonaws.com.cn/": dial tcp 52.82.209.176:443: i/o timeout
and the cluster-autoscaler pod's dnsPolicy is ClusterFirst
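A quick way to tell whether this is DNS or raw connectivity is to resolve the endpoint from inside the cluster; a sketch using the classic busybox DNS test:

```sh
# If this resolves but the CA still times out, the problem is egress, not DNS
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup autoscaling.cn-northwest-1.amazonaws.com.cn
```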
Hello, I'm currently using Kubernetes v1.12.5 and CA v1.12.3. The cluster was created with kubespray v2.8.3 (kubeadm enabled). Provider: AWS.
I'm using the standard example of cluster-autoscaler-one-asg.yaml; I've modified only these lines:
I get this kind of error (the same in different versions of CA):
I tried to use CA v1.3.0 on Kubernetes v1.11.3 last week (the same yaml file, only a different version of the CA), and it worked. But today I get the timeout error even on that v1.11.3 configuration (I didn't change anything in it since last week).
How can I solve this issue? I would be glad of any help!
Update 1: the container with the autoscaler somehow can't reach the internet.
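A rough way to narrow this down is to compare egress from the node itself with egress from a pod, since the timeout can come from either layer; a sketch, assuming SSH access to a worker and that the curlimages/curl image can be pulled:

```sh
# On a worker node (over SSH): does the node itself have outbound access?
curl -sS -m 10 -o /dev/null -w 'node: %{http_code}\n' https://autoscaling.us-east-2.amazonaws.com/

# From a pod: does pod networking (CNI/NAT/DNS) have outbound access?
kubectl run egress-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -sS -m 10 -o /dev/null -w 'pod: %{http_code}\n' https://autoscaling.us-east-2.amazonaws.com/
```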