kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

AWS alb-ingress-controller failed to create ALB in EKS with fargate #1202

Closed aspnet4you closed 4 years ago

aspnet4you commented 4 years ago

I was trying to follow the documentation below to create an alb-ingress-controller with ingress resources: https://aws.amazon.com/blogs/containers/using-alb-ingress-controller-with-amazon-eks-on-fargate/

It's supposed to create an ALB and populate the address field of the Kubernetes ingress, but the address field is empty, and there is no error. The Fargate profile has been given the proper IAM permissions, and the service account has been given RBAC as described in the documentation.

I documented the steps in my blog with screenshots at https://blogs.aspnet4you.com/2020/03/17/run-serverless-kubernetes-pods-using-amazon-eks-and-aws-fargate/ and you can see that the address of the ingress is empty! The ingress pods are running fine.

I could create an ALB manually, which is what I did, but it defeats the purpose. Any idea why the ALB didn't get created?

Thanks, Prodip

M00nF1sh commented 4 years ago

Hi, could you share the logs from the controller pod?

BTW, where is your controller running? If it's running as a Fargate pod itself, you need to specify --aws-vpc-id and --aws-region.

aspnet4you commented 4 years ago

@M00nF1sh, Thank you for responding to my question. Unfortunately, I didn't check the logs in the ingress controller before deleting the eks cluster. Any suggestion before I retry eks fargate with alb?

The ingress controller pods were running in the kube-system namespace. I did specify aws-vpc-id and aws-region in the deployment yaml. For this poc, I didn't have any node group, just a Fargate profile. Here is my ingress controller yaml: https://raw.githubusercontent.com/aspnet4you/eks-fargate-poc/master/alb-ingress-controller.yaml

M00nF1sh commented 4 years ago

@aspnet4you Pure Fargate (without any node group) should work fine. (I tested v1.1.4, which you are using, and it works fine.) One tip: change v1.1.4 to v1.1.6 for the latest code (but none of those fixes is related to your issue).

From the controller log you should be able to see what's wrong. Typically it's an IAM permission or a mistagged subnet.

aspnet4you commented 4 years ago

@M00nF1sh, Thanks for the suggestion. I will try the latest version.

I was overly cautious about subnet tags, and both the public and private pairs were tagged correctly. I learned that from a previous poc with EKS and EC2! As a matter of fact, the eksctl tool did that for me, with security groups wide open to all traffic on all ports!

aspnet4you commented 4 years ago

@M00nF1sh , Below is what I see in the logs and no ALB! Can't make anything out of the logs. What can possibly go wrong? I downloaded the latest IAM policy from github.

kubectl logs -p alb-ingress-controller-5db898488b-bqrf6 -n kube-system


AWS ALB Ingress controller Release: v1.1.6 Build: git-95ee2ac8 Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git

W0324 00:42:59.659618 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0324 00:43:29.660449 1 manager.go:173] kubebuilder/manager "msg"="Failed to get API Group-Resources" "error"="Get https://10.100.0.1:443/api?timeout=32s: dial tcp 10.100.0.1:443: i/o timeout"
F0324 00:43:29.660488 1 main.go:84] Get https://10.100.0.1:443/api?timeout=32s: dial tcp 10.100.0.1:443: i/o timeout


Thanks, Prodip

aspnet4you commented 4 years ago

@M00nF1sh: More logs. See the attached file for formatted logs.

kubectl logs -f alb-ingress-controller-5db898488b-bqrf6 -n kube-system


AWS ALB Ingress controller Release: v1.1.6 Build: git-95ee2ac8 Repository: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.git

W0324 00:43:30.859177 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0324 00:43:30.970685 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0324 00:43:30.970902 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0324 00:43:30.970963 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"=
I0324 00:43:30.971098 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"loadBalancer":{}}}}
I0324 00:43:30.971131 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"=
I0324 00:43:30.971266 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0324 00:43:30.971574 1 controller.go:121] kubebuilder/controller "level"=0 "msg"="Starting EventSource" "controller"="alb-ingress-controller" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{},"status":{"daemonEndpoints":{"kubeletEndpoint":{"Port":0}},"nodeInfo":{"machineID":"","systemUUID":"","bootID":"","kernelVersion":"","osImage":"","containerRuntimeVersion":"","kubeletVersion":"","kubeProxyVersion":"","operatingSystem":"","architecture":""}}}}
I0324 00:43:31.044029 1 leaderelection.go:205] attempting to acquire leader lease kube-system/ingress-controller-leader-alb...
I0324 00:43:31.057484 1 leaderelection.go:214] successfully acquired lease kube-system/ingress-controller-leader-alb
I0324 00:43:31.057674 1 recorder.go:53] kubebuilder/manager/events "level"=1 "msg"="Normal" "message"="alb-ingress-controller-5db898488b-bqrf6_7bd33a30-6d68-11ea-994e-7290c1c88576 became leader" "object"={"kind":"ConfigMap","namespace":"kube-system","name":"ingress-controller-leader-alb","uid":"7bdf9bad-6d68-11ea-8108-0a9dec12172d","apiVersion":"v1","resourceVersion":"4864"} "reason"="LeaderElection"
I0324 00:43:31.158073 1 controller.go:134] kubebuilder/controller "level"=0 "msg"="Starting Controller" "controller"="alb-ingress-controller"
I0324 00:43:31.258364 1 controller.go:154] kubebuilder/controller "level"=0 "msg"="Starting workers" "controller"="alb-ingress-controller" "worker count"=1
W0324 00:51:50.249271 1 reflector.go:270] pkg/mod/k8s.io/client-go@v0.0.0-20181213151034-8d9ed539ba31/tools/cache/reflector.go:95: watch of *v1.Secret ended with: too old resource version: 3846 (6237)
E0324 01:26:23.432194 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="no object matching key \"default/aspnetapp-ingress\" in local store" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:27:10.067226 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:27:56.072913 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:28:47.180817 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:29:37.624242 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:30:13.205391 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:30:51.391739 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:31:32.773034 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:32:21.837140 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:33:12.075720 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:33:50.826910 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:34:37.774838 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:35:28.136156 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:36:28.244697 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:38:00.702172 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:40:14.999806 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:43:45.026893 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:44:34.766833 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}
E0324 01:50:01.950738 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to build LoadBalancer configuration due to unable to fetch subnets. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.'us-east-1'.amazonaws.com/: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host" "controller"="alb-ingress-controller" "request"={"Namespace":"default","Name":"aspnetapp-ingress"}

alb-ingress-controller-error.txt

aspnet4you commented 4 years ago

Private subnet as tagged by eksctl, looks fine to me: image

Public subnet as tagged by eksctl, looks fine to me: image

aspnet4you commented 4 years ago

Hi @M00nF1sh, any idea what may be wrong with my configuration? Looks like EKS Fargate is not mature enough for production when it comes to ingress!

Thanks, Prodip

M00nF1sh commented 4 years ago

@aspnet4you apparently the real cause of your issue is this error: dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host. Does your VPC have an internet gateway or a NAT gateway? Note: even with Fargate, internet requests from your pods still go through your VPC (we place an ENI in your VPC).

M00nF1sh commented 4 years ago

Also, specify these settings without the quotes. Change

- --cluster-name='eks-fargate-alb-ingress-demo'
- --aws-vpc-id='vpc-057af016ed6507b52'
- --aws-region='us-east-1'

to

- --cluster-name=eks-fargate-alb-ingress-demo
- --aws-vpc-id=vpc-057af016ed6507b52
- --aws-region=us-east-1

You can see this in the error message: the hostname is sts.'us-east-1'.amazonaws.com, where even the region is quoted.
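
To illustrate why the quotes matter: entries in a container's args list are handed to the process as-is, with no shell in between to strip quoting, so the quote characters become part of the value. The following is only a minimal Python sketch of that effect, not the controller's code. The helper names (parse_flag, sts_host, find_quoted_flag_values) are hypothetical, and the hostname pattern is simply the one visible in the error message.

```python
def parse_flag(arg: str) -> tuple[str, str]:
    """Split a '--name=value' argument exactly as received (no shell-style unquoting)."""
    name, _, value = arg.partition("=")
    return name.lstrip("-"), value


def sts_host(region: str) -> str:
    """Build a regional STS hostname following the pattern seen in the error message."""
    return f"sts.{region}.amazonaws.com"


def find_quoted_flag_values(args: list[str]) -> list[str]:
    """Return the arguments whose value still starts or ends with a literal quote."""
    quoted = []
    for arg in args:
        _, value = parse_flag(arg)
        if value[:1] in ("'", '"') or value[-1:] in ("'", '"'):
            quoted.append(arg)
    return quoted


# Arguments as the process would receive them from the quoted settings above.
bad_args = [
    "--cluster-name='eks-fargate-alb-ingress-demo'",
    "--aws-vpc-id='vpc-057af016ed6507b52'",
    "--aws-region='us-east-1'",
]
good_args = [a.replace("'", "") for a in bad_args]

_, bad_region = parse_flag(bad_args[2])
_, good_region = parse_flag(good_args[2])
print(sts_host(bad_region))   # sts.'us-east-1'.amazonaws.com
print(sts_host(good_region))  # sts.us-east-1.amazonaws.com
print(find_quoted_flag_values(bad_args + good_args))  # only the three quoted args
```

A check like find_quoted_flag_values could be run over a manifest's args before applying it, since the resulting failure otherwise only surfaces later as a DNS error.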

aspnet4you commented 4 years ago

@M00nF1sh, You are smart. 💯 That was it! I removed the quotes and alb provisioned as designed. You can close the issue.

I like how the ALB automatically adjusts its target backends. I scaled from 2 to 3 pods and could see the new IP added to the target group automatically. Nice. :) This is the reason I didn't want to add an ALB manually and deal with the auto scaling myself.

Here is my ingress definition: image

Ingress resource definition: image

Thanks, Prodip

M00nF1sh commented 4 years ago

cool, glad it works :D

zquintana commented 3 years ago

I have the exact same issue, I can't figure out what's causing it.

Pod Logs:

{"level":"error","ts":1627148976.691803,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:34703->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149158.8136048,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:52341->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149331.7705815,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:58778->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149528.279761,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:55073->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149707.0748882,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:48301->172.20.0.10:53: read: connection refused"}

Container args:

Args:
      --cluster-name=app-rylqFOXa
      --ingress-class=alb
      --aws-region=us-west-2
      --aws-vpc-id=vpc-0e200d3ae7e12447c

Role policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::203341958641:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/2917B2CCF25A5DC470EF1CF5DB059AE9"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/2917B2CCF25A5DC470EF1CF5DB059AE9:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }
  ]
}

The public subnets tagged with:

kubernetes.io/role/elb  1
kubernetes.io/cluster/app-rylqFOXa  shared

The private subnets are tagged basically the same, but with elb-internal. I'm trying out Fargate as a POC for work. What might I be missing here?

aspnet4you commented 3 years ago

@zquintana, your issue is a little different from what I was facing. Your controller definition looks OK.

Do you want to double check the VPC subnet tags on your private subnets? As per the documentation, the role should be internal-elb and not elb-internal. https://aws.amazon.com/premiumsupport/knowledge-center/eks-vpc-subnet-discovery/

Key: kubernetes.io/role/internal-elb Value: 1
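
A rough way to catch this kind of tag typo is to compare a subnet's tags against the expected keys. The sketch below is only an illustration in Python, working on a plain dict of tags rather than calling AWS. The function name check_subnet_tags is hypothetical, and the cluster tag check (value shared or owned) is an assumption based on the tagging shown in this thread; whether that tag is required may depend on the controller version.

```python
ROLE_TAG_PUBLIC = "kubernetes.io/role/elb"
ROLE_TAG_PRIVATE = "kubernetes.io/role/internal-elb"


def check_subnet_tags(tags: dict[str, str], cluster_name: str, internal: bool) -> list[str]:
    """Return a list of problems found; an empty list means the tags look right."""
    problems = []
    role_key = ROLE_TAG_PRIVATE if internal else ROLE_TAG_PUBLIC
    if tags.get(role_key) != "1":
        problems.append(f"missing or wrong role tag: expected {role_key}=1")
    # Flag keys that look like a role tag but are not one of the two known keys.
    for key in tags:
        if key.startswith("kubernetes.io/role/") and key not in (ROLE_TAG_PUBLIC, ROLE_TAG_PRIVATE):
            problems.append(f"unrecognized role tag key: {key}")
    # Assumption: the cluster tag is expected, as in the tagging shown in this thread.
    cluster_key = f"kubernetes.io/cluster/{cluster_name}"
    if tags.get(cluster_key) not in ("shared", "owned"):
        problems.append(f"missing or wrong cluster tag: expected {cluster_key}=shared or owned")
    return problems


# Private subnet tagged with the mistyped role key mentioned above.
private_tags = {
    "kubernetes.io/role/elb-internal": "1",
    "kubernetes.io/cluster/app-rylqFOXa": "shared",
}
for problem in check_subnet_tags(private_tags, "app-rylqFOXa", internal=True):
    print(problem)
```

With the mistyped key, the check reports both the missing internal-elb tag and the unrecognized elb-internal key.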

Things may have changed a bit since I performed the poc. I have all the supporting files in github.com and entrypoint is https://github.com/aspnet4you/eks-fargate-poc/blob/master/eks-fargate-alb-ingress-v2.ps1

zquintana commented 3 years ago

@aspnet4you , yea looks like it's internal-elb. Typo. I'm using the official AWS helm chart.

zquintana commented 3 years ago

Turns out my issue was this: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1360. CoreDNS wasn't set up for a Fargate-only cluster.
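
The two failures in this thread look similar in the logs but differ in the lookup error itself: one has a hostname containing literal quotes, the other is a refused connection to the resolver at 172.20.0.10:53. As a hedged sketch, the Python below tells them apart from the error text alone. The function name, the regular expression, and the category labels are all assumptions made up for illustration, not anything the controller provides.

```python
import re

# Matches Go-style resolver errors such as
# "lookup <host>: no such host" or "lookup <host> on <ip:port>: <reason>".
LOOKUP_RE = re.compile(
    r"lookup (?P<host>\S+?)(?: on (?P<resolver>[\d.]+:\d+))?: (?P<reason>.*)"
)


def classify_lookup_failure(message: str) -> str:
    """Return a rough category for a DNS lookup error found in a controller log line."""
    match = LOOKUP_RE.search(message)
    if match is None:
        return "not-a-lookup"
    host, reason = match.group("host"), match.group("reason")
    if "'" in host or '"' in host:
        # Literal quotes in the hostname point at quoted values in the container args.
        return "malformed-hostname"
    if match.group("resolver") and "connection refused" in reason:
        # Nothing is answering on the resolver address, e.g. cluster DNS is not running.
        return "resolver-unreachable"
    if "no such host" in reason:
        return "unresolvable-name"
    return "other"


first = "dial tcp: lookup sts.'us-east-1'.amazonaws.com: no such host"
second = (
    "dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: "
    "read udp 10.0.3.184:34703->172.20.0.10:53: read: connection refused"
)
print(classify_lookup_failure(first))   # malformed-hostname
print(classify_lookup_failure(second))  # resolver-unreachable
```

The first category corresponds to the quoted --aws-region value earlier in the thread, the second to the CoreDNS problem described in the linked issue.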