kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

ALB Ingress is not getting created #2462

Closed boopathykpm closed 2 years ago

boopathykpm commented 2 years ago

I installed the ALB ingress controller on the EKS cluster, and the chart installed successfully. The issue occurs when creating the application's Ingress: when I try to install the sample app 2048, I get the error below.

Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": context deadline exceeded

I guess the IngressClass is not validating properly.

kishorj commented 2 years ago

@boopathykpm, the error indicates that the k8s control plane is not able to connect to the aws-load-balancer-controller pods running on your worker nodes. Could you ensure the cluster security group allows traffic between the k8s control plane and the worker nodes? Please refer to issue #2460; the symptoms are the same.

boopathykpm commented 2 years ago

@kishorj Thanks for the update. It would be great if this information were captured in the documentation; I don't see it anywhere there. By the way, I fixed it by referring to the documentation of the "Ingress-Nginx" chart. The same issue is covered in their documentation.

kishorj commented 2 years ago

/kind documentation

We need to add a section to the installation guide, for example https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.3/deploy/installation/, about the security group configuration that allows traffic between the k8s control plane and the worker nodes.

kishorj commented 2 years ago

@boopathykpm, what does your cluster creation workflow look like? Did you use eksctl, the AWS CLI, or some other tool?

boopathykpm commented 2 years ago

@kishorj Cluster is created using the Terraform EKS module.

gitmaniak commented 2 years ago

@boopathykpm Are you using the "kubernetes_ingress" resource? If so, could you try "kubernetes_ingress_v1" (switching to _v1 resolved my issue)? I was getting a slightly different error, but for the same webhook "vingress.elbv2.k8s.aws":

Error: Failed to create Ingress 'kube-system/alb-ingress-default' because: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": dial tcp 10.0.x.x:9443: connect: connection refused

boopathykpm commented 2 years ago

@gitmaniak This is fixed. The failure occurred because the service port number was not open in the node security group. This piece of information is not captured anywhere in the documentation.

gitmaniak commented 2 years ago

@boopathykpm Yeah, I am currently working on a requirement, and what I posted above was wrong. The POST call mentioned above happens when I enable this flag (wait_for_load_balancer = true), and my pod might not have port 9443 open. If I do not use that flag, the ingress gets created, but the following steps need some other way to wait until the load balancer endpoint is available.

It worked after enabling the port on my node security group. Thank you!

arvryna commented 2 years ago

Hello, is this issue sorted? I saw the good-first-issue label and would like to take it up if it is still open.

kishorj commented 2 years ago

@arvryna, thanks for your interest, feel free to contribute to this issue.

/assign arvryna

GrigorievNick commented 2 years ago

Hi @kishorj, I have the same issue, and I am 100% sure it's related to the security group configuration, as you described:

@boopathykpm, the error indicates that the k8s control plane is not able to connect to the aws-load-balancer-controller pods running on your worker nodes. Could you ensure the cluster security group allows traffic between the k8s control plane and the worker nodes? Please refer to issue #2460; the symptoms are the same.

What I'm missing is whether I need to open all ports from the control plane to my nodes, or whether opening only port 443 is enough.

I ask because I still have this issue even though I configured port 443 for ingress and egress; I can see the rules in the AWS Console for all my EC2 instances.

    ingress_nodes_443 = {
      description                = "Node groups to cluster API"
      protocol                   = "tcp"
      from_port                  = 443
      to_port                    = 443
      type                       = "ingress"
      source_node_security_group = true
    }
    egress_nodes_443 = {
      description                = "Cluster API to node groups"
      protocol                   = "tcp"
      from_port                  = 443
      to_port                    = 443
      type                       = "egress"
      source_node_security_group = true
    }

kishorj commented 2 years ago

@GrigorievNick, for the AWS Load Balancer Controller, you need to allow port 9443 for webhook access. Other application components might have different requirements, so you will need to work out the optimal configuration based on your security requirements.
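
For setups that manage security groups directly in Terraform rather than through a module, the rule described above might look roughly like the following. This is a minimal sketch, not a confirmed configuration from this thread: the variable names and the resource label are hypothetical placeholders, and it assumes the control plane and the worker nodes use separate security groups.

```hcl
# Hypothetical variables: supply the IDs of your own security groups.
variable "cluster_security_group_id" {
  type        = string
  description = "Security group attached to the EKS control plane (assumed name)"
}

variable "node_security_group_id" {
  type        = string
  description = "Security group attached to the worker nodes (assumed name)"
}

# Ingress rule on the node security group that lets the control plane
# reach the controller's webhook port (9443, per the comment above).
resource "aws_security_group_rule" "control_plane_to_lbc_webhook" {
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 9443
  to_port                  = 9443
  security_group_id        = var.node_security_group_id
  source_security_group_id = var.cluster_security_group_id
  description              = "Control plane to AWS Load Balancer Controller webhook"
}
```

Only port 9443 is opened here; other components running on the nodes may need their own rules.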

DZDomi commented 2 years ago

For people who stumble on this after upgrading the AWS Terraform EKS module to version 18.x, the configuration required for the module to work with the load balancer controller is:

node_security_group_additional_rules = {
  ...
  ingress_allow_access_from_control_plane = {
    type                          = "ingress"
    protocol                      = "tcp"
    from_port                     = 9443
    to_port                       = 9443
    source_cluster_security_group = true
    description                   = "Allow access from control plane to webhook port of AWS load balancer controller"
  }
}

ohookins commented 2 years ago

Seems like it's also related to #2289

kishorj commented 2 years ago

Documentation added in PR #2506. We will update the live docs once we release v2.4.0.

alexvaque commented 2 years ago

node_security_group_additional_rules

After checking, investigating, and finally discovering this thread, I saw your comment @DZDomi, and your solution works!

Port 9443 needs to be open, and then it works.

Also, thanks for sharing the Terraform code for the EKS module :)

kuboraam commented 2 years ago

Wow, 4 hours on this and I finally got it to work. Thanks @DZDomi! You're a life saver. It would be nice if they could add this security group rule to the "complete" example somewhere on the module page or here.

simplicbe commented 1 year ago

Important: a very similar message might occur when the AWS certificate has not been approved yet.