Closed boopathykpm closed 2 years ago
@boopathykpm, error indicates that the k8s controlplane is not able to connect to the aws-load-balancer-controller pods running on your worker nodes. Could you ensure the cluster security group allows traffic between the k8s controlplane and the worker nodes? Please refer to issue #2460, the symptoms are the same.
@kishorj Thanks for the update. It would be great if this information were captured in the documentation; I don't see it anywhere there. By the way, I fixed it by referring to the documentation of the "Ingress-Nginx" chart. The same is captured in their documentation here
/kind documentation
We need to add a section in the installation guide, for example https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.3/deploy/installation/ about the security group configuration allowing traffic between the k8s control plane and the worker nodes.
@boopathykpm, what does your cluster creation workflow look like? Did you use eksctl, the AWS CLI, or some other tool?
@kishorj Cluster is created using the Terraform EKS module.
@boopathykpm are you using the "kubernetes_ingress" resource? If so, could you try "kubernetes_ingress_v1" (switching to _v1 resolved my issue)? I was getting a slightly different error, but for the same webhook "vingress.elbv2.k8s.aws". Error: Failed to create Ingress 'kube-system/alb-ingress-default' because: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": dial tcp 10.0.x.x:9443: connect: connection refused
@gitmaniak This is fixed. The reason for the failure was that the service port was not opened in the node security group. This piece of information is not captured anywhere in the documentation.
@boopathykpm Yeah, I am currently working on a requirement, and what I posted above was wrong. If I enable this flag (wait_for_load_balancer = true), that is when the above-mentioned POST call happens, and my pod might not have port 9443 open. If I do not use that flag, the ingress gets created, but the following steps need some other way to wait until the load balancer endpoint is available.
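For anyone following along, here is a rough sketch of where the wait_for_load_balancer flag mentioned above sits on a "kubernetes_ingress_v1" resource. The ingress name and namespace are taken from the error message earlier in this thread; the backend service name, port, and the ALB annotations are assumptions for illustration only, not anyone's actual configuration.

```hcl
# Sketch only: "example-service" and port 80 are hypothetical placeholders.
resource "kubernetes_ingress_v1" "alb_ingress_default" {
  # When true, Terraform waits for the load balancer endpoint to be
  # published in the ingress status before treating the resource as created.
  wait_for_load_balancer = true

  metadata {
    name      = "alb-ingress-default"
    namespace = "kube-system"

    # Assumed annotations; adjust to your own requirements.
    annotations = {
      "alb.ingress.kubernetes.io/scheme"      = "internet-facing"
      "alb.ingress.kubernetes.io/target-type" = "ip"
    }
  }

  spec {
    ingress_class_name = "alb"

    rule {
      http {
        path {
          path      = "/"
          path_type = "Prefix"

          backend {
            service {
              name = "example-service" # hypothetical service name
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}

# With the flag enabled, later steps can read the endpoint from the status.
output "load_balancer_hostname" {
  value = kubernetes_ingress_v1.alb_ingress_default.status.0.load_balancer.0.ingress.0.hostname
}
```

Either way, creating the ingress goes through the controller's validating webhook, so port 9443 still needs to be reachable from the control plane as described in this thread.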
It worked after enabling the port on my node security group. thank you!
Hello, is this issue sorted? I saw the good-first-issue label and would like to take it up if it is still open.
@arvryna, thanks for your interest, feel free to contribute to this issue.
/assign arvryna
Hi @kishorj, I have the same issue and I am 100% sure it's related to the security group configuration you described:
"@boopathykpm, error indicates that the k8s controlplane is not able to connect to the aws-load-balancer-controller pods running on your worker nodes. Could you ensure the cluster security group allows traffic between the k8s controlplane and the worker nodes? Please refer to issue #2460, the symptoms are the same."
What I'm missing is whether I need to open all ports from the control plane to my nodes, or whether opening only port 443 is enough.
I ask because I still have this issue even though I configured port 443 for ingress and egress; I can see the rules in the AWS Console for all my EC2 instances.
ingress_nodes_443 = {
  description                = "Node groups to cluster API"
  protocol                   = "tcp"
  from_port                  = 443
  to_port                    = 443
  type                       = "ingress"
  source_node_security_group = true
}
egress_nodes_443 = {
  description                = "Cluster API to node groups"
  protocol                   = "tcp"
  from_port                  = 443
  to_port                    = 443
  type                       = "egress"
  source_node_security_group = true
}
@GrigorievNick, for the AWS LB controller, you'd need to allow port 9443 for webhook access. Other application components might have different requirements; you'd need to figure out the optimum configuration based on your security requirements.
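If the security groups are managed directly rather than through a module, the rule could look roughly like the sketch below. The two variables are hypothetical names for the node and cluster (control plane) security group IDs; substitute whatever your own configuration exposes.

```hcl
# Hypothetical inputs: IDs of the worker node and cluster security groups.
variable "node_security_group_id" {
  type        = string
  description = "Security group attached to the worker nodes"
}

variable "cluster_security_group_id" {
  type        = string
  description = "Security group used by the EKS control plane"
}

# Allow the control plane to reach the controller's webhook port on the nodes.
resource "aws_security_group_rule" "control_plane_to_webhook" {
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 9443
  to_port                  = 9443
  security_group_id        = var.node_security_group_id
  source_security_group_id = var.cluster_security_group_id
  description              = "Allow access from control plane to webhook port of AWS load balancer controller"
}
```

This opens only the webhook port; any other components that the control plane calls into would need their own rules.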
For people who are stumbling on this after upgrading the AWS Terraform EKS module to version 18.x, the required configuration for the module to work with the load balancer controller is:
node_security_group_additional_rules = {
  ...
  ingress_allow_access_from_control_plane = {
    type                          = "ingress"
    protocol                      = "tcp"
    from_port                     = 9443
    to_port                       = 9443
    source_cluster_security_group = true
    description                   = "Allow access from control plane to webhook port of AWS load balancer controller"
  }
}
Seems like it's also related to #2289
Documentation added in PR #2506. We will update the live docs once we release v2.4.0.
After checking, investigating, and then discovering this thread, I saw your comment about node_security_group_additional_rules, @DZDomi, and your solution works!
Port 9443 needs to be opened, and then it works.
Also, thanks for sharing the Terraform code for the EKS module :)
Wow, 4 hours on this and I finally got it to work. Thanks @DZDomi! You're a lifesaver. It would be nice if they could add this security group rule to the "complete" example somewhere on the module page or here.
Important: a very similar message might occur when the AWS certificate is not approved yet.
I installed the ALB ingress on the EKS cluster and the chart installed successfully; the issue appears when creating the ingress for the application. I'm trying to install the sample app 2048, and when doing so I get the error below
I guess the IngressClass is not validating properly.