Closed. kevinchiu-mlse closed this issue 4 months ago.
Confirmed: this issue is also present with EKS 1.29 and the latest terraform-aws-eks release, v20.14.0.
AWS EKS support took a look and confirmed that the EKS/VPC configuration is correct and that they can replicate the issue on their end. No additional details are available at this time; I will update when more information is provided.
Closing, since this is not related to the module.
Description
I am trying to run two EKS clusters in one VPC with shared private and public subnets. Both EKS clusters are created with terraform-aws-modules/eks/aws v20.13.1. Each cluster has two managed node groups running standard EKS AMIs. The clusters are small and run only a few pods, so IP exhaustion is not a factor.
On the second cluster created, ALB ingress fails to reach the target pods. The first cluster launched has working ingress.
I am using ALB ingress, but the ALB health check fails to reach the running pods on cluster 2. If I manually add a common security group to the node groups of each cluster and use ALB annotations to attach the same security group to the ALB, the health check to the pods succeeds.
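The workaround described above could look roughly like the following Terraform sketch. It is not the reporter's actual configuration: the resource names, variable names, cluster name, and the specific rules on the shared group are all assumptions made for illustration. The `vpc_security_group_ids` key on the managed node group is the module input assumed here for attaching an extra group to the nodes.

```hcl
# Sketch only: every name below is hypothetical, and the rule set is an assumption.

variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

# Common security group intended for both the ALB and the node groups.
resource "aws_security_group" "shared_alb_nodes" {
  name_prefix = "shared-alb-nodes-"
  description = "Common group attached to the ALB and to the node groups"
  vpc_id      = var.vpc_id
}

# Assumed rule: members of the group may reach each other on any port,
# so an ALB in the group can reach pods on nodes in the same group.
resource "aws_vpc_security_group_ingress_rule" "members" {
  security_group_id            = aws_security_group.shared_alb_nodes.id
  referenced_security_group_id = aws_security_group.shared_alb_nodes.id
  ip_protocol                  = "-1"
}

# Terraform-managed groups start without an egress rule, so one is added here.
resource "aws_vpc_security_group_egress_rule" "all" {
  security_group_id = aws_security_group.shared_alb_nodes.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "-1"
}

# Client-facing ingress to the ALB (for example 80/443) is not shown here.

module "eks_cluster_2" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.13.1"

  cluster_name    = "cluster-2" # placeholder
  cluster_version = "1.29"      # placeholder
  vpc_id          = var.vpc_id
  subnet_ids      = var.private_subnet_ids

  eks_managed_node_groups = {
    default = {
      # Attach the shared group to the nodes in addition to the module's own.
      vpc_security_group_ids = [aws_security_group.shared_alb_nodes.id]
    }
  }
}

output "shared_security_group_id" {
  value = aws_security_group.shared_alb_nodes.id
}
```

The output value would then be referenced from the Ingress through the AWS Load Balancer Controller annotation `alb.ingress.kubernetes.io/security-groups`, which is presumably the ALB annotation meant above. The same node group setting would be repeated in the module call for the first cluster.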
Last year, when migrating from v19 to v20 of EKS blueprints, I was able to have working ingress with one cluster launched with v19 and the newer cluster launched with v20, without any additional workarounds.
Versions
Module version [Required]: v20.13.1
Terraform version: 1.7.5
Provider version(s):
provider registry.terraform.io/gavinbunney/kubectl v1.14.0
provider registry.terraform.io/hashicorp/aws v5.42.0
provider registry.terraform.io/hashicorp/cloudinit v2.3.3
provider registry.terraform.io/hashicorp/helm v2.12.1
provider registry.terraform.io/hashicorp/kubernetes v2.27.0
provider registry.terraform.io/hashicorp/null v3.2.2
provider registry.terraform.io/hashicorp/time v0.11.1
provider registry.terraform.io/hashicorp/tls v4.0.5
Reproduction Code [Required]
Steps to reproduce the behavior:
Expected behavior
On cluster 2, ALB ingress returns 200 OK and serves the application.
Actual behavior
On cluster 2, the ALB returns 504 Gateway Timeout. The ALB resources in the AWS console show that the target pods are not reachable.
Terminal Output Screenshot(s)
Additional context
kubernetes.io/role/internal-elb and kubernetes.io/role/elb tags