terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

EKS-Managed cluster security group attached when attach_cluster_primary_security_group=false #2687

Closed: code-eg closed this issue 1 year ago

code-eg commented 1 year ago

Description

To be upfront, I am not convinced this is a bug with this module. The problem I am facing feels more like misunderstood and/or undocumented EKS behavior relative to some of the documentation here. I wrap this module with some company-standard tooling via Helm and other K8s manifests.

In my module, I don't mess with trying to attach the EKS-managed security group. I want the minimal access of the node security group provided by this module. However, when setting up a new ingress controller, I bumped into the problem described in the FAQ (https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/faq.md#i-received-an-error-expect-exactly-one-securitygroup-tagged-with-kubernetesioclustername-).
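For reference, here is a minimal sketch of the shape of my invocation (all names and IDs are placeholders, and the version pin is an assumption, not my exact one):

```hcl
# Minimal sketch of the invocation in question; names and IDs are placeholders.
# attach_cluster_primary_security_group already defaults to false, but it is
# shown explicitly here.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0" # assumed major version

  cluster_name    = "example-cluster" # placeholder
  cluster_version = "1.27"            # placeholder

  vpc_id     = "vpc-0123456789abcdef0"                                  # placeholder
  subnet_ids = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"] # placeholder

  eks_managed_node_groups = {
    default = {
      attach_cluster_primary_security_group = false

      min_size       = 1
      max_size       = 3
      desired_size   = 2
      instance_types = ["m5.large"]
    }
  }
}
```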

When I dug into this more, I was confused to see the EKS-managed SG was attached to the nodes (hence the multiple security groups with the redundant tag). I confirmed that the launch template only had the node SG, and I ran a TF plan for a new EKS-managed Node Group to confirm that only that one group should be attached (see attachments).

Is there some EKS-driven behavior where all EKS managed node groups get the cluster SG attached? If yes, we should consider updating the FAQ and docs a bit. If it's not that behavior, does anyone have leads as to what might be causing that SG to be attached? attach_cluster_primary_security_group defaults to false, so I wouldn't expect anything to be overriding it. I noticed the same behavior when explicitly setting attach_cluster_primary_security_group to false as well.

The AWS Documentation for MNG states:

By default, Amazon EKS applies the cluster security group to the instances in your node group to facilitate communication between nodes and the control plane. If you specify custom security groups in the launch template using either option mentioned earlier, Amazon EKS doesn't add the cluster security group. So, you must ensure that the inbound and outbound rules of your security groups enable communication with the endpoint of your cluster. If your security group rules are incorrect, the worker nodes can't join the cluster. For more information about security group rules, see Amazon EKS security group requirements and considerations.
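To make the quoted behavior concrete, "custom security groups in the launch template" means roughly the following raw-Terraform shape (a sketch with placeholder IDs, not the module's exact resources):

```hcl
# Sketch of a managed node group whose launch template specifies custom
# security groups (placeholder IDs throughout). Per the AWS docs quoted
# above, EKS will NOT add the cluster security group in this case, so these
# SGs must allow node <-> control plane communication themselves.
resource "aws_launch_template" "node" {
  name_prefix = "example-node-"

  # One of the two options the docs mention; the other is setting security
  # groups on a network_interfaces block instead.
  vpc_security_group_ids = ["sg-0123456789abcdef0"] # placeholder: node SG
}

resource "aws_eks_node_group" "example" {
  cluster_name    = "example-cluster"                             # placeholder
  node_group_name = "example"
  node_role_arn   = "arn:aws:iam::111111111111:role/example-node" # placeholder
  subnet_ids      = ["subnet-0123456789abcdef0"]                  # placeholder

  launch_template {
    id      = aws_launch_template.node.id
    version = aws_launch_template.node.latest_version
  }

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }
}
```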

I will open a support case with them to poke at this in the meantime.

I have attached some screenshots of the node groups, the corresponding LT and the (slightly redacted) Terraform plan of my module invocation.

Versions

Reproduction Code [Required]

Steps to reproduce the behavior:

Are you using workspaces? No
Have you cleared the local cache? Yes

I ran a cluster build with `attach_cluster_primary_security_group` set to false, verified the Launch Template only has the node security group, and noticed my EKS nodes _still_ have the EKS-managed cluster SG attached.

Expected behavior

Given the documentation listed here: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/faq.md#i-received-an-error-expect-exactly-one-securitygroup-tagged-with-kubernetesioclustername-

I would expect my node groups to spin up with only the node security group and any additional security groups specified.

Actual behavior

My EKS Managed Node Groups spin up from the Launch Template (with the node SG specified), but the EKS-created cluster security group is also attached to the instances.

Terminal Output Screenshot(s)

See attachments at bottom for Terraform plan

Additional context

I wrap this module with my own resources, but the EKS node groups are passed in as-is.

Attachments: tfplan.txt, Instance Details, Launch Template Details

code-eg commented 1 year ago

As a semi-related issue, I am also finding that even after following the FAQ and changing the tag value of kubernetes.io/cluster/$CLUSTERNAME to something that is not "owned" (in my case, "See Node SG"), the AWS Load Balancer Controller still fails as long as two SGs carry a tag with that key at all, regardless of the values.

Error:

{"level":"error","ts":"2023-07-17T15:07:28Z","msg":"Reconciler error","controller":"targetGroupBinding","controllerGroup":"elbv2.k8s.aws","controllerKind":"TargetGroupBinding","TargetGroupBinding":{"name":"k8s-projectc-contoure-d8ghjfjjfgj","namespace":"projectcontour"},"namespace":"projectcontour","name":"k8s-projectc-contoure-fdgfgfgd","reconcileID":"7ab1ca50-b3ec-43c8-b6b4-de03bcb83aa6","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/$CLUSTERNAME for eni eni-0da3dhfghgfh, got: [sg-013b5dfgfdg sg-0628b1dfgdfg] (clusterName: $CLUSTERNAME)"}

If this holds true, Options 2 and 3 from the FAQ are not viable, and in my case I would be forced to use only Option 1 if the EKS-managed SG remains attached.
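For anyone comparing notes, the re-tagging I attempted (per FAQ Options 2/3) was via the module's `node_security_group_tags` input, roughly like this (a sketch assuming the v19 inputs; the cluster name is a placeholder):

```hcl
# Sketch of the FAQ Option 2/3 re-tag: override the value of the conflicting
# tag on the module-created node SG. In my testing this was not sufficient --
# the controller still errors while two SGs carry the tag *key*, whatever
# the values are.
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # ... cluster and node group arguments as in the original report ...

  node_security_group_tags = {
    "kubernetes.io/cluster/example-cluster" = "See Node SG" # placeholder cluster name
  }
}
```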

code-eg commented 1 year ago

After talking with AWS Support, they appropriately called out that this is using the EKS-provided Launch Template, which would add in the cluster SG. It's possible I missed a flag update in a version upgrade and it just happened to still work. I will investigate why it's not passing the Launch Template into the MNG.
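For anyone landing here later: one candidate flag to check (an assumption on my part, not confirmed in this thread) is the v19 node-group input `use_custom_launch_template`. When it evaluates to false, the MNG uses the EKS default launch template, and EKS then attaches the cluster SG. Roughly:

```hcl
# Sketch of the flag to verify (assumes module v19+, where v18's
# create_launch_template behavior was reworked into use_custom_launch_template).
# If use_custom_launch_template is false, the managed node group uses the
# EKS-provided default launch template and gets the cluster SG attached.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0" # assumed

  # ... cluster arguments as in the original report ...

  eks_managed_node_groups = {
    default = {
      use_custom_launch_template = true # keep the module-built launch template
    }
  }
}
```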

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.