terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws

Support use of for_each/count parameters #2201

Closed: ajpauwels closed this issue 2 years ago

ajpauwels commented 2 years ago

Is your request related to a problem? Please describe.

I'm trying to automate the deployment of n identical clusters. It's currently impossible to do so with the EKS module because provider blocks cannot be passed to a module dynamically; they must be statically known at plan time.

Describe the solution you'd like.

When using for_each/count, could the module generate its own provider using parameters given via variables? That way, we wouldn't have to manually specify a provider for each sub-cluster.

Describe alternatives you've considered.

I've considered:

  1. Code generation - works via a bash script or Terragrunt, but this introduces a dependency on yet another tool
  2. For now, I've duplicated my cluster module 25 times, using a count of 0 or 1 to enable or disable each cluster. The problem with this is that I'm capped at 25 clusters. That was acceptable while IPv4 subnetting limited me anyway, but I've since expanded the module to support IPv6, which lifts that restriction.

bryantbiggs commented 2 years ago

Hi @ajpauwels - this sounds interesting; could you share a little bit more about your use case and the outcome you're trying to achieve?

ajpauwels commented 2 years ago

Sure!

I'm investigating the ability to scale clusters in a single VPC not just vertically (+nodes) but horizontally (+clusters), for the following reasons:

I'm standardizing my cluster setup in a module that sets everything up according to "best practices", and I'd like to be able to instruct the module, with a simple count parameter, to deploy n identical clusters, all in one VPC.

I had everything set up how I like it and it was deploying fine, but I hit issues when performing the final steps, which involve issuing commands to the Kubernetes API (like updating the aws-auth ConfigMap). This is because each cluster needs an independently configured kubernetes provider to talk to that particular cluster, passed in via a providers block on the module, i.e.

  providers = {
    aws        = aws.cluster
    kubernetes = kubernetes.cluster0
  }

These providers cannot be generated dynamically; this is a Terraform limitation, as Terraform requires the exact provider configurations to be known before the planning phase, i.e. the for_each/count meta-arguments are not supported in provider blocks.
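
For illustration, this is the kind of configuration Terraform rejects (a hypothetical sketch; the variable names are made up):

  # Rejected at plan time: provider configurations must be static, so
  # the count/for_each meta-arguments are not accepted here.
  provider "kubernetes" {
    count = var.cluster_count                  # not allowed in provider blocks
    alias = "cluster${count.index}"            # aliases must be static literals too
    host  = var.cluster_endpoints[count.index]
  }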

This effectively makes the EKS module unable to support count/for_each if the Kubernetes API has to be communicated with at the end.

My follow-up question is therefore: is it possible to do the full deployment without requiring any communication with the k8s API afterwards? I could potentially set up a separate step that performs those calls across the cluster group afterwards.

What I've done for now is set a static limit of 25 deployed clusters. I picked 25 due to IPv4 subnetting limitations (see the cidrsubnet sketch after this list):

  1. The VPC has one /16 private IP block
  2. An AWS region has at MOST 5 availability zones (most often only 3)
  3. My clusters run one public subnet and one private subnet per AZ
  4. 5 * 2 = maximum of 10 subnets per cluster
  5. 25 * 10 = 250 /24 subnets
  6. There's an additional subnet per AZ for hosting NAT gateways, so for up to 5 AZs that's up to 5 more /24 subnets, covering 255 of the 256 possible /24 IPv4 subnets in the /16 for a full cluster grouping, with one to spare
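
To make the arithmetic concrete, here is a minimal sketch (hypothetical locals, not from the module) of how Terraform's cidrsubnet() carves /24s out of the /16:

  locals {
    vpc_cidr = "10.0.0.0/16"
    # newbits = 8 turns the /16 into /24s; the last argument selects
    # one of the 256 possible blocks (0..255).
    subnet_first = cidrsubnet(local.vpc_cidr, 8, 0)   # "10.0.0.0/24"
    subnet_last  = cidrsubnet(local.vpc_cidr, 8, 255) # "10.0.255.0/24"
  }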

By having a static maximum, I was able to simply copy the EKS module definition into 25 different files and add a count parameter to each one that determines, via 0 or 1, whether that cluster is deployed. This also allows me to have 25 statically defined providers.
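
A minimal sketch of what one of those per-cluster files looks like (the enable flag is hypothetical, and the real module takes many more inputs):

  # cluster00.tf - one statically defined cluster slot
  module "cluster00" {
    source = "terraform-aws-modules/eks/aws"
    count  = var.enable_cluster00 ? 1 : 0 # 0 or 1 toggles this slot

    cluster_name = "cluster00"

    providers = {
      aws        = aws.cluster
      kubernetes = kubernetes.cluster0
    }
  }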

However, I've just recently upgraded the cluster definition to support IPv6 networking, which eliminates the subnetting limitation entirely, so I'm revisiting this issue.

I know one of the big no-nos in Terraform module authoring is writing a provider block inside a module that is meant to be used as a sub-module. However, in this case, if the provider block for kubernetes (NOT aws, just kubernetes) were written into this module, using values from the cluster resources it creates and perhaps some passed-in values (i.e. command/args for customization), then each cluster would get its own provider, and it could remain flexible enough to accommodate all use-cases.
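
For concreteness, a sketch of the kind of embedded provider being proposed (the resource address is hypothetical, and this is exactly the pattern that is discouraged inside reusable modules):

  provider "kubernetes" {
    host                   = aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(aws_eks_cluster.this.certificate_authority[0].data)

    # Token retrieval delegated to the AWS CLI; command/args could be
    # exposed as module variables for customization.
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.this.name]
    }
  }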

bryantbiggs commented 2 years ago

Thank you for sharing - I don't believe there is anything we can do here in this module, as it's a restriction of Terraform itself. Perhaps CDKTF might aid in this a bit - not entirely certain, since I don't have first-hand experience, but that is what I am told.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

schollii commented 2 years ago

Yes @ajpauwels, this is a current Terraform limitation; you will need to generate those via a wrapper: your own (say bash, Python, Go), Terragrunt, Terramate, Terraspace, or (as suggested by @bryantbiggs) CDKTF. They each have their pros and cons; which one to choose depends largely on the details of your situation.

ajpauwels commented 2 years ago

@bryantbiggs @schollii Understood, I figured as much. I may explore CDKTF in the future, but for now, and for future reference for anyone else viewing this issue, I've amended my bash script to also take the number of clusters as a parameter. So I have a gen.sh file in my directory that takes a number as its only argument and generates cluster01.tf through cluster<number - 1>.tf, based on the file cluster00.tf. This works fine for my use-case.

Thank you to the maintainers for their responses.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.