hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
10.35k stars 9.53k forks source link

[Enhancement]: Support node_group_name_suffix for aws_eks_node_group #41931

Open icelynjennings opened 2 months ago

icelynjennings commented 2 months ago

Description

aws_eks_node_group should support node_group_name_suffix.

Currently it only supports node_group_name and node_group_name_prefix. This is an issue when you want to change instance type(s) for your node group as terraform forces a replacement of the node group, and because node_group_name must be unique, this therefore prevents you from adding the create_before_destroy lifecycle. While it is resolved by node_group_name_prefix, our internal naming requirements disallow us from setting a randomized prefix, as resource names are being used to internally sort with regards to costs and business intelligence.

This leads to downtime (zero compute due to the node group having to be destroyed) every time you want to change instance type(s) (even if you are only adding an additional instance type to the instance_types list).

node_group_name_suffix would allow us to support create_before_destroy and therefore we could bring up a new node group prior to destroying the old one when we want to change instance_types

Affected Resource(s) and/or Data Source(s)

Potential Terraform Configuration

References

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group

Would you like to implement a fix?

Yes

github-actions[bot] commented 2 months ago

Community Guidelines

This comment is added to every new Issue to provide quick reference to how the Terraform AWS Provider is maintained. Please review the information below, and thank you for contributing to the community that keeps the provider thriving! :rocket:

Voting for Prioritization

Volunteering to Work on This Issue

justinretzolk commented 2 months ago

Hey @icelynjennings 👋 Thank you for taking the time to raise this! This is an interesting proposal, as there are plenty of resources with a *_prefix argument, but I'm not aware of any that currently support a *_suffix argument. That said, I can definitely understand your use-case, so I'm going to leave this open for someone from the team/community to review and determine the best path forward.

In the meantime, I came up with an idea using the random_string resource that I hope will help. I set the result to an output for demonstration purposes, but you'd swap the local into the node_group_name argument of the aws_eks_node_group, with create_before_destroy set.

variable "instance_types" {
  type    = list(string)
  default = ["c5.large", "c4.large", "c3.large"]
}

variable "node_group_name_prefix" {
  default = "your-prefix"
}

resource "random_string" "example" {
  length           = 62 - length(var.node_group_name_prefix) # max length is 63, subtracting from 62 prevents off-by-one
  override_special = "-_"
  keepers = {
    instance_types = join(",", var.instance_types)
  }
}

locals {
  node_group_name = "${var.node_group_name_prefix}-${random_string.example.result}"
}

output "node_group_name" {
  value = local.node_group_name
}

Output from first apply:

$ terraform apply --auto-approve

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # random_string.example will be created
  + resource "random_string" "example" {
      + id               = (known after apply)
      + keepers          = {
          + "instance_types" = "c5.large,c4.large,c3.large"
        }
      + length           = 51
      + lower            = true
      + min_lower        = 0
      + min_numeric      = 0
      + min_special      = 0
      + min_upper        = 0
      + number           = true
      + numeric          = true
      + override_special = "-_"
      + result           = (known after apply)
      + special          = true
      + upper            = true
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + node_group_name = (known after apply)
random_string.example: Creating...
random_string.example: Creation complete after 0s [id=Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

node_group_name = "your-prefix-Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo"

A second apply with no change to the instance_types variable (demonstrating stability):

$ terraform apply --auto-approve
random_string.example: Refreshing state... [id=Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

node_group_name = "your-prefix-Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo"

And finally, an apply after removing c3.large from the instance_types variable:

$ terraform apply --auto-approve
random_string.example: Refreshing state... [id=Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # random_string.example must be replaced
-/+ resource "random_string" "example" {
      ~ id               = "Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo" -> (known after apply)
      ~ keepers          = { # forces replacement
          ~ "instance_types" = "c5.large,c4.large,c3.large" -> "c5.large,c4.large"
        }
      ~ result           = "Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo" -> (known after apply)
        # (11 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Changes to Outputs:
  ~ node_group_name = "your-prefix-Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo" -> (known after apply)
random_string.example: Destroying... [id=Dki-cEE2nHG7yQ2G4tH2jgN2dcIHz-uj6PStdVPWdNR0-VVBZlo]
random_string.example: Destruction complete after 0s
random_string.example: Creating...
random_string.example: Creation complete after 0s [id=rEErTVrA1yOElf-T7B1Ln_TOU9vDblqlowLDYxDoPGYtZm0ewc_]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

Outputs:

node_group_name = "your-prefix-rEErTVrA1yOElf-T7B1Ln_TOU9vDblqlowLDYxDoPGYtZm0ewc_"
icelynjennings commented 2 months ago

Thank you @justinretzolk for such a thorough and informative reply. I have not tried yet, but I strongly suspect your way would work.

For most orgs with stringent naming around ONLY the prefix, I can personally advise to simply:

  1. Remove any hardcoded value out of the name argument in eks_node_group.
  2. Include the same value (presumably the internal naming convention) into node_group_name_prefix instead. This way, the "left side" of the name will thus at least be as per convention.
  3. Accept that the "right side", or suffix, will be generated randomly by the upstream module, thanks to removing name and supplying node_group_name_prefix only.

Note however that this will NOT be the same as asking the tf module to generate a random prefix... in which case, the solution by @justinretzolk is currently best that I know of.