terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Terraform shows false changes in user_data #3227

Open anzinchenko89 opened 21 hours ago

anzinchenko89 commented 21 hours ago

Description

We are using EKS managed node groups with AL2023. When we change TF code in the repository that is not even related to the node groups or user_data, Terraform always shows user_data as being updated in place, even though nothing has changed. This makes it appear as if Terraform is going to update the node group, but these are false changes.

Versions

Reproduction Code [Required]

module "nodegroup" {
  source   = "./node_group"
  for_each = local.worker_node_groups

  cluster_name              = var.cluster_name
  eks_cluster_name          = time_sleep.cluster.triggers["cluster_name"]
  cluster_endpoint          = time_sleep.cluster.triggers["cluster_endpoint"]
  cluster_auth_base64       = time_sleep.cluster.triggers["cluster_certificate_authority_data"]
  cluster_service_ipv4_cidr = var.eks_cluster_service_ipv4_cidr
  instance_type             = each.value.instance_type
  max_nodes                 = each.value.max_nodes
  min_nodes                 = each.value.min_nodes
  name                      = each.value.name

  block_device_mappings = {
    # Root volume
    xvda = {
      device_name = "/dev/xvda"
      ebs = {
        volume_size           = 50
        volume_type           = "gp3"
        iops                  = 3000
        throughput            = 125
        delete_on_termination = true
        encrypted             = true
      }
    }
    xvdb = {
      device_name = local.second_volume_name
      ebs = {
        volume_size           = each.value.root_volume_size
        volume_type           = "gp3"
        iops                  = 3000
        throughput            = 125
        delete_on_termination = true
        encrypted             = true
      }
    }
  }
  security_groups            = each.value.security_groups
  subnet_ids                 = var.worker_subnet_ids
  eks_worker_arn             = each.value.eks_worker_arn
  eks_node_group_ami_id      = var.eks_node_group_ami_id
  enable_bootstrap_user_data = true
  ami_type                   = "AL2023_x86_64_STANDARD"
  cloudinit_pre_nodeadm = [{
    content_type = "text/x-shellscript; charset=\"us-ascii\""
    content      = <<-EOT
      #!/bin/bash
      # This user data mounts the containerd directories to the second EBS volume.
      # The actual bash script is pretty long, so this is just to show the code structure.
    EOT
    },
    {
      content_type = "application/node.eks.aws"
      content      = <<-EOT
        ---
        apiVersion: node.eks.aws/v1alpha1
        kind: NodeConfig
        spec:
          kubelet:
            config:
              registerWithTaints:
                - key: "ebs.csi.aws.com/agent-not-ready"
                  effect: "NoExecute"
                  value: "NoExecute"
                - key: "efs.csi.aws.com/agent-not-ready"
                  effect: "NoExecute"
                  value: "NoExecute"
              evictionHard:
                memory.available: "100Mi"
                nodefs.available: "10%"
                nodefs.inodesFree: "5%"
                imagefs.available: "15%"
                imagefs.inodesFree: "5%"
              evictionSoft:
                nodefs.available: "15%"
                nodefs.inodesFree: "10%"
                imagefs.available: "20%"
                imagefs.inodesFree: "10%"
              evictionSoftGracePeriod:
                nodefs.available: 60s
                nodefs.inodesFree: 60s
                imagefs.available: 60s
                imagefs.inodesFree: 60s
              evictionMaxPodGracePeriod: 180
              evictionPressureTransitionPeriod: 5m
              evictionMinimumReclaim:
                nodefs.available: 1Gi
                imagefs.available: 1Gi
      EOT
  }]
  aws_tags     = merge(var.aws_tags, each.value["tags"])
  default_tags = var.default_tags
  labels       = each.value.node_labels
  taints       = each.value.node_taints

  depends_on = [
    aws_eks_cluster.cluster
  ]
}
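
For context, the call above references a few values defined elsewhere in our repo (local.worker_node_groups, local.second_volume_name, and time_sleep.cluster). Their real definitions aren't included here; a rough sketch of their shape, with placeholder values only, would be:

# Hypothetical sketch of the values referenced above -- the real definitions
# live elsewhere in our repo and use different values.
locals {
  second_volume_name = "/dev/xvdb"

  worker_node_groups = {
    default = {
      name             = "default"
      instance_type    = "m6i.2xlarge"
      min_nodes        = 2
      max_nodes        = 6
      root_volume_size = 100
      security_groups  = []
      eks_worker_arn   = "arn:aws:iam::111111111111:role/eks-worker"
      node_labels      = {}
      node_taints      = {}
      tags             = {}
    }
  }
}

# Delay consumed by the node groups so they are only created once the cluster is ready.
resource "time_sleep" "cluster" {
  create_duration = "30s"

  triggers = {
    cluster_name                       = aws_eks_cluster.cluster.name
    cluster_endpoint                   = aws_eks_cluster.cluster.endpoint
    cluster_certificate_authority_data = aws_eks_cluster.cluster.certificate_authority[0].data
  }
}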

The node_group module is located within our local repo and contains the following:

module "user_data" {
  source                     = "terraform-aws-modules/eks/aws//modules/_user_data"
  version                    = "~> 20.0"
  create                     = true
  ami_type                   = var.ami_type
  cluster_name               = var.eks_cluster_name
  cluster_endpoint           = var.cluster_endpoint
  cluster_auth_base64        = var.cluster_auth_base64
  cluster_service_ipv4_cidr  = var.cluster_service_ipv4_cidr
  enable_bootstrap_user_data = var.enable_bootstrap_user_data
  pre_bootstrap_user_data    = var.pre_bootstrap_user_data
  post_bootstrap_user_data   = var.post_bootstrap_user_data
  bootstrap_extra_args       = var.bootstrap_extra_args
  user_data_template_path    = var.user_data_template_path
  cloudinit_pre_nodeadm      = var.cloudinit_pre_nodeadm
}
resource "aws_launch_template" "workers" {
  name_prefix   = "${var.name}.${var.cluster_name}-"
  image_id      = var.eks_node_group_ami_id
  instance_type = var.instance_type
  ebs_optimized = true
  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 2
  }
  monitoring {
    enabled = false
  }
  network_interfaces {
    device_index          = 0
    security_groups       = var.security_groups
    delete_on_termination = true
  }
  user_data = module.user_data.user_data
  dynamic "tag_specifications" {
    for_each = toset(var.tag_specifications)
    content {
      resource_type = tag_specifications.key
      tags = merge(
        var.aws_tags,
        var.default_tags,
        {
          Name = "${var.name}.${var.cluster_name}"
        }
      )
    }
  }
  tags = merge(
    var.aws_tags,
    {
      Name = "${var.name}.${var.cluster_name}"
    }
  )
  dynamic "block_device_mappings" {
    for_each = var.block_device_mappings

    content {
      device_name = try(block_device_mappings.value.device_name, null)

      dynamic "ebs" {
        for_each = try([block_device_mappings.value.ebs], [])

        content {
          delete_on_termination = try(ebs.value.delete_on_termination, null)
          encrypted             = try(ebs.value.encrypted, null)
          iops                  = try(ebs.value.iops, null)
          kms_key_id            = try(ebs.value.kms_key_id, null)
          snapshot_id           = try(ebs.value.snapshot_id, null)
          throughput            = try(ebs.value.throughput, null)
          volume_size           = try(ebs.value.volume_size, null)
          volume_type           = try(ebs.value.volume_type, null)
        }
      }

      no_device    = try(block_device_mappings.value.no_device, null)
      virtual_name = try(block_device_mappings.value.virtual_name, null)
    }
  }
}
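
One way we can check whether the rendered user data really changes between plans is to expose it as an output of the node_group module and diff it across runs (a debugging sketch only; it assumes the submodule output is base64-encoded, which is what the plan output below suggests):

# Debugging aid (hypothetical): surface the rendered user data so two
# consecutive plans can be compared. base64decode assumes the _user_data
# submodule returns base64-encoded content, as the launch template expects.
output "rendered_user_data" {
  value = base64decode(module.user_data.user_data)
}

If the decoded value is byte-for-byte identical between two plans that both show a user_data diff, that would suggest the value is being recomputed or treated as unknown at plan time rather than actually changing.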

Steps to reproduce the behavior:

Even after adding new TF resources or changing any piece of code not related to the launch_template or user_data, the plan shows "changes" in user_data:

# module.cluster.module.nodegroup["default"].aws_eks_node_group.workers will be updated in-place
  ~ resource "aws_eks_node_group" "workers" {
        id                     = "###########"
        tags                   = {
            "Name"                                                = "eks_cluster.net"
            "k8s.io/cluster-autoscaler/eks_cluster" = "owned"
            "k8s.io/cluster-autoscaler/enabled"                   = "true"
        }
      ~ launch_template {
            id      = "lt-085883b0718ea3681"
            name    = "default.eks_cluster.net-2024060614060058700000002a"
          ~ version = "24" -> (known after apply)
        }

        # (3 unchanged blocks hidden)
    }
  ~ resource "aws_launch_template" "workers" {
        id                                   = "lt-085883b0718ea3681"
      ~ latest_version                       = 24 -> (known after apply)
        name                                 = "default.eks_cluster.net-2024060614060058700000002a"
        tags                                 = {
            "Name" = "default.eks_cluster.net"
        }
      ~ user_data                            = "Q29udGVudC1UeXBlOiBtdWx0aXBhcnQvbWl4ZWQ7IGJvdW5kYXJ5PSJNSU1FQk9VTkRBUlkiCk1JTUUtVmVyc2lvbjogMS4wDQoNCi0tTUlNRUJPVU5EQVJZDQpDb250ZW50LVRyYW5zZmVyLUVuY29kaW5nOiA3Yml0DQpDb250ZW50LVR5cGU6IHRleHQveC..........

After applying this plan, the launch template is not actually updated and the latest version stays the same (in this particular case the launch template version remains 24).
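
The node group diff above is a knock-on effect: as soon as the launch template reports any change, its latest_version becomes "(known after apply)", and that propagates into the node group plan. The aws_eks_node_group.workers resource is not included above; a hypothetical excerpt of how it presumably wires in the launch template looks like this:

# Hypothetical excerpt (the full resource is not shown in this issue).
# Because version points at latest_version, any "(known after apply)"
# on the launch template also shows up in the node group plan.
resource "aws_eks_node_group" "workers" {
  cluster_name  = var.eks_cluster_name
  node_role_arn = var.eks_worker_arn
  subnet_ids    = var.subnet_ids

  scaling_config {
    min_size     = var.min_nodes
    max_size     = var.max_nodes
    desired_size = var.min_nodes
  }

  launch_template {
    id      = aws_launch_template.workers.id
    version = aws_launch_template.workers.latest_version
  }
}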

Expected behavior

After changes not related to user_data, the launch_template, or the node groups, Terraform shouldn't consider the user_data to be updated in place.

Actual behavior

Terraform always detects user_data drift, even when the repository changes apply to resources unrelated to user_data, the launch_template, or the EKS node groups.

bryantbiggs commented 17 hours ago

Why not use the module as it's designed? This is far from what we provide in this module, so we won't be able to troubleshoot.
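
For reference, using the module as designed would mean letting the top-level module render the launch template and user data itself via eks_managed_node_groups. A rough sketch (placeholder values only, not the reporter's actual config) might look like this:

# Rough sketch of the intended usage pattern (placeholder values).
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "example"
  cluster_version = "1.30"

  vpc_id     = "vpc-0123456789abcdef0"
  subnet_ids = ["subnet-aaa", "subnet-bbb"]

  eks_managed_node_groups = {
    default = {
      ami_type       = "AL2023_x86_64_STANDARD"
      instance_types = ["m6i.2xlarge"]

      min_size     = 2
      max_size     = 6
      desired_size = 2

      # The module renders the nodeadm/cloud-init user data internally.
      cloudinit_pre_nodeadm = [
        {
          content_type = "application/node.eks.aws"
          content      = <<-EOT
            ---
            apiVersion: node.eks.aws/v1alpha1
            kind: NodeConfig
            spec:
              kubelet:
                config:
                  evictionHard:
                    memory.available: "100Mi"
          EOT
        }
      ]
    }
  }
}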

anzinchenko89 commented 16 hours ago

Yep, we're not using the module the way it's designed; only the "user_data" submodule is needed. Can that be considered the "wrong" way of using the module? This kind of usage didn't cause any issues until the EKS upgrade to 1.30 and the migration to AL2023 (we did that in September, with aws provider version 5.66.0). It's unlikely that moving to the new AMI by itself could have led to this "weird" Terraform behavior, but something forces Terraform to think the content of user_data has been modified, when it hasn't.