terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0
4.47k stars 4.08k forks

Unable to add Custom IAM role/policy using self-managed nodes example #2433

Closed jauyzed closed 1 year ago

jauyzed commented 1 year ago

Hello,

I'm using a modified version of the self-managed worker nodes example, shown below in main.tf. The terraform apply plan says it will create the IAM role and policy defined below, but the apply fails: the IAM role is never created or attached, and the vpc-cni and coredns add-ons never become active.

All other resources come up normally (EKS, worker nodes, etc.). It takes about 40 minutes to create a cluster altogether in our proxy environment.

Terraform plan/apply shows that they will be created, but the vpc-cni and coredns pods never come up successfully. It might help if there were a way to set depends_on for the cluster add-ons. I don't know for sure whether this is a different issue.

$ terraform apply (snippet)

 # data.aws_iam_policy_document.iam_role_eks_oidc will be read during apply
  # (config refers to values not yet known)
 <= data "aws_iam_policy_document" "iam_role_eks_oidc" {
      + id   = (known after apply)
      + json = (known after apply)

      + statement {
          + actions = [
              + "sts:AssumeRoleWithWebIdentity",
            ]

          + condition {
              + test     = "StringEquals"
              + values   = [
                  + "sts.amazonaws.com",
                ]
              + variable = (known after apply)
            }
          + condition {
              + test     = "StringEquals"
              + values   = [
                  + "system:serviceaccount:kube-system:aws-node",
                ]
              + variable = (known after apply)
            }

          + principals {
              + identifiers = [
                  + (known after apply),
                ]
              + type        = "Federated"
            }
        }
    }

  # aws_iam_role.this will be created
  + resource "aws_iam_role" "this" {
      + arn                   = (known after apply)
      + assume_role_policy    = (known after apply)
      + create_date           = (known after apply)
      + force_detach_policies = false
      + id                    = (known after apply)
      + managed_policy_arns   = [
          + "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
        ]
      + max_session_duration  = 3600
      + name                  = "company1-svc-role-devops-testing-do29-terraform"
      + name_prefix           = (known after apply)
      + path                  = "/"
      + permissions_boundary  = "arn:aws:iam::<your-aws-account-id>:policy/company1-boundary-policy-eks-cni"
      + tags_all              = (known after apply)
      + unique_id             = (known after apply)

      + inline_policy {
          + name   = (known after apply)
          + policy = (known after apply)
        }
    }

module.eks.aws_eks_cluster.this[0]: Still creating... [15m50s elapsed]
module.eks.aws_eks_cluster.this[0]: Still creating... [16m0s elapsed]
module.eks.aws_eks_cluster.this[0]: Still creating... [16m10s elapsed]
module.eks.aws_eks_cluster.this[0]: Still creating... [16m20s elapsed]

Error: unexpected EKS Add-On (testing-do29-terraform:vpc-cni) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│
│   with module.eks.aws_eks_addon.this["vpc-cni"],
│   on .terraform/modules/eks/main.tf line 378, in resource "aws_eks_addon" "this":
│  378: resource "aws_eks_addon" "this" {
│
╵
╷
│ Error: unexpected EKS Add-On (testing-do29-terraform:coredns) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│ [WARNING] Running terraform apply again will remove the kubernetes add-on and attempt to create it again effectively purging previous add-on configuration
│
│   with module.eks.aws_eks_addon.this["coredns"],
│   on .terraform/modules/eks/main.tf line 378, in resource "aws_eks_addon" "this":
│  378: resource "aws_eks_addon" "this" {

I was able to provision the custom IAM role, but it started failing soon after I set these:

create_aws_auth_configmap = true
manage_aws_auth_configmap = true

Here is the main.tf

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

data "aws_caller_identity" "current" {}

### FB Req Config ####
provider "tls" {
  proxy { 
      from_env = true 
  }
}
##################

locals {
  name            = "testing-do29-terraform"
  key_pair        = "company1-4-us-west-2-keypair"
  cluster_version = "1.23"
  region          = "us-west-1"

  k8s_service_account_name      = "aws-node"
  k8s_service_account_namespace = "kube-system"

  ## Get the EKS OIDC Issuer without https:// prefix
  eks_oidc_issuer = trimprefix(data.aws_eks_cluster.eks.identity[0].oidc[0].issuer, "https://")

  tags = {
    Name                    = local.name
    application             = "testing"
  }
}

################################################################################
# EKS Module
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.5.1"

  cluster_name                    = local.name
  cluster_version                 = local.cluster_version
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  create_iam_role = false
  create_kms_key = false
  cluster_encryption_config = {}

  iam_role_arn = "arn:aws:iam::<your-aws-acc-number>:role/company1-svc-role-eks-cluster"

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
  }

  vpc_id     = "vpc-050134ac9e18fc169"
  subnet_ids = ["subnet-00acf98792aacd89a","subnet-08769e3fbe2a294c9"]
  create_cluster_security_group = false
  cluster_security_group_id = "sg-035a9d9723f11107c"
  create_cloudwatch_log_group = true
  cloudwatch_log_group_kms_key_id = "arn:aws:kms:us-west-2:<your-aws-acc-number>:key/cbaa1a12-5d6c-458b-b93a-d1a8af1a1fd8"

  # Self managed node groups will not automatically create the aws-auth configmap so we need to
  create_aws_auth_configmap = true
  manage_aws_auth_configmap = true

  aws_auth_node_iam_role_arns_non_windows = [
    "arn:aws:iam::<your-aws-acc-number>:role/company1-svc-role-eks-worker"
  ]

  create_node_security_group = false
  node_security_group_id = "sg-04a5530b19bd1f6c5"

  self_managed_node_group_defaults = {

    # enable discovery of autoscaling groups by cluster-autoscaler
    autoscaling_group_tags = {
      "k8s.io/cluster-autoscaler/enabled" : true,
      "k8s.io/cluster-autoscaler/${local.name}" : "owned",
    }
  }

  self_managed_node_groups = {

    # Default node group - as provisioned by the module defaults
    ######## default_node_group = {}

    mixed = {
      name = "mixed"

      min_size     = 2
      max_size     = 5
      desired_size = 2

      create_iam_instance_profile = false
      iam_instance_profile_arn = "arn:aws:iam::<your-aws-acc-number>:instance-profile/company1-svc-role-eks-worker"
      create_security_group = false
      node_security_group_id = "sg-04a5530b19bd1f6c5"
      key_name = "company1-4267-us-west-2-keypair"

      bootstrap_extra_args = "--kubelet-extra-args '--node-labels=node.kubernetes.io/lifecycle=spot'"

      pre_bootstrap_user_data = <<-EOT
      echo "http_proxy=http://proxy.fb:8080" >>  /etc/environment
      echo "https_proxy=http://proxy.fb:8080" >> /etc/environment
      echo "HTTP_PROXY=http://proxy.fb:8080" >> /etc/environment
      echo "HTTPS_PROXY=http://proxy.fb:8080" >> /etc/environment
      echo "no_proxy=172.20.0.0/16,localhost,127.0.0.1,127.*,172.*,10.64.8.0/22,169.254.169.254,.internal,proxy.fb,.s3.us-west-2.amazonaws.com,ec2.us-west-2.amazonaws.com,api.ecr.us-west-2.amazonaws.com,dkr.ecr.us-west-2.amazonaws.com,eks.amazonaws.com,eks.us-west-2.amazonaws.com" >> /etc/environment
      echo "NO_PROXY=172.20.0.0/16,localhost,127.0.0.1,127.*,172.*,10.64.8.0/22,169.254.169.254,.internal,proxy.fb,.s3.us-west-2.amazonaws.com,ec2.us-west-2.amazonaws.com,api.ecr.us-west-2.amazonaws.com,dkr.ecr.us-west-2.amazonaws.com,eks.amazonaws.com,s3.amazonaws.com,eks.us-west-2.amazonaws.com" >> /etc/environment

      mkdir -p /etc/systemd/system/docker.service.d

      cat << EOF >> /etc/systemd/system/docker.service.d/proxy.conf
      [Service]
      EnvironmentFile=/etc/environment
      EOF

      cat << EOF >> /etc/systemd/system/kubelet.service.d/proxy.conf
      [Service]
      EnvironmentFile=/etc/environment
      EOF

      systemctl daemon-reload

      EOT

      ami_id = "<insert-ami-id-for-k8s-version-above>"

      ebs_optimized          = true
      vpc_security_group_ids = ["sg-0d36c4b3734d6ae63"]
      enable_monitoring      = true

      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 75
            volume_type           = "gp3"
            iops                  = 3000
            throughput            = 150
            encrypted             = true
            kms_key_id            = "arn:aws:kms:us-west-2:<your-aws-acc-number>:key/cbaa1a12-5d6c-458b-b93a-d1a8af1a1fd8"
            delete_on_termination = true
          }
        }
      }

      use_mixed_instances_policy = false
      mixed_instances_policy = {
        instances_distribution = {
          on_demand_base_capacity                  = 0
          on_demand_percentage_above_base_capacity = 20
          spot_allocation_strategy                 = "capacity-optimized"
        }

        override = [
          {
            instance_type     = "m5.large"
            weighted_capacity = "1"
          },
          {
            instance_type     = "m6i.large"
            weighted_capacity = "2"
          },
        ]
      }
    }

  }

  tags = local.tags
}

################################################################################
# Supporting Resources
################################################################################

data "aws_eks_cluster" "eks" {
  name = module.eks.cluster_name
  depends_on = [module.eks]

}

data "aws_iam_policy" "fb-boundry-policy-eks-cni" {
  arn = "arn:aws:iam::<your-aws-acc-number>:policy/company1-boundary-policy-eks-cni"
}

data "aws_iam_policy" "AmazonEKS_CNI_Policy" {
  arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role" "this" {

  name = "company1-svc-role-devops-${local.name}"
  depends_on = [module.eks.aws_iam_openid_connect_provider]

  managed_policy_arns = ["arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"]
  permissions_boundary = "arn:aws:iam::<your-aws-acc-number>:policy/company1-boundary-policy-eks-cni"
  assume_role_policy = data.aws_iam_policy_document.iam_role_eks_oidc.json

}

#### Create IAM policy allowing the k8s service account to assume the IAM role

data "aws_iam_policy_document" "iam_role_eks_oidc" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type = "Federated"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.eks_oidc_issuer}"
      ]
    }

    # Limit the scope so that only our desired service account can assume this role
    condition {
      test     = "StringEquals"
      variable = "${local.eks_oidc_issuer}:sub"
      values = [
        "system:serviceaccount:${local.k8s_service_account_namespace}:${local.k8s_service_account_name}"
      ]
    }

    condition {
      test     = "StringEquals"
      variable = "${local.eks_oidc_issuer}:aud"
      values = [
        "sts.amazonaws.com"
      ]
    }
  }
}

#### Create proxy env variables required for pods to communicate to external endpoints ###

resource "kubernetes_config_map_v1" "proxy-config-map" {
  metadata {
    name      = "proxy-environment-variables"
    namespace = "kube-system"
  }

  data = {
    "HTTP_PROXY"  = "http://proxy.fb:8080"
    "HTTPS_PROXY" = "http://proxy.fb:8080"
    "NO_PROXY"    = "172.20.0.0/16,localhost,127.0.0.1,127.*,172.*,10.64.8.0/22,169.254.169.254,.internal,proxy.fb,.s3.us-west-2.amazonaws.com,ec2.us-west-2.amazonaws.com,api.ecr.us-west-2.amazonaws.com,dkr.ecr.us-west-2.amazonaws.com,eks.amazonaws.com,s3.amazonaws.com,eks.us-west-2.amazonaws.com"
  }
}

##### Apply the annotations for aws-node in kube-system namespace ###

resource "kubernetes_annotations" "aws-node" {
  api_version = "v1"
  kind        = "ServiceAccount"
  metadata {
    name = "aws-node"
    namespace = "kube-system"
  }
  annotations = {
    "eks.amazonaws.com/role-arn"="arn:aws:iam::<your-aws-acc-number>:role/company1-svc-role-devops-${local.name}"
  }
}

version.tf

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.47"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 3.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.16.1"
    }
  }
}

Thanks!
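Editor's note on the configuration above: the `aws_eks_cluster` data source declares `depends_on = [module.eks]`, so Terraform defers reading it until every resource in the module, including the add-ons, has finished applying. The trust policy and `aws_iam_role.this` are derived from that data source, which would explain why the role is never created when the add-ons time out. It does not by itself explain the add-on timeout. A minimal sketch of one way to shorten that chain is below. It assumes the module's `oidc_provider` and `oidc_provider_arn` outputs are available in the version in use and that `enable_irsa` is left enabled; this has not been tested against the environment in this report.

```hcl
# Sketch only: build the trust policy from module outputs so the role depends
# on the OIDC provider rather than on the whole module (add-ons included).
# Assumes module.eks exposes oidc_provider (issuer URL without "https://")
# and oidc_provider_arn.
data "aws_iam_policy_document" "iam_role_eks_oidc" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [module.eks.oidc_provider_arn]
    }

    # Limit the scope so that only the aws-node service account can assume this role
    condition {
      test     = "StringEquals"
      variable = "${module.eks.oidc_provider}:sub"
      values = [
        "system:serviceaccount:${local.k8s_service_account_namespace}:${local.k8s_service_account_name}"
      ]
    }

    condition {
      test     = "StringEquals"
      variable = "${module.eks.oidc_provider}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}
```

With this shape, the `data "aws_eks_cluster" "eks"` block and the `eks_oidc_issuer` local would no longer be needed for the trust policy.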

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 1 year ago

This issue was automatically closed because of stale in 10 days

jauyzed commented 1 year ago

@bryantbiggs Was there a solution for this issue as it was closed? I'm on the latest version

bryantbiggs commented 1 year ago

it was closed automatically due to inactivity. Please see our examples provided
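For readers landing here, the pattern the maintainer is most likely pointing to is an IAM role for service accounts (IRSA) created outside the EKS module and passed to the vpc-cni add-on. The fragment below is a hedged sketch of that pattern, not the maintainer's answer. The module label `vpc_cni_irsa` and the role name prefix are illustrative, and the input names (`attach_vpc_cni_policy`, `vpc_cni_enable_ipv4`, `role_permissions_boundary_arn`, `service_account_role_arn`) are assumptions that should be checked against the module versions in use.

```hcl
# Sketch only: IRSA role for the VPC CNI, built with the
# terraform-aws-modules/iam submodule (inputs assumed, verify before use).
module "vpc_cni_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name_prefix      = "vpc-cni-irsa-" # illustrative name
  attach_vpc_cni_policy = true
  vpc_cni_enable_ipv4   = true

  # Permissions boundary taken from the original report
  role_permissions_boundary_arn = "arn:aws:iam::<your-aws-acc-number>:policy/company1-boundary-policy-eks-cni"

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-node"]
    }
  }
}

# Fragment of the existing module "eks" block: pass the role to the add-on
# instead of annotating the aws-node service account separately.
#
#   cluster_addons = {
#     vpc-cni = {
#       most_recent              = true
#       service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
#     }
#   }
```

Passing the role through the add-on lets EKS manage the service account annotation, so the separate `kubernetes_annotations` resource may no longer be necessary.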

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.