terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes Service (EKS) resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Cannot set --max-pods in the EKS configuration #2551

Closed Β· insider89 closed this issue 1 year ago

insider89 commented 1 year ago

Description

I cannot override max-pods with the latest 19.12 module. I have a cluster provisioned with m2.large instances, which defaults to 17 pods per node. I've set ENABLE_PREFIX_DELEGATION = "true" and WARM_PREFIX_TARGET = "1" for the vpc-cni addon, but it doesn't help; I still have 17 pods per node. In the launch template I see the following:

/etc/eks/bootstrap.sh dev --kubelet-extra-args '--node-labels=node_group=infra,eks.amazonaws.com/nodegroup-image=ami-04dc8cdc2e948f054,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=infra-20230316203627944100000001 --register-with-taints=infra=true:NoSchedule --max-pods=17' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP --use-max-pods false

I tried adding the following to my managed node group configuration, but the module just ignores it:

      enable_bootstrap_user_data = true
      bootstrap_extra_args       = "--kubelet-extra-args '--max-pods=50'"

      pre_bootstrap_user_data = <<-EOT
        export USE_MAX_PODS=false
      EOT

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (ONLY if state is stored remotely, which hopefully you are doing, as is best practice!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

Reproduction Code [Required]

# https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2009
data "aws_eks_cluster" "default" {
  name = local.name
  depends_on = [
    module.eks.eks_managed_node_groups,
  ]
}

data "aws_eks_cluster_auth" "default" {
  name = local.name
  depends_on = [
    module.eks.eks_managed_node_groups,
  ]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

data "aws_ami" "eks_default" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amazon-eks-node-${local.cluster_version}-v*"]
  }
}

data "aws_iam_roles" "sso_admins" {
  name_regex  = "AWSReservedSSO_AdministratorAccess_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/eu-west-1/"
}

data "aws_iam_roles" "sso_developers" {
  name_regex  = "AWSReservedSSO_DeveloperAccess_.*"
  path_prefix = "/aws-reserved/sso.amazonaws.com/eu-west-1/"
}

locals {
  name            = "dev"
  cluster_version = "1.25"
  region          = "eu-west-1"

  vpc_cidr = data.terraform_remote_state.vpc.outputs.vpc_cidr_block
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  tags = {
    Environment = "dev"
    Team        = "DevOps"
    Terraform   = "true"
  }
}

data "aws_availability_zones" "available" {}
data "aws_caller_identity" "current" {}

################################################################################
# EKS Module
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.12"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version
  cluster_endpoint_public_access = false

  cluster_addons = {
    coredns = {
      addon_version = "v1.9.3-eksbuild.2"

      timeouts = {
        create = "25m"
        delete = "10m"
      }
    }
    kube-proxy = {
      addon_version = "v1.25.6-eksbuild.2"
    }
    vpc-cni = {
      addon_version  = "v1.12.6-eksbuild.1"
      before_compute = true
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    aws-ebs-csi-driver = {
      addon_version            = "v1.17.0-eksbuild.1"
      service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
    }
  }

  vpc_id                   = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids               = data.terraform_remote_state.vpc.outputs.private_subnets
  control_plane_subnet_ids = data.terraform_remote_state.vpc.outputs.intra_subnets

  # https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2009#issuecomment-1262099428
  cluster_security_group_additional_rules = {
    ingress = {
      description                = "EKS Cluster allows 443 port to get API call"
      type                       = "ingress"
      from_port                  = 443
      to_port                    = 443
      protocol                   = "TCP"
      cidr_blocks                = ["10.1.0.0/16"]
      source_node_security_group = false
    }
  }

  node_security_group_additional_rules = {
    node_to_node = {
      from_port = 0
      to_port   = 0
      protocol  = -1
      self      = true
      type      = "ingress"
    }
  }

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    attach_cluster_primary_security_group = true

    ami_type = "AL2_x86_64"

    instance_types = [
      "m5.large",
      "m5.xlarge",
      "m4.large",
      "m4.xlarge",
      "c3.large",
      "c3.xlarge",
      "t2.large",
      "t2.medium",
      "t2.xlarge",
      "t3.medium",
      "t3.large",
      "t3.xlarge"
    ]
    iam_role_additional_policies = {
      AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
    }
  }

  eks_managed_node_groups = {
    default = {
      description = "Default EKS managed node group"

      use_custom_launch_template = false

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      ami_id                     = data.aws_ami.eks_default.image_id
      enable_bootstrap_user_data = true
      bootstrap_extra_args       = "--kubelet-extra-args '--max-pods=50'"

      pre_bootstrap_user_data = <<-EOT
        export USE_MAX_PODS=false
      EOT

      min_size     = 1
      max_size     = 10
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "default"
      }
    }

    infra = {
      description                = "EKS managed node group for infra workloads"
      use_custom_launch_template = false

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      min_size     = 1
      max_size     = 10
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "infra"
      }

      taints = {
        dedicated = {
          key    = "infra"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }

  # aws-auth configmap
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_admins.names)}"
      username = "sso-admin:{{SessionName}}"
      groups   = ["system:masters"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_developers.names)}"
      username = "sso-developer:{{SessionName}}"
      groups   = ["system:masters"]
    },
  ]

  tags = local.tags
}

Expected behavior

Have 50 pods per node

Actual behavior

Have 17 pods per node

Additional context

I went through the different issues, but didn't find how to change max-pods. This suggestion doesn't work.

insider89 commented 1 year ago

When I remove a few instance types from instance_types and leave only the ones with a bigger pod limit, I get a limit of 29 pods per node. But I still cannot reach the goal of 110 pods. When I leave only the m2.large instance type, I get 110 pods per node, but it's not because of bootstrap_extra_args or any other configuration; it's set automatically, and I don't know why.

So the question still stands: how to set max-pods to 110?
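
(For reference, the value EKS infers comes from the VPC CNI math: without prefix delegation it is ENIs * (IPv4 addresses per ENI - 1) + 2, e.g. 3 * (10 - 1) + 2 = 29 for an m5.large and 2 * (10 - 1) + 2 = 20 for an m4.large; with prefix delegation each address slot becomes a /28 prefix of 16 addresses, so the computed number far exceeds the Kubernetes-recommended ceiling of 110 and is capped there. The awslabs/amazon-eks-ami repo ships a max-pods-calculator.sh script that does this math per instance type.)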

Pionerd commented 1 year ago

It looks like I have the same issue over here. Any updates from your side, @insider89?

insider89 commented 1 year ago

@Pionerd I didn't find a way to set --max-pods in the EKS Terraform module. I figured out that if I provide different instance types in instance_types, it sets --max-pods to the lowest value among the instance_types. So, first of all, I kept only instance types with the same amount of CPU and memory in the node group (as the cluster autoscaler cannot scale mixed instance types), and removed the instance types with the lowest max pods according to this list.
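
For illustration, a fragment of what that workaround looks like (the specific types are up to you, as long as their pod limits match):

      instance_types = [
        # same CPU/memory and the same ENI/IP limits across the list, so the
        # inferred --max-pods is not dragged down by the smallest type
        "m5.large",
        "m5a.large",
        "m5d.large",
      ]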

Pionerd commented 1 year ago

I hate to say this, but I recreated my environment from scratch and now my max_pods is 110... I suspect it has to do with configuring the VPC CNI before the node pools are created.

The following is sufficient; there is no need for bootstrap_extra_args:

  cluster_addons = {
    vpc-cni = {
      most_recent    = true
      before_compute = true
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }

insider89 commented 1 year ago

@Pionerd I have this flag enabled for the CNI plugin as well, but the max pods per node still depends on the instance types I provide in the instance_types variable; in my case it's 20 pods per node (because I have m4.large among the instance types). Here is my full configuration:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.13"

  cluster_name                   = local.name
  cluster_version                = local.cluster_version
  cluster_endpoint_public_access = false

  cluster_addons = {
    coredns = {
      addon_version = "v1.9.3-eksbuild.2"

      timeouts = {
        create = "25m"
        delete = "10m"
      }
    }
    kube-proxy = {
      addon_version = "v1.26.2-eksbuild.1"
    }
    vpc-cni = {
      addon_version  = "v1.12.6-eksbuild.1"
      before_compute = true
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    aws-ebs-csi-driver = {
      addon_version            = "v1.17.0-eksbuild.1"
      service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
    }
  }

  vpc_id                   = data.terraform_remote_state.vpc.outputs.vpc_id
  subnet_ids               = data.terraform_remote_state.vpc.outputs.private_subnets
  control_plane_subnet_ids = data.terraform_remote_state.vpc.outputs.intra_subnets

  # https://github.com/terraform-aws-modules/terraform-aws-eks/issues/2009#issuecomment-1262099428
  cluster_security_group_additional_rules = {
    ingress = {
      description                = "EKS Cluster allows 443 port to get API call"
      type                       = "ingress"
      from_port                  = 443
      to_port                    = 443
      protocol                   = "TCP"
      cidr_blocks                = ["10.1.0.0/16"]
      source_node_security_group = false
    }
  }

  node_security_group_additional_rules = {
    node_to_node = {
      from_port = 0
      to_port   = 0
      protocol  = -1
      self      = true
      type      = "ingress"
    }
  }

  # EKS Managed Node Group(s)
  eks_managed_node_group_defaults = {
    attach_cluster_primary_security_group = true

    iam_role_additional_policies = {
      AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
    }
  }

  eks_managed_node_groups = {
    default = {
      description = "Default EKS managed node group"

      use_custom_launch_template = false

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      instance_types = [
        "m5.large",
        "t2.large",
        "t3.large",
        "m5d.large",
        "m5a.large",
        "m5ad.large",
        "m5n.large",
        "m5dn.large",
        "m4.large",
      ]

      min_size     = 1
      max_size     = 15
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "default"
      }
    }

    infra = {
      description                = "EKS managed node group for infra workloads"
      use_custom_launch_template = false

      instance_types = [
        "m5.large",
        "t2.large",
        "t3.large",
        "m5d.large",
        "m5a.large",
        "m5ad.large",
        "m5n.large",
        "m5dn.large",
        "m4.large"
      ]

      remote_access = {
        ec2_ssh_key = data.terraform_remote_state.ssh_key.outputs.aws_key_pair_id
      }

      min_size     = 1
      max_size     = 15
      desired_size = 1
      disk_size    = 20

      update_config = {
        max_unavailable_percentage = 33 # or set `max_unavailable`
      }

      labels = {
        node_group = "infra"
      }

      taints = {
        dedicated = {
          key    = "infra"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }
    }
  }

  # aws-auth configmap
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_admins.names)}"
      username = "sso-admin:{{SessionName}}"
      groups   = ["system:masters"]
    },
    {
      rolearn  = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/${one(data.aws_iam_roles.sso_developers.names)}"
      username = "sso-developer:{{SessionName}}"
      groups   = ["system:masters"]
    },
  ]
}

Pionerd commented 1 year ago

Hi @insider89

Just ran into the issue again with exactly the same code as before. It looks like some kind of timing issue still. What worked for me (this time, no guarantees) was leaving the cluster intact, removing only the existing node group, and recreating it.
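
(If you want to force that recreation from Terraform rather than the console, something like terraform apply -replace='module.eks.module.eks_managed_node_group["default"].aws_eks_node_group.this[0]' should work; the exact address depends on your node group key, so treat it as a sketch.)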

SrDayne commented 1 year ago

Hello guys.

To me it looks like the problem is not in Terraform itself but in AWS: Amazon's bootstrap script overrides the provided values. I use the following workaround:

eks_managed_node_groups = {
    dev = {
      name = "k8s-dev"

      instance_types = ["t3.medium"]

      enable_bootstrap_user_data = false

      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        LINE_NUMBER=$(grep -n "KUBELET_EXTRA_ARGS=\$2" /etc/eks/bootstrap.sh | cut -f1 -d:)
        REPLACEMENT="\ \ \ \ \ \ KUBELET_EXTRA_ARGS=\$(echo \$2 | sed -s -E 's/--max-pods=[0-9]+/--max-pods=30/g')"
        sed -i '/KUBELET_EXTRA_ARGS=\$2/d' /etc/eks/bootstrap.sh
        sed -i "$${LINE_NUMBER}i $${REPLACEMENT}" /etc/eks/bootstrap.sh
      EOT

      min_size = 1
      max_size = 3
      desired_size = 2

      #taints = [
      #  {
      #    key = "node.cilium.io/agent-not-ready"
      #    value = "true"
      #    effect = "NoExecute"
      #  }
      #]
    }
  }

It is not an elegant solution, but it works: it replaces, on the fly, the line in the bootstrap script that is responsible for --kubelet-extra-args. Note that if you use a custom ami_id the setup could be a little different, but it should still work.

As a result, kubectl describe node ip-10-1-0-102.eu-south-1.compute.internal shows:

Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3943372Ki
  pods:                        30
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           18242267924
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3388364Ki
  pods:                        30

Edit: I tried node autoscaling and recreated environments multiple times; the script works.

bryantbiggs commented 1 year ago

  1. Yes - managed node groups own the bootstrap script in the user data, which leads to hacky workarounds https://github.com/awslabs/amazon-eks-ami/issues/844
  2. The proper way to set max pods is via the VPC CNI custom configuration. If the VPC CNI is configured before node groups are created and nodes are launched, EKS managed node groups will infer the proper max pods value from the VPC CNI configuration. There is a flag (before_compute) that should be enabled to ensure the VPC CNI is created before the associated node groups https://github.com/terraform-aws-modules/terraform-aws-eks/blob/0f9d9fac93caf239386f47b0f117706ea78c2bec/examples/eks_managed_node_group/main.tf#L66 which is backed by a wait with a default timeout of 30s that can be increased if necessary https://github.com/terraform-aws-modules/terraform-aws-eks/blob/0f9d9fac93caf239386f47b0f117706ea78c2bec/node_groups.tf#L22-L39 (a minimal sketch follows below)

For now though, closing this out since there are no further actions (that I am aware of) that the module can take to improve upon this area
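
A minimal sketch of that setup, assuming module v19.x (dataplane_wait_duration is the variable behind the 30s wait linked above, and "2m" is just an illustrative value):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.13"

  # ... cluster name/version, VPC wiring, node groups ...

  cluster_addons = {
    vpc-cni = {
      # Create and configure the VPC CNI before any compute so that EKS
      # managed node groups infer max pods from the prefix delegation settings
      before_compute = true
      configuration_values = jsonencode({
        env = {
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
  }

  # Lengthen the wait between addon creation and node group creation if the
  # default of "30s" is not enough
  dataplane_wait_duration = "2m"
}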

CostinaDamir commented 1 year ago

pre_bootstrap_user_data = <<-EOT
  #!/bin/bash
  LINE_NUMBER=$(grep -n "KUBELET_EXTRA_ARGS=\$2" /etc/eks/bootstrap.sh | cut -f1 -d:)
  REPLACEMENT="\ \ \ \ \ \ KUBELET_EXTRA_ARGS=\$(echo \$2 | sed -s -E 's/--max-pods=[0-9]+/--max-pods=30/g')"
  sed -i '/KUBELET_EXTRA_ARGS=\$2/d' /etc/eks/bootstrap.sh
  sed -i "$${LINE_NUMBER}i $${REPLACEMENT}" /etc/eks/bootstrap.sh
EOT

I tried your workaround, but I get a Terraform error:

  β”‚ Error: Variables not allowed
  β”‚
  β”‚   on <value for var.eks_managed_node_groups> line 1:
  β”‚   (source code not available)

Any idea?

ophintor commented 1 year ago

I think you need to escape all the $ characters?

SrDayne commented 1 year ago

@CostinaDamir As @ophintor said, you need to escape the repeated $. Copy this piece of code without any changes:

      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        LINE_NUMBER=$(grep -n "KUBELET_EXTRA_ARGS=\$2" /etc/eks/bootstrap.sh | cut -f1 -d:)
        REPLACEMENT="\ \ \ \ \ \ KUBELET_EXTRA_ARGS=\$(echo \$2 | sed -s -E 's/--max-pods=[0-9]+/--max-pods=30/g')"
        sed -i '/KUBELET_EXTRA_ARGS=\$2/d' /etc/eks/bootstrap.sh
        sed -i "$${LINE_NUMBER}i $${REPLACEMENT}" /etc/eks/bootstrap.sh
      EOT

Also, you can replace --max-pods=30 with --max-pods=${var.cluster_max_pods} and set the number of pods with a variable.
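
A sketch of that variant, where cluster_max_pods is an assumed variable name (not something the module provides). The single-$ interpolation is resolved by Terraform at plan time, while the $$-escaped expressions are passed through literally for the shell:

variable "cluster_max_pods" {
  description = "Value injected into the bootstrap script's --max-pods flag"
  type        = number
  default     = 110
}

and in the node group:

      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        LINE_NUMBER=$(grep -n "KUBELET_EXTRA_ARGS=\$2" /etc/eks/bootstrap.sh | cut -f1 -d:)
        REPLACEMENT="\ \ \ \ \ \ KUBELET_EXTRA_ARGS=\$(echo \$2 | sed -s -E 's/--max-pods=[0-9]+/--max-pods=${var.cluster_max_pods}/g')"
        sed -i '/KUBELET_EXTRA_ARGS=\$2/d' /etc/eks/bootstrap.sh
        sed -i "$${LINE_NUMBER}i $${REPLACEMENT}" /etc/eks/bootstrap.sh
      EOT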

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.