terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes Service (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Ephemeral storage/node storage are not following configurations in module #2005

Closed auyer closed 2 years ago

auyer commented 2 years ago

Description

Ephemeral storage/node storage are not following configurations in module.

Versions

Reproduction Code [Required]

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "18.20.1"
  cluster_name    = local.cluster_name
  cluster_version = var.k8s_version
  subnet_ids      = var.vpc_private_subnets
  vpc_id          = var.vpc_vpc_id
  enable_irsa     = true

  cluster_addons = {
    coredns = {
      resolve_conflicts = "OVERWRITE"
    }
    kube-proxy = {}
    vpc-cni = {
      resolve_conflicts        = "OVERWRITE"
      service_account_role_arn = module.vpc_cni_irsa.iam_role_arn
    }
  }
  # Extend cluster security group rules
  cluster_security_group_additional_rules = {
   ...
  }
  node_security_group_name = "${local.cluster_name}-node-sg"
  eks_managed_node_group_defaults = {
    ami_type = "BOTTLEROCKET_x86_64"
    platform = "bottlerocket"
    disk_size                             = 128
    instance_types                        = local.instance_type
    iam_role_additional_policies          = var.iam_role_additional_policies
    vpc_security_group_ids                = var.additional_security_group_ids
    min_size                              = local.min_size
    max_size                              = local.max_size
    desired_size                          = local.desired_size
    iam_role_attach_cni_policy            = true
    attach_cluster_primary_security_group = true

    tags = merge(var.default_tags, {
      "k8s.io/cluster-autoscaler/enabled"               = "true"
      "k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
    })
  }
  eks_managed_node_groups = {
    spot-g = {
      # use_name_prefix = true
      min_size       = 0
      max_size       = 100 //local.max_size * 20
      desired_size   = 1   //ceil(local.desired_size / 4)
      disk_size      = 128
      instance_types = local.spot_instance_type[local.env]
      capacity_type  = "SPOT"

      bootstrap_extra_args = <<-EOT
      # extra args added
      # [settings.kernel]
      # lockdown = "integrity"

      [settings.kubernetes.node-labels]
      "node.kubernetes.io/lifecycle" = "spot"
      EOT
    }

    ondamand = {
      # use_name_prefix = true
      min_size       = 1 //2
      max_size       = 6 //local.max_size
      desired_size   = 1
      disk_size      = 64
      instance_types = local.instance_type[local.env]
      capacity_type  = "ON_DEMAND"

      bootstrap_extra_args = <<-EOT
      # extra args added
      # [settings.kernel]
      # lockdown = "integrity"

      [settings.kubernetes.node-labels]
      "node.kubernetes.io/lifecycle" = "normal"
      EOT
    }
  }

  tags = merge(var.default_tags,
    {
      Name = local.cluster_name
    }
  )
}

Steps to reproduce the behavior:

Expected behavior

disk_size is set both in the eks_managed_node_group_defaults block and in every node group created. Nodes should come up with the specified disk size.

Actual behavior:

Node storage does not reflect any of the configured values.

$ kubectl describe nodes 
...
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         16
  ephemeral-storage:           20624592Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      64453472Ki
  pods:                        58
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         15890m
  ephemeral-storage:           17933882132
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      63436640Ki
  pods:                        58
  ...
  Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests       Limits
  --------                    --------       ------
  cpu                         2125m (13%)    0 (0%)
  memory                      17403Mi (28%)  17715Mi (28%)
  ephemeral-storage           6Gi (35%)      10Gi (59%)
  hugepages-1Gi               0 (0%)         0 (0%)
  hugepages-2Mi               0 (0%)         0 (0%)
  attachable-volumes-aws-ebs  0              0

Additional context

This is using the Bottlerocket AMI. My actual structure abstracts this module a bit, and the whole eks_managed_node_groups map comes from a variable. To reproduce this, create a variable and set eks_managed_node_groups = var.node_groups. But this should not affect the issue.
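A minimal sketch of that indirection (the variable name and its shape are illustrative, not from the module):

variable "node_groups" {
  description = "Map of EKS managed node group definitions"
  type        = any
  default     = {}
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "18.20.1"
  # ... other arguments as in the reproduction above ...
  eks_managed_node_groups = var.node_groups
}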

bryantbiggs commented 2 years ago

Setting the volume size directly on the managed node group only works when using the default launch template created by the EKS managed node group service. You can see here how to enable use of the default template.
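For illustration, a minimal sketch of opting into the service-created default launch template (assuming the v18.x create_launch_template and launch_template_name inputs; the node group name is illustrative). Only in this mode does disk_size take effect:

eks_managed_node_groups = {
  example = {
    # Skip the module-created launch template so the EKS managed node group
    # service supplies its default template
    create_launch_template = false
    launch_template_name   = ""

    # Honored only when the default launch template is used
    disk_size = 128
  }
}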

However, that means you lose the ability to set other configurations such as bootstrap_extra_args, which require a custom launch template (the module defaults to using a custom launch template for this reason, which is why the volume setting has no effect).

To change volume settings you will have to use the block_device_mappings variable: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L277-L289

auyer commented 2 years ago

I see. I only use bootstrap_extra_args to set node labels, like this:

      [settings.kubernetes.node-labels]
      "node.kubernetes.io/lifecycle" = "spot"

Is there another way to do this so I don't need to use bootstrap_extra_args?

bryantbiggs commented 2 years ago

The EKS managed node group service does this automatically when using SPOT capacity.
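For any additional custom labels, the node group labels input is an alternative to bootstrap_extra_args; a minimal sketch, assuming labels is passed through to aws_eks_node_group:

eks_managed_node_groups = {
  spot-g = {
    capacity_type = "SPOT"
    # EKS adds the eks.amazonaws.com/capacityType=SPOT label automatically;
    # extra labels can be set here instead of via bootstrap_extra_args
    labels = {
      "node.kubernetes.io/lifecycle" = "spot"
    }
  }
}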

auyer commented 2 years ago

In my tests, vpc_security_group_ids also stopped working. Am I correct to assume that it also needs create_launch_template to be true?

bryantbiggs commented 2 years ago

@auyer could you post a full reproduction of your code? It's difficult to tell what you have and what you are trying to achieve.

amjanoni commented 2 years ago

@auyer the disk_size variable only takes effect when create_launch_template is false, and the default for it is true.

I added these blocks in eks_managed_node_group_defaults to handle the root disk size. Some AMIs change the root device_name, but you can check the pattern in the AWS docs here.

  eks_managed_node_group_defaults = {
    block_device_mappings = {
      root = {
        device_name = "/dev/xvda"
        ebs = {
          volume_size = 64
        }
      }
    }
    instance_types = ["t3.xlarge", "t3a.xlarge", "m6i.xlarge", "m6a.xlarge"]
  }
bryantbiggs commented 2 years ago

Closing this issue out for now. Please refer to the examples/, which show a wide array of usage configurations, as well as the documentation, which provides a wealth of resources.

auyer commented 2 years ago

Hi, I am still having issues with this. I'm using the same setup as in the original question, with create_launch_template = true. I've added block_device_mappings, but my issues persist.

These are the default values for my nodes:

  eks_managed_node_group_defaults = {
    create_launch_template                = true
    ami_type                              = "BOTTLEROCKET_x86_64"
    platform                              = "bottlerocket"
    disk_size                             = 128
    instance_types                        = local.instance_type
    iam_role_additional_policies          = var.iam_role_additional_policies
    vpc_security_group_ids                = var.additional_security_group_ids
    create_security_group                 = true
    min_size                              = local.min_size
    max_size                              = local.max_size
    desired_size                          = local.desired_size
    iam_role_attach_cni_policy            = true
    attach_cluster_primary_security_group = true
    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda"
        ebs = {
          volume_size           = 30
          volume_type           = "gp3"
          iops                  = 3000
          throughput            = 150
          delete_on_termination = true
        }
      }
    }

    tags = merge(var.default_tags, {
      "k8s.io/cluster-autoscaler/enabled"               = "true"
      "k8s.io/cluster-autoscaler/${local.cluster_name}" = "owned"
    })
  }

Output of kubectl describe nodes for one of the nodes:

Name:               [redacted].ec2.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=r5.2xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/capacityType=SPOT
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-[redacted].ec2.internal
                    kubernetes.io/os=linux
                    ...
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-04dbe8393b8e2ecf6"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 04 May 2022 11:36:57 -0300
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-[redacted].ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Wed, 04 May 2022 12:05:42 -0300
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 04 May 2022 12:05:18 -0300   Wed, 04 May 2022 11:36:56 -0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Wed, 04 May 2022 12:05:18 -0300   Wed, 04 May 2022 12:05:18 -0300   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Wed, 04 May 2022 12:05:18 -0300   Wed, 04 May 2022 11:36:56 -0300   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 04 May 2022 12:05:18 -0300   Wed, 04 May 2022 11:37:27 -0300   KubeletReady                 kubelet is posting ready status
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         8
  ephemeral-storage:           20624592Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      65030940Ki
  pods:                        58
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         7910m
  ephemeral-storage:           17933882132
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      64014108Ki
  pods:                        58
System Info:
  Machine ID:                 ec246d81b0f4e2cf2ed28ac91ab3f515
  System UUID:                ec246d81-b0f4-e2cf-2ed2-8ac91ab3f515
  Boot ID:                    0d655ff9-5720-417f-a3fb-279b6a73ed68
  Kernel Version:             5.10.102
  OS Image:                   Bottlerocket OS 1.7.0 (aws-k8s-1.22)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.5.11+bottlerocket
  Kubelet Version:            v1.22.6-eks-b18cdc9
  Kube-Proxy Version:         v1.22.6-eks-b18cdc9
ProviderID:                   aws:///us-east-1a/i-0[redacted]
Non-terminated Pods:          (6 in total)
  Namespace                   Name                                         CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                         ------------  ----------  ---------------  -------------  ---
  kube-system                 aws-node-9z2vz                               25m (0%)      0 (0%)      0 (0%)           0 (0%)         28m
  kube-system                 aws-node-termination-handler-ksklw           0 (0%)        0 (0%)      0 (0%)           0 (0%)         28m
  kube-system                 ebs-csi-node-ltclq                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         28m
  kube-system                 kube-proxy-5qbnq                             100m (1%)     0 (0%)      0 (0%)           0 (0%)         28m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests   Limits
  --------                    --------   ------
  cpu                         125m (1%)  0 (0%)
  memory                      0 (0%)     0 (0%)
  ephemeral-storage           0 (0%)     0 (0%)
  hugepages-1Gi               0 (0%)     0 (0%)
  hugepages-2Mi               0 (0%)     0 (0%)
  attachable-volumes-aws-ebs  0          0
Events:
  Type     Reason                   Age                  From        Message
  ----     ------                   ----                 ----        -------
  Normal   Starting                 28m                  kubelet     Starting kubelet.
  Warning  InvalidDiskCapacity      28m                  kubelet     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  28m (x2 over 28m)    kubelet     Node ip-[redacted].ec2.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     28m (x2 over 28m)    kubelet     Node ip-[redacted].ec2.internal status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  28m                  kubelet     Updated Node Allocatable limit across pods
  Normal   Starting                 28m                  kube-proxy  Starting kube-proxy.
  Normal   NodeReady                28m                  kubelet     Node ip-[redacted].ec2.internal status is now: NodeReady
  Normal   NodeHasNoDiskPressure    6m49s (x3 over 28m)  kubelet     Node ip-[redacted].ec2.internal status is now: NodeHasNoDiskPressure
  Warning  EvictionThresholdMet     37s (x2 over 12m)    kubelet     Attempting to reclaim ephemeral-storage
  Normal   NodeHasDiskPressure      28s (x2 over 11m)    kubelet     Node ip-[redacted].ec2.internal status is now: NodeHasDiskPressure
bryantbiggs commented 2 years ago

You're setting a volume size of 30 GB and your output is showing attachable-volumes-aws-ebs: 25 - what is the issue?

auyer commented 2 years ago

The issue I'm referring to is DiskPressure. The metrics endpoint tells me I have high fsUsage (and I'm guessing it's related). Screenshot from the k8s Lens app: [image omitted]

Maybe I got it wrong, but earlier today I got lots of evicted pods due to DiskPressure, and this got me worried. I added a second disk, /dev/xvdb, as a test:

      xvdb = {
        device_name = "/dev/xvdb"
        ebs = {
          volume_size           = 60
          volume_type           = "gp3"
          iops                  = 3000
          throughput            = 150
          delete_on_termination = true
        }
      }

If I get the same issue again, I'll post it here.

mweberjc commented 1 year ago

Can we reopen this issue? For Bottlerocket MNGs, the data volume is /dev/xvdb, so the following block device mapping gets applied to the OS partition (which is read-only and does not need to be large): https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/modules/aws-eks-managed-node-groups/locals.tf#L60

In particular, this leaves the data volume unencrypted by default, which is probably not intended. It's also small, depending on the workloads.

Changing the above device name to /dev/xvdb gives the expected result:

$ kubectl describe nodes ...
[...]
Capacity:
  ephemeral-storage:           82547144Ki
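For reference, a Bottlerocket-oriented mapping along those lines targets the data volume explicitly (a sketch: device names follow the Bottlerocket convention described above; sizes and encryption settings are illustrative):

block_device_mappings = {
  root = {
    # Bottlerocket OS volume: read-only, stays small
    device_name = "/dev/xvda"
    ebs = {
      volume_size = 4
      volume_type = "gp3"
      encrypted   = true
    }
  }
  data = {
    # Bottlerocket data volume: container images, logs, ephemeral storage
    device_name = "/dev/xvdb"
    ebs = {
      volume_size = 80
      volume_type = "gp3"
      encrypted   = true
    }
  }
}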
github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.