terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Bottlerocket - SelfManaged NodeGroup - extra parameter issue #3100

Closed adrianmiron closed 2 weeks ago

adrianmiron commented 1 month ago

Hi,

I am facing a weird issue when trying to pass the following parameters to Bottlerocket via bootstrap_extra_args.

This works fine:

      bootstrap_extra_args = <<-EOT
        [settings.host-containers.admin]
        enabled = true

        [settings.host-containers.control]
        enabled = true

        [settings.kubernetes.node-labels]
        "nodegroup" = "stable"
        "eks-cluster-name" = "tf-eks-devops-ontario"

        [settings.kubernetes.kube-reserved]
        cpu = "100m"
        memory = "300Mi"
        ephemeral-storage = "1Gi"

        [settings.kubernetes.system-reserved]
        cpu = "100m"
        ephemeral-storage = "1Gi"
        memory = "100Mi"

        [settings.kubernetes.eviction-hard]
        "memory.available" = "200Mi"
        "nodefs.available" = "5%"

      EOT

Adding the last argument block, as seen in the feature PR ( https://github.com/bottlerocket-os/bottlerocket/pull/2930 ), causes kubelet to fail: the node does not join the cluster, and SSM does not start, so I can't get onto the node to debug the issue.

      bootstrap_extra_args = <<-EOT
        [settings.host-containers.admin]
        enabled = true

        [settings.host-containers.control]
        enabled = true

        [settings.kubernetes.node-labels]
        "nodegroup" = "stable"
        "eks-cluster-name" = "tf-eks-devops-ontario"

        [settings.kubernetes.kube-reserved]
        cpu = "100m"
        memory = "300Mi"
        ephemeral-storage = "1Gi"

        [settings.kubernetes.system-reserved]
        cpu = "100m"
        ephemeral-storage = "1Gi"
        memory = "100Mi"

        [settings.kubernetes.eviction-hard]
        "memory.available" = "200Mi"
        "nodefs.available" = "5%"

        [settings.kubernetes]
        "shutdown-grace-period" = "60s"

      EOT

If I start the node without that argument and run apiclient set settings.kubernetes.shutdown-grace-period=60s, it accepts the command.

This only affects self-managed node groups; Karpenter-managed nodes with this setting work fine.

Has anyone else seen this? I have no clue what crazy magic is causing it.

adrianmiron commented 1 month ago

OK, I got it. It was because of the launch template, which adds some default fields ahead of the extra args:

[settings.kubernetes]
"cluster-name" = "****"
"api-server" = "*********eks.amazonaws.com"
"cluster-certificate" = "*******"
"cluster-dns-ip" = ["10.100.0.10"]

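So adding another [settings.kubernetes] block lower in the template caused it to fail, presumably because TOML does not allow the same table to be declared twice. Putting the keys at the very top of bootstrap_extra_args, so that they land in the [settings.kubernetes] table that is already open, worked though: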

      bootstrap_extra_args = <<-EOT
        "shutdown-grace-period" = "60s"
        "shutdown-grace-period-for-critical-pods" = "30s"

        [settings.host-containers.admin]
        enabled = true

        [settings.host-containers.control]
        enabled = true

        [settings.kubernetes.node-labels]
        "nodegroup" = "stable"
        "eks-cluster-name" = "tf-eks-devops-ontario"

        [settings.kubernetes.kube-reserved]
        cpu = "100m"
        memory = "300Mi"
        ephemeral-storage = "1Gi"

        [settings.kubernetes.system-reserved]
        cpu = "100m"
        ephemeral-storage = "1Gi"
        memory = "100Mi"

        [settings.kubernetes.eviction-hard]
        "memory.available" = "200Mi"
        "nodefs.available" = "5%"

      EOT

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 2 weeks ago

This issue was automatically closed because of stale in 10 days