terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0
4.39k stars 4.04k forks

Add CUSTOM ami_type support. #3094

Open Ramyak opened 2 months ago

Ramyak commented 2 months ago

Is your request related to a new offering from AWS?

No.

Is your request related to a problem? Please describe.

https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html#AmazonEKS-Type-Nodegroup-amiType CUSTOM ami_type.

  1. Can the module support CUSTOM AMI type? https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v20.17.2/modules/self-managed-node-group/main.tf#L13-L27

  2. For CUSTOM AMI type (self managed node group), can we not assume that the AMI will understand content_type = "application/node.eks.aws"

Describe the solution you'd like.

Describe alternatives you've considered.

Additional context

bryantbiggs commented 2 months ago

Can you provide a simplified example configuration of what you are trying to achieve?

jpriebe commented 1 month ago

I think I understand what the problem is here.

If you build a custom AMI from the Amazon Linux EKS AMIs (e.g. images matching the pattern "amazon-eks-node-al2023-x86_64-standard-EKS_VERSION-*") and you specify ami_type = "CUSTOM", the module will use the old Amazon Linux 2 style userdata (it takes your bootstrap user data and appends a call to /etc/eks/bootstrap.sh).

We need to be able to use a custom AMI but use the AL2023 style userdata with the bootstrap script and the NodeConfig YAML in a multipart MIME document.

At least that's my problem. I'm not sure exactly what @Ramyak is describing. They mention the application/node.eks.aws MIME type, which is the MIME type of the NodeConfig YAML document, so I figured it was related. But then they say they want to launch a custom AMI type that doesn't have nodeadm installed, which is confusing, since nodeadm is the way Amazon Linux 2023 initializes nodes.

The module has supported custom AMIs based on Amazon Linux 2 for as long as I've used it. I'm finding problems with AL2023.

I'm going to try a workaround of setting ami_type = "CUSTOM" and platform = "al2023" (I know platform is deprecated, but I need a workaround now)

jpriebe commented 1 month ago

Well, that workaround failed. I got the NodeConfig in the userdata, but I didn't get my bootstrap script.

Here are the relevant parts of my node group configuration:

  eks_managed_node_groups = {
    ng = {
      enable_bootstrap_user_data = true
      pre_bootstrap_user_data = templatefile("${path.module}/eks-node-userdata.sh.tpl", {...})

      ami_type = "CUSTOM"

      platform = "al2023"

and here is the resulting /var/lib/cloud/instance/user-data.txt

Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: application/node.eks.aws
Mime-Version: 1.0

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: <<REDACTED>>
    apiServerEndpoint: https://<<REDACTED>>.gr7.us-east-1.eks.amazonaws.com
    certificateAuthority: <<REDACTED>>
    cidr: 172.20.0.0/16

--MIMEBOUNDARY--

I don't know where my bootstrap userdata went.

jpriebe commented 1 month ago

Maybe the issue is that the al2023_user_data.tpl doesn't have the bootstrap user data token (${pre_bootstrap_user_data ~}) ?

or maybe I don't understand how the userdata is assembled. Maybe when I'm generating al2023 userdata, I need to put my script into cloudinit_pre_nodeadm. I'll try that on Monday.

jpriebe commented 1 month ago

I tried putting my code into cloudinit_pre_nodeadm, and that worked. So here is everything I had to specify in order to use a custom AMI that is derived from AL2023:

  eks_managed_node_groups = {
    ng = {
      cloudinit_pre_nodeadm = [
        {
          content_type = "text/x-shellscript; charset=\"us-ascii\""
          content = templatefile("${path.module}/userdata.sh.tpl", { 
               ...  template variables redacted... 
          })
        }
      ]

      ami_type = "CUSTOM"
      platform = "al2023"

bryantbiggs commented 1 month ago

Again, it's hard to follow without a clear, minimal, and simple reproduction.

Does this not cover what you are trying to do?

    ami_type                   = "CUSTOM"
    enable_bootstrap_user_data = true
    user_data_template_path    = "${path.module}/userdata.sh.tpl"

Note: your user data looks like it's in shell format, but AL2023 does not use shell format. I can't really provide more guidance without a proper reproduction.

jpriebe commented 1 month ago

The userdata.sh.tpl you see referenced in my code example is just a bash script. But it's intended to be added to the multipart MIME document that the module creates.

I would like to use the al2023_user_data.tpl template that is built into the module, rather than having to craft the multipart MIME document myself. I just want to pass in a simple script in the cloudinit_pre_nodeadm parameter and have the module build the multipart MIME document with my pre-nodeadm portion plus the NodeConfig YAML, which has a number of references to cluster properties.

But you can't do that without using the var.platform workaround.

If ami_type is CUSTOM, the logic here will fall through to var.platform to set user_data_type; this value defaults to linux. This means it will use the linux_user_data.tpl template, which is not compatible with AL2023.
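The fall-through described above can be illustrated with a simplified sketch. This is not the module's actual source: the local names and the lookup table are assumptions made for illustration, and only ami_type, platform, and user_data_type come from this thread.

```hcl
# Simplified sketch of the fall-through; NOT the module's real code.
variable "ami_type" {
  type    = string
  default = "CUSTOM"
}

variable "platform" {
  type    = string
  default = "linux" # deprecated input, used only as a fallback here
}

locals {
  # Hypothetical lookup table from EKS AMI type to user data template type.
  ami_type_to_user_data_type = {
    AL2_x86_64             = "linux"
    AL2_ARM_64             = "linux"
    AL2023_x86_64_STANDARD = "al2023"
    AL2023_ARM_64_STANDARD = "al2023"
    BOTTLEROCKET_x86_64    = "bottlerocket"
    BOTTLEROCKET_ARM_64    = "bottlerocket"
  }

  # "CUSTOM" is not a key in the table, so lookup() returns var.platform.
  user_data_type = lookup(local.ami_type_to_user_data_type, var.ami_type, var.platform)
}

output "user_data_type" {
  value = local.user_data_type
}
```

With the defaults shown, the output evaluates to "linux", which is why a CUSTOM AMI derived from AL2023 ends up with the AL2-style template unless platform is overridden.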

I realize I'm abusing var.platform by using it in a new deployment. This is not a good workaround. But it works.

I'll see if I can find some time to build a more complete reproduction, but I'm not sure when I'll have the cycles for that.

bryantbiggs commented 1 month ago

if you are building a derivative of the AL2023 type, then use the AL2023 AMI type - no need to set it to custom

  eks_managed_node_groups = {
    ng = {
      cloudinit_pre_nodeadm = [
        {
          content_type = "text/x-shellscript; charset=\"us-ascii\""
          content = templatefile("${path.module}/userdata.sh.tpl", {
               ...  template variables redacted...
          })
        }
      ]

      ami_type                   = "AL2023_x86_64_STANDARD"
      ami_id                     = "ami-..."
      enable_bootstrap_user_data = true
    }
  }

bryantbiggs commented 1 month ago

@jpriebe does the above work for your use case?

jpriebe commented 1 month ago

@bryantbiggs - just tried this out -- yes, it worked. Thanks for clearing up my misconceptions. I worry a little about people reading my previous comments and not finding their way to the bottom of this thread. Should I edit those previous comments to point people to your post?

chapol-github commented 3 weeks ago

@jpriebe I am also trying to figure out exactly what needs to be configured to use a managed node group with a custom AMI built from AL2023. Can you share an example of what you added in user data?

Will the NodeConfig below be added by the module from the al2023_user_data.tpl template, or do we need to add it to the user data with cloudinit_pre_nodeadm?

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    apiServerEndpoint: https://example.com/
    certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
    cidr: 172.20.0.0/16
    name: eks-cluster
  kubelet:
    config:
      maxPods: 110
      clusterDNS:
      - 172.20.0.10
    flags:
    - "--node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,eks.amazonaws.com/nodegroup-image=ami-123,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=mng,eks.amazonaws.com/sourceLaunchTemplateId=lt-1234"
    - --register-with-taints=CriticalAddonsOnly=true:NoSchedule

--//--

When I add a managed node group this way, without any user data in cloudinit_pre_nodeadm, I see that the module uses linux_user_data.tpl as the user data, and the node is not able to join because the bootstrap file is missing or in the wrong format.

eks_managed_node_groups = {
    ng = {
           name   = "eks-mng"
           instance_type   = ["t3.xlarge"]
           ami_type    = "AL2023_x86_64_STANDARD"
           ami_id   = "ami-..."
           enable_bootstrap_user_data = true
    }
  }

jpriebe commented 2 weeks ago

You may be doing something very different, since you have a kubelet section in your NodeConfig.

Here is the relevant part of our nodegroup configuration:

    ng = {

      ...

      enable_bootstrap_user_data = true # to opt in to using the module supplied bootstrap user data template

      cloudinit_pre_nodeadm = [
        {
          content_type = "text/x-shellscript; charset=\"us-ascii\""
          content = templatefile("${path.module}/eks-node-userdata.sh.tpl", {
              ...
          })
        }
      ]

      ami_type = "AL2023_x86_64_STANDARD"
      ami_id   = var.eks_ami_id
    }

The template eks-node-userdata.sh.tpl is a template for a simple bash script. It is not a multi-part MIME document.

The module takes care of structuring the userdata appropriately.
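To picture what "structuring the userdata" means, here is a minimal standalone sketch that uses the cloudinit_config data source from the hashicorp/cloudinit provider to combine a shell script part with a NodeConfig part. It is an illustration under assumptions, not the module's internals: the data source name, the inline script, and the cluster values are all placeholders.

```hcl
terraform {
  required_providers {
    cloudinit = {
      source = "hashicorp/cloudinit"
    }
  }
}

# Hypothetical example: builds a multipart MIME document with two parts.
data "cloudinit_config" "al2023_sketch" {
  gzip          = false
  base64_encode = true
  boundary      = "MIMEBOUNDARY"

  # Part 1: a plain bash script, comparable to what goes into cloudinit_pre_nodeadm.
  part {
    content_type = "text/x-shellscript; charset=\"us-ascii\""
    content      = "#!/bin/bash\necho 'pre-nodeadm step'\n"
  }

  # Part 2: the NodeConfig document read by nodeadm (placeholder values).
  part {
    content_type = "application/node.eks.aws"
    content      = <<-EOT
      ---
      apiVersion: node.eks.aws/v1alpha1
      kind: NodeConfig
      spec:
        cluster:
          name: example-cluster
          apiServerEndpoint: https://example.com
          certificateAuthority: Y2VydGlmaWNhdGVBdXRob3JpdHk=
          cidr: 172.20.0.0/16
    EOT
  }
}

output "user_data" {
  value = data.cloudinit_config.al2023_sketch.rendered
}
```

With the module, you supply only the script part through cloudinit_pre_nodeadm; the NodeConfig part is generated from the cluster properties, so there is no need to write it by hand.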