pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

Custom launch template is not used when creating a new Managed Node Group #633

Closed aureq closed 4 weeks ago

aureq commented 2 years ago

Hello!

Issue details

When creating a new Managed Node Group, I specified a custom (ec2) launch template via launchTemplate.

However, newly launched EC2 instances do not appear to be using this launch template, since the EC2 instance tag aws:ec2launchtemplate:id refers to the one created by this provider instead.

Steps to reproduce

  1. Use https://github.com/pulumi/pulumi-eks/tree/master/examples/managed-nodegroups as a starting point
  2. Create a new launch template as part of your code
    const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
        tags: {testTag: "tag value"},
    });
  3. Set the launch template for the managed node group like this
    ...
    launchTemplate: {
        id: launchTemplate.id,
        version: '$Latest'
    }
  4. Deploy your changes

Expected: The custom launch template is used to launch new EC2 instances.

Actual: The default launch template created by this provider is used.

con5cience commented 2 years ago

id is an Output, so you need to interpolate it:

id: pulumi.interpolate`${launchTemplate.id}`

More context:

https://www.pulumi.com/registry/packages/aws/api-docs/ec2/launchtemplate/#outputs
https://www.pulumi.com/docs/intro/concepts/inputs-outputs/#outputs-and-strings

It is not recommended to use '$Latest' for the launch template version, because the AWS API will resolve it to a concrete version (e.g. 1) and register it as drift every time, causing Pulumi to delete and replace it.
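
For example, a minimal sketch building on the snippet above, pinning to the template's tracked latestVersion output instead of the literal string '$Latest':

launchTemplate: {
    id: pulumi.interpolate`${launchTemplate.id}`,
    // Resolves to a concrete version number, so Pulumi doesn't register drift on every update.
    version: pulumi.interpolate`${launchTemplate.latestVersion}`,
}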

lukehoban commented 2 years ago
const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tags: {testTag: "tag value"},
});

This tags the launch template, but does not tag the instances created by the launch template. To tag the instances created by the launch template, you can do:

const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    tagSpecifications: [
        { resourceType: "instance", tags: { testTag: "tag value" } },
    ],
});
nimbinatus commented 2 years ago

Reopening this with a question from the community Slack:

Attaching a custom LaunchTemplate to an EKS ManagedNodeGroup doesn't seem to work? For example, following this: https://github.com/pulumi/pulumi-eks/tree/master/examples/managed-nodegroups. I create a new LaunchTemplate with some metadata options and a key pair, refer to it in the eks.createManagedNodeGroup() args:

launchTemplate: {
 id: pulumi.interpolate`${myLaunchTemplate.id}`,
 version: "1",
},

When the node group comes up, it says on the EKS page that it's using mine, but on the instances themselves in the ASG, it's using an auto-created one. Is this a bug? Or am I missing something fundamental?

johnharris85 commented 2 years ago

Same issue:

  const localCluster = new eks.Cluster(`localCluster`, {
    name: `localCluster`,
    version: "1.21",
    vpcId: vpc.id,
    publicSubnetIds: vpc.publicSubnetIds,
    privateSubnetIds: vpc.privateSubnetIds,
    nodeAssociatePublicIpAddress: false,
    endpointPrivateAccess: true,
    endpointPublicAccess: true,
    createOidcProvider: true,
    clusterSecurityGroup: apiSg,
    skipDefaultNodeGroup: true,
    providerCredentialOpts: {
      profileName: aws.config.profile,
    },
  },);

  const localEKSLaunchTemplate = new aws.ec2.LaunchTemplate(`localEKSLaunchTemplate`, {
    metadataOptions: {
      httpEndpoint: "enabled",
      httpTokens: "required",
      httpPutResponseHopLimit: 2,
    },
    keyName: keyName,
    defaultVersion: 1,
  })

  const localClusterMNG = new eks.ManagedNodeGroup(`localClusterMNG`, {
    version: "1.21",
    cluster: localCluster,
    nodeRole: localCluster.core.instanceRoles[0],
    subnetIds: vpc.privateSubnetIds,
    scalingConfig: {
      minSize: 1,
      desiredSize: 2,
      maxSize: 25,
    },
    launchTemplate: {
      id: localEKSLaunchTemplate.id,
      version: pulumi.interpolate`${localEKSLaunchTemplate.latestVersion}`,
    },
  }, {ignoreChanges: ["scalingConfig"]})

The launch template is created, and on the EKS dashboard it says it's being used for the node group, however when looking at the actual EC2 instances / ASG that are part of the node group, they all show the default EKS launch template.

sushantkumar-amagi commented 2 years ago

Stumbled upon this issue. I have been developing using Python, and the following function works for me. Hopefully this helps people resolve their issues. The difference here is that I associate an EKS AMI and the security group created by the cluster.

# Imports assumed by this snippet; fetch_latest_ami_id, format_user_data,
# user_data and logger are project-specific helpers not shown here.
from pulumi_aws.ec2 import (
    LaunchTemplate,
    LaunchTemplateBlockDeviceMappingArgs,
    LaunchTemplateBlockDeviceMappingEbsArgs,
    LaunchTemplateTagSpecificationArgs,
)

def create_launch_template(stack, cluster, node_group, k8s_version):
    ami_id = fetch_latest_ami_id(k8s_version)

    launch_template_name = f"{stack}-{node_group.get('name')}-lt"
    eks_sg = cluster.core.cluster.vpc_config.cluster_security_group_id

    complete_user_data = (
        user_data.SCRIPT_FORMAT
        + node_group.get("bootstrap_commands")
        + user_data.SCRIPT_BOUNDARY_END
        + user_data.BASE_USER_DATA
    )

    launch_template_device_mapping_args = LaunchTemplateBlockDeviceMappingArgs(
        device_name="/dev/xvda",
        ebs=LaunchTemplateBlockDeviceMappingEbsArgs(
            volume_size=100,
        ),
    )

    tag_pairs = {
        "eks_cluster": cluster.eks_cluster.name,
        "launch_template_name": launch_template_name,
        "node_group": node_group.get("name"),
        "Stack": stack,
    }

    logger.info(f"iam#create_launch_template Creating Launch Template {launch_template_name}")
    launch_template = LaunchTemplate(
        launch_template_name,
        name=launch_template_name,
        block_device_mappings=[launch_template_device_mapping_args],
        user_data=format_user_data(cluster, complete_user_data),
        image_id=ami_id,
        vpc_security_group_ids=[eks_sg],
        tags=tag_pairs,
        tag_specifications=[
            LaunchTemplateTagSpecificationArgs(
                resource_type="instance",
                tags=tag_pairs,
            )
        ],
    )

    return launch_template
johnharris85 commented 2 years ago

Thanks @sushantkumar-amagi:

I see that you're using tags, and @lukehoban talks about tags in his message above, but are tags a necessary piece of getting this to work? I can't think why that would be the case, and can't find anything in the AWS docs about that? (although certainly happy to be wrong :smile: )

sushantkumar-amagi commented 2 years ago

Hi @johnharris85

Also, I don't think tags are absolutely essential for this to work; in fact, I had forgotten to tag them before coming across this issue.

johnharris85 commented 2 years ago

Thanks for the response @sushantkumar-amagi. OK so I've done some more testing with this, and it's actually pretty weird (or maybe I'm misunderstanding how EKS does Node Groups / custom launch templates?)

I create an EKS cluster with 2 MNGs. One specifies a launch template, the other doesn't. Both of the MNGs get created. In the EKS console I can see that MNG 1 is using my custom LT, MNG 2 has no LT. So far so good.

Now when I visit the Auto Scaling groups for the nodes of each MNG, the ASG for MNG 1 has a launch template that was created by Pulumi/EKS, not my custom one. However, the auto-created/attached LT does have the configuration from my custom LT (SSH key, other settings, etc.). Maybe it's copied over during creation? So the whole process is obviously aware of my LT. This is fine for a one-shot deployment, but if I ever want to go and update the LT in Pulumi and apply it, it will have no effect, as the ASGs are using the auto-created LT with the configuration from my original run of the custom LT creation.

I wonder if others are actually hitting this issue and just not noticing, because the config works as expected (copied over) and they never update their original LT, so they don't notice that changes aren't being propagated?

yann-soubeyrand commented 2 years ago

Hello,

EKS copies the launch template one gives to it (it seems to add some default settings): https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html

Managed node groups are always deployed with a launch template to be used with the Amazon EC2 Auto Scaling group. The Amazon EKS API creates this launch template either by copying one you provide or by creating one automatically with default values in your account.

johnharris85 commented 2 years ago

Thanks @yann-soubeyrand, the behavior I'm seeing makes sense then. I'm wondering how Pulumi handles it when we update the template, though: do the changes and version numbers also get copied?

yann-soubeyrand commented 2 years ago

@johnharris85 when you specify a launch template for your managed node group, you indicate its version. When you update the version, EKS automatically updates its copy and does a rolling replace of the nodes.
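
For illustration, a minimal sketch of wiring that up (names are illustrative; cluster and nodeRole are assumed to exist as in the snippets above):

const launchTemplate = new aws.ec2.LaunchTemplate("my-launch-template", {
    metadataOptions: {
        httpEndpoint: "enabled",
        httpTokens: "required",
    },
});

const mng = new eks.ManagedNodeGroup("my-mng", {
    cluster: cluster,
    nodeRole: nodeRole,
    launchTemplate: {
        id: launchTemplate.id,
        // Any edit to the launch template bumps latestVersion, which changes the
        // version passed to the node group and triggers EKS to roll the nodes.
        version: pulumi.interpolate`${launchTemplate.latestVersion}`,
    },
});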

johnharris85 commented 2 years ago

Pretty sure when I tested this Pulumi was not picking up updates, but I will re-test. Thanks!

markfickett commented 1 year ago

I was able to get a ManagedNodeGroup working with a custom LaunchTemplate in Python. Below is what's working for me.

It takes AWS about 15 minutes to update the node group (of 2 nodes) when I change the user data. New nodes start and join the group/cluster within about 3 minutes, but it takes longer for the pods to get rescheduled and the old nodes to terminate.

$ pulumi about
CLI
Version      3.46.1
Go Version   go1.19.2
Go Compiler  gc

Plugins
NAME        VERSION
aws         5.7.2
eks         0.42.7
honeycomb   0.0.11
kubernetes  3.23.1
python      3.10.8
# Imports assumed by this snippet; EKS_CLUSTER_NAME, TEAM_MEMBERS, _CLUSTER_VPC,
# _CLUSTER_SUBNETS and _define_node_role are defined elsewhere in this project.
import base64
import json
from typing import Tuple

import pulumi
import pulumi_aws as aws
import pulumi_eks as eks
import pulumi_kubernetes as k8s

_aws_account_id = aws.get_caller_identity().account_id

_K8S_VERSION = "1.23"  # latest visible in pulumi-eks

_NODE_ROOT_VOLUME_SIZE_GIB = 60
# Script to run on EKS nodes as root before EKS bootstrapping (which starts the kubelet)
# default bootstrap: https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh
# This user data must be in mime format when passed to a launch template.
# https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
#
# From MNG launch template docs:
# "your user data is merged with Amazon EKS user data required for nodes to join the
# cluster. Don't specify any commands in your user data that starts or modifies kubelet."
# Inspecting instance user data shows this and the original user data in separate MIME
# parts, both in the user data with this 1st.
_NODE_USER_DATA = r"""#!/bin/bash
set -e

eho "Doing my custom setup, kubelet will start next."
"""

_USER_DATA_MIME_HEADER = """MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"
"""

_USER_DATA_MIME_FOOTER = """

--//--
"""

def _wrap_and_encode_user_data(script_text: str) -> str:
    mime_encapsulated = _USER_DATA_MIME_HEADER + script_text + _USER_DATA_MIME_FOOTER
    encoded_bytes = base64.b64encode(mime_encapsulated.encode())
    return encoded_bytes.decode("latin1")

def _define_cluster_and_get_provider() -> Tuple[eks.Cluster, k8s.Provider]:
    # https://www.pulumi.com/docs/guides/crosswalk/aws/eks/
    # https://www.pulumi.com/registry/packages/eks/api-docs/cluster/#cluster

    # Map AWS IAM users to Kubernetes internal RBAC admin group. Mapping individual
    # users avoids having to go from a group to a role with assume-role policies.
    # Kubernetes has its own permissions (RBAC) system, with predefined groups for
    # common permissions levels. AWS EKS provides translation from IAM to that, but we
    # must explicitly map particular users or roles that should be granted permissions
    # within the cluster.
    #
    # AWS docs: https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
    # Detailed example: https://apperati.io/articles/managing_eks_access-bs/
    # IAM groups are not supported, only users or roles:
    #     https://github.com/kubernetes-sigs/aws-iam-authenticator/issues/176
    user_mappings = []
    for username in TEAM_MEMBERS:
        user_mappings.append(
            eks.UserMappingArgs(
                # AWS IAM user to set permissions for
                user_arn=f"arn:aws:iam::{_aws_account_id}:user/{username}",
                # k8s RBAC group from which this IAM user will get permissions
                groups=["system:masters"],
                # k8s RBAC username to create for the user
                username=username,
            )
        )

    node_role = _define_node_role(EKS_CLUSTER_NAME)

    cluster = eks.Cluster(
        EKS_CLUSTER_NAME,
        name=EKS_CLUSTER_NAME,
        version=_K8S_VERSION,
        vpc_id=_CLUSTER_VPC,
        subnet_ids=_CLUSTER_SUBNETS,
        # OpenID Connect Provider maps from k8s to AWS IDs.
        # Get the OIDC's ID with:
        # aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.identity.oidc.issuer" --output text
        create_oidc_provider=True,
        user_mappings=user_mappings,
        skip_default_node_group=True,
        instance_role=node_role,
    )
    # Export the kubeconfig to allow kubectl to access the cluster. For example:
    #    pulumi stack output my-kubeconfig > kubeconfig.yml
    #    KUBECONFIG=./kubeconfig.yml kubectl get pods -A
    pulumi.export(f"my-kubeconfig", cluster.kubeconfig)

    # Work around cluster.provider being the wrong type for Namespace to use.
    # https://github.com/pulumi/pulumi-eks/issues/662
    provider = k8s.Provider(
        f"my-cluster-provider",
        kubeconfig=cluster.kubeconfig.apply(lambda k: json.dumps(k)),
    )

    launch_template = aws.ec2.LaunchTemplate(
        f"{EKS_CLUSTER_NAME}-launch-template",
        block_device_mappings=[
            aws.ec2.LaunchTemplateBlockDeviceMappingArgs(
                device_name="/dev/xvda",
                ebs=aws.ec2.LaunchTemplateBlockDeviceMappingEbsArgs(
                    volume_size=_NODE_ROOT_VOLUME_SIZE_GIB,
                ),
            ),
        ],
        user_data=_wrap_and_encode_user_data(
            _NODE_USER_DATA
        ),
        # The default version shows up first in the UI, so update it even though
        # we don't really need to since we use latest_version below.
        update_default_version=True,
        # Other settings, such as tags required for the node to join the group/cluster,
        # are filled in by default.
    )

    # The EC2 instances that the cluster will use to execute pods.
    # https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/
    eks.ManagedNodeGroup(
        f"{EKS_CLUSTER_NAME}-managed-node-group",
        node_group_name=f"{EKS_CLUSTER_NAME}-managed-node-group",
        cluster=cluster.core,
        version=_K8S_VERSION,
        subnet_ids=_CLUSTER_SUBNETS,
        node_role=node_role,
        instance_types=["r6i.2xlarge"],
        scaling_config=aws.eks.NodeGroupScalingConfigArgs(
            min_size=1,
            desired_size=2,
            max_size=4,
        ),
        launch_template={
            "id": launch_template.id,
            "version": launch_template.latest_version,
        },
    )

    return cluster, provider
sudosoul commented 1 year ago

It'd be helpful if the docs were updated as well to define NodeGroupLaunchTemplateArgs https://www.pulumi.com/registry/packages/eks/api-docs/managednodegroup/#nodegrouplaunchtemplate

bhvishal9 commented 5 months ago

Facing the same issue: I created a launch template, but it is not being applied to the managed node group with the latest packages.

flostadler commented 4 months ago

@bhvishal9 Sorry that you ran into those issues! There was a regression in the handling of custom launch templates (https://github.com/pulumi/pulumi-eks/issues/1193). This is fixed in v2.7.1 now. Can you please update to that version and check whether it works for you?

syscl commented 3 months ago

Hi Pulumi community, we also have this problem using the latest Pulumi with the latest Python libraries:

import pulumi
import pulumi_aws as aws
import pulumi_awsx as awsx
import pulumi_kubernetes as kubernetes
from pulumi.resource import ResourceOptions

aws_provider = aws.Provider(
        "aws",
        region=aws_region,
        assume_role=aws.ProviderAssumeRoleArgs(
            role_arn=cluster_props.customer_assume_role_arn,
            session_name="PulumiSession",
        ),
        default_tags=aws.ProviderDefaultTagsArgs(tags=tags),
    )

cluster = aws.eks.Cluster(
        cluster_name,
        role_arn=eks_role_iam.arn,
        vpc_config=aws.eks.ClusterVpcConfigArgs(
            subnet_ids=vpc.private_subnet_ids,
            endpoint_public_access=True,  # Enable public endpoint
            public_access_cidrs=vpc_cidrs_with_nat_gateways_eips,  # Restrict public access
        ),
        opts=ResourceOptions(provider=aws_provider),
    )

launch_template = aws.ec2.LaunchTemplate(
        "eks-node-group-launch-template",
        instance_type=cluster_props.ec2_instance_type,
        block_device_mappings=[
            aws.ec2.LaunchTemplateBlockDeviceMappingArgs(
                device_name="/dev/xvda",
                ebs=aws.ec2.LaunchTemplateBlockDeviceMappingEbsArgs(
                    volume_size=ec2_instance_disk_size,
                    volume_type="gp2",
                ),
            )
        ],
        metadata_options=aws.ec2.LaunchTemplateMetadataOptionsArgs(
            http_put_response_hop_limit=2,
            http_endpoint="enabled",
            http_tokens="required",
        ),
        opts=ResourceOptions(provider=aws_provider),
    )

nodegroup = aws.eks.NodeGroup(
        "standard_node_group",
        cluster_name=cluster_name,
        node_role_arn=instance_role.arn,
        subnet_ids=vpc.private_subnet_ids,
        scaling_config=aws.eks.NodeGroupScalingConfigArgs(
            desired_size=num_nodes,
            min_size=num_nodes,
            max_size=num_nodes,
        ),
        launch_template=aws.eks.NodeGroupLaunchTemplateArgs(
            version=launch_template.latest_version.apply(lambda v: str(v)),
            id=launch_template.id,
        ),
        opts=ResourceOptions(provider=aws_provider),
    )

This is a very simple provisioning script; however, we are seeing two launch templates created in the AWS console:

(screenshot: both launch templates listed in the AWS console)

The first launch template should never have been created. I would appreciate it if anyone could shed some light on this issue.

yann-soubeyrand commented 3 months ago

@syscl it seems to be working as expected: you created a launch template and created a node group using it. EKS then created a launch template based on yours and used it (I guess) in the Auto Scaling group it created for the node group. Note that the creator of that launch template is the EKS service-linked role.

syscl commented 3 months ago

Thanks for this information @yann-soubeyrand. Do you know why a default node group will be created even though there's no instance linked to it? For a managed node group, only one launch template will be created.

yann-soubeyrand commented 3 months ago

Hi @syscl, I'm not sure I understand your question: do you mean that when you create a managed node group without passing it a custom launch template, only one launch template is created?

flostadler commented 4 weeks ago

@syscl it looks like you're using pulumi-aws directly rather than pulumi-eks. The second launch template you're seeing was indeed created by the AWS EKS service itself, so it doesn't seem to be related to Pulumi directly.

For an example of a managed node group using a custom launch template you can have a look here: https://github.com/pulumi/pulumi-eks/blob/d4f539f99700f3cd084da85f3dad59ec018a2dc0/examples/custom-managed-nodegroup/index.ts#L42-L74
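
As a rough sketch of the difference (not a drop-in replacement for your snippet; the cluster and node role below are placeholders, and the linked example shows the full wiring):

import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";
import * as pulumi from "@pulumi/pulumi";

const cluster = new eks.Cluster("cluster", { skipDefaultNodeGroup: true });

const launchTemplate = new aws.ec2.LaunchTemplate("custom-lt", {
    metadataOptions: {
        httpEndpoint: "enabled",
        httpTokens: "required",
        httpPutResponseHopLimit: 2,
    },
});

// Use the pulumi-eks ManagedNodeGroup component (as in the linked example)
// rather than aws.eks.NodeGroup directly.
const nodeGroup = new eks.ManagedNodeGroup("custom-mng", {
    cluster: cluster,
    nodeRole: cluster.core.instanceRoles.apply(roles => roles[0]),
    launchTemplate: {
        id: launchTemplate.id,
        version: pulumi.interpolate`${launchTemplate.latestVersion}`,
    },
});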