pulumi / pulumi-eks

A Pulumi component for easily creating and managing an Amazon EKS Cluster
https://www.pulumi.com/registry/packages/eks/
Apache License 2.0

Changes to role_mappings argument to eks.Cluster result in wiping the podExecutionRole entry in the kube-system/aws-auth ConfigMap #807

Open filip-zyzniewski opened 2 years ago

filip-zyzniewski commented 2 years ago

What happened?

We added some new roles to role_mappings, and pulumi up removed the following entry from the kube-system/aws-auth ConfigMap:

- groups:
  - system:bootstrappers
  - system:nodes
  - system:node-proxier
  rolearn: arn:aws:iam::<account-id>:role/<cluster-name>-podExecutionRole-role-<suffix>
  username: system:node:{{SessionName}}

This mapping is added automatically on Fargate profile creation (see https://aws.amazon.com/premiumsupport/knowledge-center/fargate-troubleshoot-profile-creation):

The pod execution role is an IAM role that's used by the Fargate node to make AWS API calls. These include calls made to fetch Amazon Elastic Container Registry (Amazon ECR) images such as VPC CNI, CoreDNS, and so on. The AmazonEKSFargatePodExecutionRolePolicy managed policy must be attached to this role.

Kubelet on the Fargate node uses this IAM role to communicate with the API server. This role must be included in the aws-auth configmap so that kubelet can authenticate with the API server. When you create a Fargate profile, the Fargate workflow automatically adds this role to the cluster's aws-auth configmap.

Steps to reproduce

create a cluster by doing pulumi up on:

import pulumi_eks as eks

eks.Cluster(
    name,
    fargate=True,
    role_mappings=[
        eks.RoleMappingArgs(
            groups=["read-only"],
            role_arn="arn:aws:iam::<account-id>:role/foo",
            username="aws:foo",
        ),
    ],
)

then add another entry to role_mappings:

import pulumi_eks as eks

eks.Cluster(
    name,
    fargate=True,
    role_mappings=[
        eks.RoleMappingArgs(
            groups=["read-only"],
            role_arn="arn:aws:iam::<account-id>:role/foo",
            username="aws:foo",
        ),
        eks.RoleMappingArgs(
            groups=["read-only"],
            role_arn="arn:aws:iam::<account-id>:role/bar",
            username="aws:bar",
        ),
    ],
)

and run pulumi up again.

Expected Behavior

The arn:aws:iam::<account-id>:role/bar entry gets added to the kube-system/aws-auth ConfigMap next to the existing rolearn: arn:aws:iam::<account-id>:role/<cluster-name>-podExecutionRole-role-<suffix> and arn:aws:iam::<account-id>:role/foo entries.
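
Roughly, the expected mapRoles data would then contain all three entries (a sketch; the ARNs follow the placeholders above and the read-only entries come from the repro code):

- groups:
  - system:bootstrappers
  - system:nodes
  - system:node-proxier
  rolearn: arn:aws:iam::<account-id>:role/<cluster-name>-podExecutionRole-role-<suffix>
  username: system:node:{{SessionName}}
- groups:
  - read-only
  rolearn: arn:aws:iam::<account-id>:role/foo
  username: aws:foo
- groups:
  - read-only
  rolearn: arn:aws:iam::<account-id>:role/bar
  username: aws:bar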

Actual Behavior

The arn:aws:iam::<account-id>:role/bar entry is added, but the rolearn: arn:aws:iam::<account-id>:role/<cluster-name>-podExecutionRole-role-<suffix> entry is removed.

Output of pulumi about

CLI          
Version      3.46.0
Go Version   go1.19.3
Go Compiler  gc

Plugins
NAME        VERSION
aws         5.19.0
command     0.5.2
eks         0.42.7
kubernetes  3.22.1
python      unknown
tls         4.6.1

Host     
OS       darwin
Version  12.2
Arch     arm64

This project is written in python: executable='/opt/homebrew/bin/python3' version='3.10.8'

Current Stack: [...]

TYPE                                                                                        URN
[...]

Found no pending operations associated with [...]

Backend        
Name           pulumi.com
URL            [...]
User           [...]
Organizations  [...]

Dependencies:
NAME            VERSION
pip             22.3.0
pulumi-command  0.5.2
pulumi-eks      0.42.7
pulumi-tls      4.6.1
pydantic        1.10.2
setuptools      65.5.0
wheel           0.37.1

Pulumi locates its logs in /var/folders/xc/rb73zyjd3kq26tv0hpk4wxh40000gq/T/ by default

Additional context

podExecutionRoleArn @ https://github.com/pulumi/pulumi-eks/blob/v0.42.7/nodejs/eks/cluster.ts#L723-L730 is never added to roleMappings @ https://github.com/pulumi/pulumi-eks/blob/v0.42.7/nodejs/eks/cluster.ts#L678-L691, so the pod execution role mapping never ends up in the aws-auth ConfigMap that Pulumi renders.
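
Until the component merges it automatically, a possible user-side workaround (a sketch only; it assumes the pod execution role is created by the user and passed to the Fargate profile, and pod_execution_role is a hypothetical name) is to declare the same mapping in role_mappings so it is part of the ConfigMap Pulumi renders:

import pulumi_eks as eks

# pod_execution_role is assumed to be an aws.iam.Role created by the user,
# with AmazonEKSFargatePodExecutionRolePolicy attached.
eks.Cluster(
    name,
    fargate=eks.FargateProfileArgs(pod_execution_role_arn=pod_execution_role.arn),
    role_mappings=[
        # Mirrors the entry the Fargate workflow adds to aws-auth, so it is
        # not dropped when Pulumi rewrites the ConfigMap.
        eks.RoleMappingArgs(
            groups=["system:bootstrappers", "system:nodes", "system:node-proxier"],
            role_arn=pod_execution_role.arn,
            username="system:node:{{SessionName}}",
        ),
        # ...plus the application role mappings from the repro above.
    ],
)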

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

squaremo commented 2 years ago

Thank you for this detailed bug report :star: It sounds like this will make Fargate with EKS mostly useless -- is that accurate? If so, I'll bump the priority up.

matthewriedel-flux commented 1 year ago

> Thank you for this detailed bug report ⭐ It sounds like this will make Fargate with EKS mostly useless -- is that accurate? If so, I'll bump the priority up.

Yes, it breaks the Fargate node-cluster communication. Workloads will fail to launch and all Fargate nodes go into an "unknown" state.

I was able to repair this when it happened to me by putting the original aws-auth ConfigMap back. Thankfully I had done a diff beforehand so I had the old values, but I didn't notice the breakage until it was too late.

I am using pulumi-eks@1.01, so this problem still exists.

Here's my EKS cluster config:

const cluster = new eks.Cluster(`${pulumi.getStack()}`, {
  vpcId: vpc.id,
  name: eksConfig.require('name'),
  version: eksConfig.require('clusterVersion'),
  enabledClusterLogTypes: ['api', 'audit', 'authenticator', 'controllerManager', 'scheduler'],
  publicSubnetIds: vpc.publicSubnetIds,
  privateSubnetIds: vpc.privateSubnetIds,
  fargate: {
    podExecutionRoleArn: podExecutionRole.arn,
    subnetIds: vpc.privateSubnetIds,
    selectors: [
      // We need to define these so EKS can schedule pods it needs to run
      { namespace: 'kube-system' },
      { namespace: 'default' }
    ]
  },
  createOidcProvider: true
});

pbailey-ipsos commented 1 year ago

Today, my colleague and I encountered this bug as well. It is still an active problem that breaks Fargate on EKS clusters if you update any role_mapping or user_mapping in the pulumi_eks.Cluster() definition. This will happen any time we add a new developer to our team, and it forces us to tear down our entire Kubernetes application just to regenerate the correct ConfigMap. This is a pretty devastating bug.

Digging into https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/cluster.ts, Pulumi overwrites the aws-auth ConfigMap whenever the Cluster definition changes. However, when a Fargate profile is created, AWS automatically adds the pod execution role ARN to the aws-auth ConfigMap, so when Pulumi overwrites that ConfigMap it wipes out the edits written by AWS.

The workaround we ended up with was to create the pod execution role ourselves and declare its aws-auth mapping explicitly in role_mappings, so the entry is part of what Pulumi renders and survives updates:

Python code:

import json

import pulumi_aws as aws
import pulumi_eks as eks

fargate_role = aws.iam.Role(
    "fargate-role",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "eks-fargate-pods.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }),
)

fargate_role_policy = aws.iam.RolePolicyAttachment(
    "fargate-pod-execution-policy",
    policy_arn="arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy",
    role=fargate_role.name,
)

eks_cluster = eks.Cluster(
    "eks-cluster-name",
    fargate=eks.FargateProfileArgs(pod_execution_role_arn=fargate_role.arn),
    role_mappings=[
        eks.RoleMappingArgs(
            groups=["system:bootstrappers","system:nodes","system:node-proxier"],
            role_arn=fargate_role.arn,
            username="system:node:{{SessionName}}"
        )
    ],
)