Open · filip-zyzniewski opened this issue 2 years ago
Thank you for this detailed bug report ⭐ It sounds like this will make Fargate with EKS mostly useless -- is that accurate? If so, I'll bump the priority up.
Yes, it breaks Fargate node-to-cluster communication: workloads fail to launch and all Fargate nodes go into an "unknown" state.
I was able to repair this when it happened to me by restoring the original aws-auth ConfigMap. Thankfully I had done a diff beforehand so I had the old values, but I didn't notice the breakage until it was too late.
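A practical safeguard, assuming you have kubectl access to the cluster, is to snapshot the ConfigMap before any pulumi up so you can restore it afterwards:

# back up aws-auth before running `pulumi up`
kubectl -n kube-system get configmap aws-auth -o yaml > aws-auth-backup.yaml
# restore it after a bad update (you may need to strip metadata
# such as resourceVersion from the backup first)
kubectl apply -f aws-auth-backup.yaml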
I am using pulumi-eks@1.01, so this problem still exists.
Here's my EKS cluster config:
import * as pulumi from '@pulumi/pulumi';
import * as eks from '@pulumi/eks';

// vpc, eksConfig and podExecutionRole are defined elsewhere in the program.
const cluster = new eks.Cluster(`${pulumi.getStack()}`, {
    vpcId: vpc.id,
    name: eksConfig.require('name'),
    version: eksConfig.require('clusterVersion'),
    enabledClusterLogTypes: ['api', 'audit', 'authenticator', 'controllerManager', 'scheduler'],
    publicSubnetIds: vpc.publicSubnetIds,
    privateSubnetIds: vpc.privateSubnetIds,
    fargate: {
        podExecutionRoleArn: podExecutionRole.arn,
        subnetIds: vpc.privateSubnetIds,
        selectors: [
            // We need to define these so EKS can schedule pods it needs to run
            { namespace: 'kube-system' },
            { namespace: 'default' }
        ]
    },
    createOidcProvider: true
});
Today, my colleague and I encountered this bug as well. It is still an active problem: Fargate on an EKS cluster breaks if you update any role_mappings or user_mappings entry in the pulumi_eks.Cluster() definition. That will happen every time we add a new developer to our team, and it forces us to tear down our entire Kubernetes application just to regenerate the correct ConfigMap. This is a pretty devastating bug.
Digging into https://github.com/pulumi/pulumi-eks/blob/master/nodejs/eks/cluster.ts, Pulumi overwrites the aws-auth ConfigMap whenever the Cluster definition changes. However, when a Fargate profile is created, AWS automatically adds the pod execution role ARN to the aws-auth ConfigMap, so when Pulumi overwrites the ConfigMap it wipes out the edits written by AWS.
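To see the entry AWS writes on Fargate profile creation, you can inspect the live ConfigMap directly (assuming kubectl access):

kubectl -n kube-system get configmap aws-auth -o jsonpath='{.data.mapRoles}'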
The workaround we ended up with re-declares that mapping explicitly, so Pulumi's rewrite of aws-auth keeps it. Python code:
import json

import pulumi_aws as aws
import pulumi_eks as eks

fargate_role = aws.iam.Role(
    "fargate-role",
    assume_role_policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "eks-fargate-pods.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }),
)

fargate_role_policy = aws.iam.RolePolicyAttachment(
    "fargate-pod-execution-policy",
    policy_arn="arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy",
    role=fargate_role.name,
)

eks_cluster = eks.Cluster(
    "eks-cluster-name",
    fargate=eks.FargateProfileArgs(pod_execution_role_arn=fargate_role.arn),
    # Re-declare the mapping AWS adds on Fargate profile creation, so that
    # Pulumi's rewrite of the aws-auth ConfigMap keeps it.
    role_mappings=[
        eks.RoleMappingArgs(
            groups=["system:bootstrappers", "system:nodes", "system:node-proxier"],
            role_arn=fargate_role.arn,
            username="system:node:{{SessionName}}",
        )
    ],
)
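For TypeScript programs like the cluster config quoted earlier, the same idea should work via roleMappings. A sketch, reusing the podExecutionRole from that config; the groups and username mirror what AWS writes on Fargate profile creation:

const cluster = new eks.Cluster(`${pulumi.getStack()}`, {
    // ...same settings as in the config above...
    fargate: {
        podExecutionRoleArn: podExecutionRole.arn,
        subnetIds: vpc.privateSubnetIds,
        selectors: [{ namespace: 'kube-system' }, { namespace: 'default' }],
    },
    // Re-declare the mapping AWS adds automatically, so Pulumi's rewrite
    // of the aws-auth ConfigMap does not drop it.
    roleMappings: [
        {
            roleArn: podExecutionRole.arn,
            groups: ['system:bootstrappers', 'system:nodes', 'system:node-proxier'],
            username: 'system:node:{{SessionName}}',
        },
    ],
    createOidcProvider: true,
});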
What happened?
We added some new roles to roleMappings, and pulumi update removed the following entry from the kube-system/aws-auth ConfigMap. This mapping is added automatically on Fargate profile creation: https://aws.amazon.com/premiumsupport/knowledge-center/fargate-troubleshoot-profile-creation
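Based on the role ARN quoted under Expected Behavior and the groups in the Python workaround above, the removed mapRoles entry has this shape (a reconstruction, not the verbatim block):

- groups:
    - system:bootstrappers
    - system:nodes
    - system:node-proxier
  rolearn: arn:aws:iam::<account-id>:role/<cluster-name>-podExecutionRole-role-<suffix>
  username: system:node:{{SessionName}}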
Steps to reproduce
Create a cluster by running pulumi up, then add another entry to roleMappings and run pulumi up again. (A minimal sketch of such a program follows.)
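For illustration only, a program of the shape described; the foo and bar role names come from this report, while the ARNs, groups, and usernames here are hypothetical:

const cluster = new eks.Cluster('repro-cluster', {
    fargate: {
        podExecutionRoleArn: podExecutionRole.arn,
        subnetIds: vpc.privateSubnetIds,
        selectors: [{ namespace: 'default' }],
    },
    roleMappings: [
        {
            roleArn: 'arn:aws:iam::<account-id>:role/foo',
            groups: ['system:masters'],
            username: 'foo',
        },
        // Second `pulumi up`: adding this mapping removes the Fargate
        // pod execution role entry from aws-auth.
        {
            roleArn: 'arn:aws:iam::<account-id>:role/bar',
            groups: ['system:masters'],
            username: 'bar',
        },
    ],
});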
Expected Behavior
The arn:aws:iam::<account-id>:role/bar entry in the kube-system/aws-auth ConfigMap gets added next to the rolearn: arn:aws:iam::<account-id>:role/<cluster-name>-podExecutionRole-role-<suffix> and arn:aws:iam::<account-id>:role/foo entries.
Actual Behavior
The arn:aws:iam::<account-id>:role/bar entry is added, but the rolearn: arn:aws:iam::<account-id>:role/<cluster-name>-podExecutionRole-role-<suffix> entry is removed.
Output of pulumi about
Additional context
podExecutionRoleArn @ https://github.com/pulumi/pulumi-eks/blob/v0.42.7/nodejs/eks/cluster.ts#L723-L730 is never added to roleMappings @ https://github.com/pulumi/pulumi-eks/blob/v0.42.7/nodejs/eks/cluster.ts#L678-L691.
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).