You seem to be missing all of the Karpenter components https://github.com/terraform-aws-modules/terraform-aws-eks/blob/098c6a86ca716dae74bd98974accc29f66178c43/examples/karpenter/main.tf#L111-L160
@bryantbiggs no, I'm not missing them; all of those were added just fine. I just didn't include them in my example because I think they are irrelevant here, as the issue lies in the EKS Node Group permissions, i.e. the role used to run the Karpenter Controller pods.
The karpenter module itself exposes a config to attach additional policies, but those are used by the nodes created by Karpenter; it's a different role.
I think you are misunderstanding a few things:
@bryantbiggs the permission issues only got resolved after I added extra permissions in the EKS module for the Karpenter node group, like this (see eks_managed_node_groups.iam_role_additional_policies below, and also the eks_karpenter_controller_policy):
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 20.13.1"
cluster_name = var.cluster_name
cluster_version = var.cluster_version
cluster_endpoint_public_access = true
enable_cluster_creator_admin_permissions = false
kms_key_enable_default_policy = true
eks_managed_node_groups = {
karpenter_group = {
instance_types = ["t3.small"]
subnet_ids = module.vpc.private_subnets
# These extra permissions are required by Karpenter Controller pods
iam_role_additional_policies = {
AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
AmazonEC2FullAccess = "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
additional = aws_iam_policy.eks_karpenter_controller_policy.arn
}
min_size = 2
max_size = 3
desired_size = 2
capacity_type = "SPOT"
taints = {
# This Taint aims to keep just EKS Addons and Karpenter running on this MNG
# The pods that do not tolerate this taint should run on nodes created by Karpenter
addons = {
key = "CriticalAddonsOnly"
value = "true"
effect = "NO_SCHEDULE"
},
}
}
}
cluster_addons = {
coredns = {
most_recent = true
}
kube-proxy = {
most_recent = true
}
eks-pod-identity-agent = {
most_recent = true
}
aws-ebs-csi-driver = {
most_recent = true
service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
}
vpc-cni = {
most_recent = true
}
}
vpc_id = module.vpc.vpc_id
subnet_ids = concat(module.vpc.private_subnets, module.vpc.intra_subnets)
control_plane_subnet_ids = concat(module.vpc.private_subnets, module.vpc.intra_subnets)
tags = merge(local.common_tags, {
# NOTE - if creating multiple security groups with this module, only tag the
# security group that Karpenter should utilize with the following tag
# (i.e. - at most, only one security group should have this tag in your account)
"karpenter.sh/discovery" = var.cluster_name
})
}
resource "aws_iam_policy" "eks_karpenter_controller_policy" {
name = "Karpenter-controller-${var.cluster_name}-policy"
path = "/"
description = "Additional policies attached to the Karpenter Controller which runs on EKS Node Group."
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"pricing:*",
"iam:*",
]
Effect = "Allow"
Resource = "*"
},
]
})
tags = local.common_tags
}
I will also share the extra components you said I'm missing; I just didn't include them at first because I don't think the problem is related to them:
module "karpenter" {
source = "terraform-aws-modules/eks/aws//modules/karpenter"
version = "~> 20.13.1"
cluster_name = module.eks.cluster_name
enable_pod_identity = true
create_pod_identity_association = true
node_iam_role_additional_policies = {
AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
tags = local.common_tags
}
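For context, with enable_pod_identity and create_pod_identity_association enabled, the submodule effectively binds the controller IAM role to one specific service account. A rough sketch of what that association amounts to (not the module's exact internals; the namespace and service account shown are the submodule defaults):

# Sketch only - the karpenter submodule creates something equivalent to this
# internally when create_pod_identity_association = true.
resource "aws_eks_pod_identity_association" "karpenter_controller" {
  cluster_name    = module.eks.cluster_name
  namespace       = "karpenter" # submodule default
  service_account = "karpenter" # must match the controller pods' service account
  role_arn        = module.karpenter.iam_role_arn
}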
# This will install Karpenter on the EKS cluster
resource "helm_release" "karpenter" {
  namespace        = "karpenter"
  create_namespace = true

  name       = "karpenter-${var.cluster_name}"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "0.37.0"

  values = [
    <<-EOT
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
    EOT
  ]

  depends_on = [
    module.eks
  ]
}
@bryantbiggs let me know if you also want me to share my NodePools and EC2NodeClasses.
@bryantbiggs you seem to be misunderstanding my report.
There are two IAM Roles created by the whole setup:
1. The one attached to the EKS Node Group where the Karpenter Controller pods run
2. The one used by the nodes created by Karpenter to run the other workloads
The error logs I showed are from the Karpenter Controller, which is missing some permissions.
The only way I managed to fix this was by attaching the extra permissions required by the Karpenter Controller in the EKS module, i.e. to the role attached to the EKS Node Group, not in the Karpenter module.
"There are two IAM Roles created by the whole setup:"
False - there are three roles in your setup.
you are giving the node IAM role the permissions, which means anything that runs on the nodes will inherit those permissions - this is not correct.
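Something like the following outputs can help keep the three apart (the attribute names assume the v20 module outputs and the karpenter_group node group from the config above, so treat this as a sketch):

# Sketch: surface the three roles so they are not confused with each other.
output "mng_node_role_arn" {
  description = "Role attached to the managed node group that runs the Karpenter controller pods"
  value       = module.eks.eks_managed_node_groups["karpenter_group"].iam_role_arn
}

output "karpenter_controller_role_arn" {
  description = "Role the Karpenter controller assumes via EKS Pod Identity"
  value       = module.karpenter.iam_role_arn
}

output "karpenter_node_role_arn" {
  description = "Role used by the nodes that Karpenter launches"
  value       = module.karpenter.node_iam_role_arn
}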
Are you converting an existing Karpenter installation from IRSA to EKS Pod Identity?
@bryantbiggs the other role is unrelated to Karpenter; the one affecting my setup is the second one (the Karpenter controller IAM role).
When I deployed everything, it originally came with the policies below:
As you can see in the logs I shared earlier, the Karpenter controller pods are failing due to some missing permissions:
I'm not exactly converting: my old cluster is based on IRSA, but I created a brand new VPC + EKS + IAM setup from scratch, as I thought that would be easier than trying to migrate an existing cluster. So all the roles, VPC + subnets, EKS, everything is brand new.
The old cluster runs Karpenter on Fargate, but since Fargate doesn't seem to support Pod Identity, we followed the new example, which uses an EKS Node Group to run the Karpenter Controller.
@bryantbiggs I also understand that giving the permissions to the nodes is not the best approach, since all the other cluster add-ons running on the same node will inherit those permissions.
I did this just to confirm that this was what was missing, so the Karpenter Controller would work and be able to create nodes (which it did).
So now I would like to learn how to properly give permissions only to the Karpenter pods (if that's possible through Pod Identity), but that still doesn't change the fact that permissions were missing.
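If I understand it correctly, the Pod-Identity-friendly way would be to attach the extra policy to the controller role created by the karpenter submodule rather than to the node group role; a sketch, assuming the submodule exposes that role name as iam_role_name:

# Sketch: scope the extra permissions to the Karpenter controller (via its
# Pod Identity role) instead of the MNG node role shared by every pod on the node.
resource "aws_iam_role_policy_attachment" "karpenter_controller_extra" {
  role       = module.karpenter.iam_role_name
  policy_arn = aws_iam_policy.eks_karpenter_controller_policy.arn
}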
I'm wondering if those extra permissions are only required if using SPOT instances for the EKS Node Group? :thinking:
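One Spot-specific piece that might be relevant is the EC2 Spot service-linked role, which has to exist once per account before Spot capacity can be launched; roughly something like this (skip or import it if the account already has it):

# Sketch: create the EC2 Spot service-linked role. AWS allows only one per
# account, so omit or import this if it already exists.
resource "aws_iam_service_linked_role" "spot" {
  aws_service_name = "spot.amazonaws.com"
}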
Once the cluster looks good to go live, we shall switch back to "on-demand" with reserved instances.
Can you try this pattern - I believe it's closest to what you are trying to do: https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/patterns/karpenter-mng
@bryantbiggs thanks, I just read the README.md and went through the setup example. It looks very similar to the example in this repo btw.
The Karpenter Controller IAM Role has been created with all the permissions that the Karpenter Controller pods were missing, so it seems like the Karpenter Controller pods can't assume this role; otherwise they wouldn't complain about those permissions.
I'm investigating what's missing, as my karpenter submodule looks exactly like both examples.
@bryantbiggs I found the issue in my setup.
The karpenter submodule by default uses "karpenter" as the service account name for the Pod Identity Association if we don't provide one (as in the example).
My helm_release for installing Karpenter on the cluster was named "karpenter-mycluster", and that release name is what the chart uses when creating the service account in the cluster, so the pods couldn't get the permissions because of the service account name mismatch. In the example the release name is hardcoded as "karpenter" (which matches the Pod Identity Association).
This can be easily overlooked as you wouldn't think the helm release name matters.
Since it can't be an arbitrary string (it must match the exact service_account name used by the karpenter submodule when creating the Pod Identity Association), I opened a PR to update the example to use module.karpenter.service_account. This "link" makes it clearer to users (like me, who tend to change names) that the release name must match the service account name from the karpenter submodule. It could have saved me a few days of investigation.
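For reference, the corrected release in my setup ends up looking roughly like this (using the module.karpenter.service_account output; the rest mirrors the block I shared above):

# Sketch: tie the release name to the service account the Pod Identity
# association expects, instead of a free-form "karpenter-${var.cluster_name}".
resource "helm_release" "karpenter" {
  namespace        = "karpenter"
  create_namespace = true

  name       = module.karpenter.service_account
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "0.37.0"

  values = [
    <<-EOT
    settings:
      clusterName: ${module.eks.cluster_name}
      clusterEndpoint: ${module.eks.cluster_endpoint}
    EOT
  ]

  depends_on = [
    module.eks
  ]
}

Alternatively, the chart's serviceAccount.name value could be set explicitly; linking the release name to the module output is just the more visible way to express the constraint.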
Thanks for your help and patience explaining things to me about the roles :pray:
This issue has been resolved in version 20.14.0 :tada:
I just launched an EKS cluster using the new access entry permission setup, following the Karpenter example.
The Karpenter pods will throw errors like these:
So it's missing permissions in the EKS Node Group created to run Karpenter, for example:
My Karpenter module looks like this:
I know I can pass additional permissions to the EKS Node Group using iam_role_additional_policies, but shouldn't the minimum setup already come with all the permissions required by Karpenter (maybe apart from Spot-related permissions), or am I missing something?