terraform-aws-modules / terraform-aws-iam

Terraform module to create AWS IAM resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/iam/aws
Apache License 2.0

K8s service-accounts are missing the eks.amazonaws.com/role-arn tag and without it, the cluster autoscaler crashes #485

Closed meyerkev closed 4 months ago

meyerkev commented 4 months ago

Description

As per https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html, K8s ServiceAccounts in AWS that use IRSA depend on having the annotation "eks.amazonaws.com/role-arn" set to the role ARN, or the IRSA integration breaks consistently and repeatedly.

  1. First, none of the examples include this required annotation.
  2. Second, this requirement is poorly documented.
  3. Third, the module provides no built-in way to apply this annotation when setting up IRSA.
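For context, the annotation the AWS docs require looks like the sketch below — the module label `cluster_autoscaler_irsa` is an assumption for illustration, but `iam_role_arn` is a real output of the iam-role-for-service-accounts-eks sub-module:

```hcl
# Sketch: annotate the ServiceAccount with the IAM role ARN so the EKS pod
# identity webhook can inject the role into the pod. The module label
# "cluster_autoscaler_irsa" is hypothetical; iam_role_arn is the sub-module's
# documented output.
resource "kubernetes_service_account" "cluster_autoscaler" {
  metadata {
    name      = "cluster-autoscaler"
    namespace = "cluster-autoscaler"
    annotations = {
      "eks.amazonaws.com/role-arn" = module.cluster_autoscaler_irsa.iam_role_arn
    }
  }
}
```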

Versions

Reproduction Code [Required]

This is me doing my best to recreate https://github.com/meyerkev/eks-tf-interview-template/blob/main/terraform/helm/helm.tf in a saner way

Required vars: cluster_name, oidc_provider

resource "kubernetes_namespace" "namespace" {
    metadata {
        name = "cluster-autoscaler"
    }
}

resource "kubernetes_service_account" "service_account" {
    metadata {
        name      = "cluster-autoscaler"
        namespace = "cluster-autoscaler"
        annotations = {
          # With this line, it works.  Without it, the cluster-autoscaler goes into crash-loop backoff
          "eks.amazonaws.com/role-arn" = module.aws-cluster-autoscaler-irsa.iam_role_arn
        }
    }
    depends_on = [ kubernetes_namespace.namespace ]
}

module "aws-cluster-autoscaler-irsa" {
    source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"

    attach_cluster_autoscaler_policy = true
    cluster_autoscaler_cluster_names = [var.eks_cluster_name]

    oidc_providers = {
        main = {
            provider_arn               = data.aws_ssm_parameter.oidc_provider.value
            namespace_service_accounts = ["cluster-autoscaler:cluster-autoscaler"]
        }
    }
    role_name = "aws-cluster-autoscaler-${var.eks_cluster_name}-role"
}

resource "helm_release" "cluster-autoscaler" {
    name = "cluster-autoscaler"
    repository = "https://kubernetes.github.io/autoscaler"
    chart = "cluster-autoscaler"
    namespace = "cluster-autoscaler"
    version = "9.37.0"

    # Surprisingly worthless because it keeps flapping
    wait = true

    set {
        name = "autoDiscovery.clusterName"
        value = var.eks_cluster_name
    }

    set {
        name = "awsRegion"
        value = var.aws_region
    }

    set {
        name = "rbac.serviceAccount.create"
        value = "false"
    }

    set {
        name = "rbac.serviceAccount.name"
        value = kubernetes_service_account.service_account.metadata[0].name
    }
    depends_on = [ kubernetes_service_account.service_account ]
}

Steps to reproduce the behavior:

No

Yes

I went back into that repository for the first time in months (without the annotations), spun up a new cluster, and started getting permissions issues with my cluster-autoscaler.

Then I eventually found https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html, ran the kubectl describe command shown at the bottom of that page, and noticed the ServiceAccount wasn't annotated.

Then I added the annotation and the pod suddenly started working.

Expected behavior

Ideally, it would keep working without the annotations as it had been for the years prior.

Or this behavior would be understood and documented with standard workarounds.

Actual behavior

The k8s cluster-autoscaler pod crashed because neither the pod's ServiceAccount nor the node was picking up the IAM role in AWS, so both lacked the autoscaler permissions.
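This failure mode follows from how IRSA works: the sub-module only creates an IAM role whose trust policy is scoped to a specific namespace:serviceaccount pair via the OIDC "sub" claim, and the annotation is what makes the pod actually present that identity. A rough, simplified sketch of the trust policy shape (the account ID and variable names here are assumptions, not the module's exact internals):

```hcl
# Simplified sketch of the OIDC trust policy the sub-module builds.
# "123456789012" and var.oidc_provider are placeholder assumptions.
data "aws_iam_policy_document" "irsa_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::123456789012:oidc-provider/${var.oidc_provider}"]
    }

    # The role only trusts tokens issued to this exact ServiceAccount --
    # without the eks.amazonaws.com/role-arn annotation, the pod never
    # presents such a token and falls back to the node's credentials.
    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider}:sub"
      values   = ["system:serviceaccount:cluster-autoscaler:cluster-autoscaler"]
    }
  }
}
```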

Terminal Output Screenshot(s)

So with the annotation enabled:

[screenshot]

But then I comment out that line, run a terraform destroy, reapply my terraform and:

[screenshot]

So I add the annotations back in, destroy/apply to force the regeneration of the mounted service account, and...

[screenshot]

With an annotated ServiceAccount

[screenshot]

Additional context

bryantbiggs commented 4 months ago

this sub-module provides just the role - it does not create K8s resources, nor does it modify them. there are multiple ways that users may elect to create their namespace and service accounts, with Helm being by far the most common solution. this is not related to this module so closing out
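For anyone landing here: the Helm route the maintainer mentions can carry the annotation directly through chart values, so no separate kubernetes_service_account resource is needed. A sketch, assuming the upstream cluster-autoscaler chart's rbac.serviceAccount.* values and a hypothetical module label:

```hcl
# Sketch: let the chart create the ServiceAccount and pass the IRSA
# annotation via chart values. rbac.serviceAccount.create/annotations are
# real values in the upstream cluster-autoscaler chart; the module label
# "cluster_autoscaler_irsa" is an assumption.
resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "cluster-autoscaler"

  set {
    name  = "rbac.serviceAccount.create"
    value = "true"
  }

  set {
    # Literal dots in the annotation key must be escaped in the set path.
    name  = "rbac.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.cluster_autoscaler_irsa.iam_role_arn
  }
}
```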

github-actions[bot] commented 3 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.