kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

failed to retrieve credentials caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity #1935

Closed kaykhancheckpoint closed 3 years ago

kaykhancheckpoint commented 3 years ago

I am using the Helm chart to install the AWS Load Balancer Controller.

https://github.com/aws/eks-charts/tree/master/stable/aws-load-balancer-controller

However, when I apply the ingress I get the error shown in the Events output below.

It looks like a permission is missing, but the role I created has the correct policy attached: https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.1.2/docs/install/iam_policy.json

Can you check below whether I am creating the role correctly? I was unsure about this part.

kubectl describe ing -n echoserver echoserver
Name:             echoserver
Namespace:        echoserver
Address:          
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host            Path  Backends
  ----            ----  --------
  echo.geeiq.com  
                  /   echoserver:80 (10.0.1.188:8080)
Annotations:      alb.ingress.kubernetes.io/scheme: internet-facing
                  alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
                  kubernetes.io/ingress.class: alb
Events:
  Type     Reason            Age   From     Message
  ----     ------            ----  ----     -------
  Warning  FailedBuildModel  39s   ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 40c9e27b-af7b-4e19-9ced-7fa46cbb7526
  Warning  FailedBuildModel  39s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 3003e79e-2585-4b36-9ac9-b2a1a8e961ce
  Warning  FailedBuildModel  39s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 6cae72f1-2a38-47a0-a485-685b9abfe451
  Warning  FailedBuildModel  38s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 7138d1ae-5dba-4604-8f5c-66ad2f2e5ba2
  Warning  FailedBuildModel  38s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 4f08528f-0f75-4e9b-a4eb-4148703d4560
  Warning  FailedBuildModel  38s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 09622f8c-d9ae-485a-a6fd-af6111a23d7c
  Warning  FailedBuildModel  37s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: e42cbe7c-209f-429a-bbaa-c6e056eae69d
  Warning  FailedBuildModel  36s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 65ee4a19-6d94-4dc6-b90a-b73330cd579d
  Warning  FailedBuildModel  36s  ingress  Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 3a2c6830-db8d-42fd-b347-8b7caef77964
  Warning  FailedBuildModel  16s (x4 over 34s)  ingress  (combined from similar events): Failed build model due to couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
  status code: 403, request id: d3477675-9e44-4104-9539-63b8e017fc56

values.yml

# Default values for aws-load-balancer-controller.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  repository: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller
  tag: v2.1.3
  pullPolicy: IfNotPresent

imagePullSecrets: []
nameOverride: "kube-system"
fullnameOverride: ""

# The name of the Kubernetes cluster. A non-empty value is required
clusterName: "geeiq-prod-k8s"

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<redacted>:role/AWSLoadBalancerControllerIAMRole
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: "aws-load-balancer-controller"

rbac:
  # Specifies whether rbac resources should be created
  create: true

podSecurityContext:
  fsGroup: 65534

securityContext:
  # capabilities:
  #   drop:
  #   - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  allowPrivilegeEscalation: false

# Time period for the controller pod to do a graceful shutdown
terminationGracePeriodSeconds: 10

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

# Leverage a PriorityClass to ensure the controller will survive resource shortages
# ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""

nodeSelector: {}

tolerations: []

affinity: {}

podAnnotations: {}

podLabels: {}

# Enable cert-manager
enableCertManager: false

# The ingress class this controller will satisfy. If not specified, controller will match all
# ingresses without ingress class annotation and ingresses of type alb
ingressClass: alb

# The AWS region for the kubernetes cluster. Set to use KIAM or kube2iam for example.
region:

# The VPC ID for the Kubernetes cluster. Set this manually when your pods are unable to use the metadata service to determine this automatically
vpcId:

# Maximum retries for AWS APIs (default 10)
awsMaxRetries:

# If enabled, targetHealth readiness gate will get injected to the pod spec for the matching endpoint pods (default true)
enablePodReadinessGateInject:

# Enable Shield addon for ALB (default true)
enableShield:

# Enable WAF addon for ALB (default true)
enableWaf:

# Enable WAF V2 addon for ALB (default true)
enableWafv2:

# Maximum number of concurrently running reconcile loops for ingress (default 3)
ingressMaxConcurrentReconciles:

# Set the controller log level - info(default), debug (default "info")
logLevel:

# The address the metric endpoint binds to. (default ":8080")
metricsBindAddr: ""

# The TCP port the Webhook server binds to. (default 9443)
webhookBindPort:

# Maximum number of concurrently running reconcile loops for service (default 3)
serviceMaxConcurrentReconciles:

# Maximum number of concurrently running reconcile loops for targetGroupBinding
targetgroupbindingMaxConcurrentReconciles:

# Period at which the controller forces the repopulation of its local object stores. (default 1h0m0s)
syncPeriod:

# Namespace the controller watches for updates to Kubernetes objects, If empty, all namespaces are watched.
watchNamespace:

# Liveness probe configuration for the controller
livenessProbe:
  failureThreshold: 2
  httpGet:
    path: /healthz
    port: 61779
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 10

# Environment variables to set for aws-load-balancer-controller pod.
# We strongly discourage programming access credentials in the controller environment. You should setup IRSA or
# comparable solutions like kube2iam, kiam etc instead.
env:
  # ENV_1: ""
  # ENV_2: ""

# Specifies if aws-load-balancer-controller should be started in hostNetwork mode.
#
# This is required if using a custom CNI where the managed control plane nodes are unable to initiate
# network connections to the pods, for example using Calico CNI plugin on EKS. This is not required or
# recommended if using the Amazon VPC CNI plugin.
hostNetwork: false

# extraVolumeMounts are the additional volume mounts. This enables setting up IRSA on non-EKS Kubernetes cluster
extraVolumeMounts:
  # - name: aws-iam-token
  #   mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
  #   readOnly: true

# extraVolumes for the extraVolumeMounts. Useful to mount a projected service account token for example.
extraVolumes:
  # - name: aws-iam-token
  #   projected:
  #     defaultMode: 420
  #     sources:
  #     - serviceAccountToken:
  #         audience: sts.amazonaws.com
  #         expirationSeconds: 86400
  #         path: token

# defaultTags are the tags to apply to all AWS resources managed by this controller
defaultTags: {}
  # default_tag1: value1
  # default_tag2: value2

podDisruptionBudget: {}
#  maxUnavailable: 1

role creation

resource "aws_iam_policy" "AWSLoadBalancerControllerIAMPolicy" {
  name        = "AWSLoadBalancerControllerIAMPolicy"
  path        = "/"
  description = "AWS Load Balancer Controller Policy"

  # Load the policy document from a JSON file on disk.
  policy = file("k8sutils/alb-controller/iam-policy.json")

  tags = {
    Terraform   = "true"
    Environment = local.workspace
  }

}

resource "aws_iam_role" "AWSLoadBalancerControllerIAMRole" {
  name = "AWSLoadBalancerControllerIAMRole"

  tags = {
    Terraform   = "true"
    Environment = local.workspace
  }

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "AWSLoadBalancerControllerRolePolicAttachment" {
  role       = aws_iam_role.AWSLoadBalancerControllerIAMRole.name
  policy_arn = aws_iam_policy.AWSLoadBalancerControllerIAMPolicy.arn
}
kishorj commented 3 years ago

@kaykhancheckpoint, could you verify that the controller pod has the following volume mount and volume configuration injected?

    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
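
One way to run this check by script rather than by eye is sketched below in Python. It assumes the pod has been dumped with `kubectl get pod <pod-name> -n kube-system -o json` and parsed into a dict. The function name `has_irsa_token_injected` is made up for this example and is not part of the controller or of kubectl.

```python
EXPECTED_MOUNT_PATH = "/var/run/secrets/eks.amazonaws.com/serviceaccount"
EXPECTED_AUDIENCE = "sts.amazonaws.com"


def has_irsa_token_injected(pod: dict) -> bool:
    """Return True if the pod has the projected token volume and a mount for it."""
    spec = pod.get("spec", {})

    # Collect the names of volumes that project a service account token for STS.
    token_volumes = set()
    for volume in spec.get("volumes", []):
        for source in volume.get("projected", {}).get("sources", []):
            token = source.get("serviceAccountToken", {})
            if token.get("audience") == EXPECTED_AUDIENCE:
                token_volumes.add(volume["name"])

    # At least one container must mount such a volume at the expected path.
    for container in spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if (mount.get("name") in token_volumes
                    and mount.get("mountPath") == EXPECTED_MOUNT_PATH):
                return True
    return False
```

Pass it the result of `json.load()` on the saved kubectl output. A `False` result may mean the service account annotation was not in place when the pod was created.
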
kaykhancheckpoint commented 3 years ago

Hey, I managed to fix this. As you can see above, my role's trust policy allowed sts:AssumeRole for ec2.amazonaws.com, when the role needed to be assumable through sts:AssumeRoleWithWebIdentity.

I was able to do this using the Terraform module iam-assumable-role-with-oidc.

locals {
  k8s_aws_lb_service_account_namespace = "kube-system"
  k8s_aws_lb_service_account_name      = "aws-load-balancer-controller"
}

resource "aws_iam_policy" "AWSLoadBalancerControllerIAMPolicy" {
  name        = "AWSLoadBalancerControllerIAMPolicy"
  path        = "/"
  description = "AWS Load Balancer Controller Policy"

  policy = file("utils/aws-lb-controller/iam-policy.json")

  tags = {
    Terraform   = "true"
    Environment = local.workspace
  }

}

module "iam_assumable_role_aws_lb" {
  source                        = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version                       = "3.6.0"
  create_role                   = true
  role_name                     = "AWSLoadBalancerControllerIAMRole"
  provider_url                  = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = [aws_iam_policy.AWSLoadBalancerControllerIAMPolicy.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:${local.k8s_aws_lb_service_account_namespace}:${local.k8s_aws_lb_service_account_name}"]

  tags = {
    Terraform   = "true"
    Environment = local.workspace
  }

}
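
For readers not using the module, the shape of trust policy that the fix depends on can be sketched in Python. This is an illustration assuming the standard `aws` partition. The function name, account ID, and issuer URL are placeholders, and the module's actual output may differ in detail.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer_url: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy that lets one service account assume the role via OIDC."""
    # IAM identifies the provider by the issuer URL without its scheme,
    # mirroring the replace(..., "https://", "") call in the Terraform above.
    provider = oidc_issuer_url.replace("https://", "", 1)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{provider}:sub":
                            f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


# Placeholder account ID and issuer URL, for illustration only.
policy = irsa_trust_policy(
    "111122223333",
    "https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
    "kube-system",
    "aws-load-balancer-controller",
)
print(json.dumps(policy, indent=2))
```

The result has the same structure as the role policy quoted later in this thread: a `Federated` principal, the `sts:AssumeRoleWithWebIdentity` action, and a `:sub` condition naming the service account.
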
zquintana commented 3 years ago

I have the exact same issue and can't figure out what's causing it.

Pod Logs:

{"level":"error","ts":1627148976.691803,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:34703->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149158.8136048,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:52341->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149331.7705815,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:58778->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149528.279761,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:55073->172.20.0.10:53: read: connection refused"}
{"level":"error","ts":1627149707.0748882,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"hello","namespace":"default","error":"couldn't auto-discover subnets: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.us-west-2.amazonaws.com/\": dial tcp: lookup sts.us-west-2.amazonaws.com on 172.20.0.10:53: read udp 10.0.3.184:48301->172.20.0.10:53: read: connection refused"}

Container args:

Args:
      --cluster-name=app-rylqFOXa
      --ingress-class=alb
      --aws-region=us-west-2
      --aws-vpc-id=vpc-0e200d3ae7e12447c

Role policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::203341958641:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/2917B2CCF25A5DC470EF1CF5DB059AE9"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/2917B2CCF25A5DC470EF1CF5DB059AE9:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }
  ]
}

The public subnets are tagged with:

kubernetes.io/role/elb  1
kubernetes.io/cluster/app-rylqFOXa  shared

The private subnets are tagged the same way, but with kubernetes.io/role/internal-elb instead of kubernetes.io/role/elb. I'm trying out Fargate as a POC for work. What might I be missing here?

zquintana commented 3 years ago

Turns out my issue was related to CoreDNS, as described here: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1360.
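
The two `WebIdentityErr` messages in this thread look alike but point at different layers: `AccessDenied` means STS was reached and refused the request, while `dial tcp: lookup ... connection refused` means the pod could not resolve the STS hostname at all. A rough Python sketch for telling them apart follows. The function name and the category labels are invented here, and the substring checks are heuristics based only on the messages quoted above.

```python
def classify_web_identity_error(message: str) -> str:
    """Map a controller error string to a likely cause: 'iam-trust', 'dns', or 'unknown'."""
    if "WebIdentityErr" not in message:
        return "unknown"
    # STS answered and rejected the request: inspect the role's trust policy.
    if "AccessDenied" in message and "sts:AssumeRoleWithWebIdentity" in message:
        return "iam-trust"
    # The request never reached STS because the hostname lookup failed.
    if "dial tcp: lookup" in message:
        return "dns"
    return "unknown"


iam_error = (
    "WebIdentityErr: failed to retrieve credentials\n"
    "caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity"
)
dns_error = (
    "WebIdentityErr: failed to retrieve credentials\n"
    "caused by: RequestError: send request failed\n"
    'caused by: Post "https://sts.us-west-2.amazonaws.com/": dial tcp: lookup '
    "sts.us-west-2.amazonaws.com on 172.20.0.10:53: read: connection refused"
)
print(classify_web_identity_error(iam_error))
print(classify_web_identity_error(dns_error))
```
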

ptwohig commented 2 years ago

A little over a year later, I'm running into this issue. Why would DNS cause this? It seems to resolve just fine, but the assume role policy isn't correct.

And what exactly is the fix here? Is it the fix for #1360, or is it the fix posted by @kaykhancheckpoint?

I'm getting it without using any Fargate profiles. I've tried both the DNS config and the Terraform bit, and neither seems to work.

ptwohig commented 2 years ago

Disregard my other comment. Figured it out.

For anyone else whose Googling lands them here, this is a ready-made drop-in for Terraform which correctly sets up the permissions using a freely available module.

If you find yourself here after many hours of frustration, as I did, note the following:

In my case the first case was the problem.

If you want easy mode and you're using Terraform, this should drop right in:


locals {
  kube_system_namespace = "kube-system"
  alb_service_account_name = "alb-controller"
  efs_service_account_name = "efs-controller"
  system_service_accounts = [
    "${local.kube_system_namespace}:${local.alb_service_account_name}"
  ]
}

resource "kubernetes_service_account" "alb" {
  metadata {
    name = local.alb_service_account_name
    namespace = local.kube_system_namespace
    labels = {
      "app.kubernetes.io/name" = "aws-load-balancer-controller"
      "app.kubernetes.io/component" = "controller"
    }
    annotations = {
      "eks.amazonaws.com/role-arn" = module.vpc_cni_irsa.iam_role_arn
    }
  }
}

module "vpc_cni_irsa" {

  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 4.12"

  role_name_prefix      = "vpc-cni-irsa-"

  attach_load_balancer_controller_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = local.system_service_accounts
    }
  }

}
raisulislam541 commented 2 years ago

I am constantly getting the following output when checking the logs of aws-load-balancer-controller:

{"level":"error","ts":1657324768.868449,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-default-backends-6d61d3952a","namespace":"default","error":"WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 5c3f9ce5-ba7d-495e-98b1-ffcd5cf85133"}
{"level":"error","ts":1657324768.8749876,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-default-backends-83e7be3ef9","namespace":"default","error":"WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: bddf9d23-4126-4b61-aaf9-c1ba1ecc8ed4"}
{"level":"error","ts":1657324768.8810706,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-default-flowerse-82587b6137","namespace":"default","error":"WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: d09f8725-db5b-4d47-9443-b627d4f8a8c8"}
{"level":"error","ts":1657324781.8112953,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"backend-ingress","namespace":"default","error":"WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 54e2e5b2-103f-4a25-b544-7cccd739a560"}

kubectl describe ingress ingress_name shows this:

Events:
  Type     Reason            Age    From     Message
  ----     ------            ----   ----     -------
  Warning  FailedBuildModel  2m46s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 8d30a0d7-1c0c-4890-b78d-eca678982f86
  Warning  FailedBuildModel  2m46s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: aa4873bb-2c96-4491-b506-5a6011bd2a35
  Warning  FailedBuildModel  2m46s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 811042d9-f1e6-4131-b722-6fb62dc2439c
  Warning  FailedBuildModel  2m45s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 0540cd6a-910b-4cef-8d31-424d0f2de3e1
  Warning  FailedBuildModel  2m45s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: d6e4e2bb-56dc-4075-b597-c75f6d97547a
  Warning  FailedBuildModel  2m45s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: 8a9897c9-116f-4ccc-ba1f-123fa2c4e76c
  Warning  FailedBuildModel  2m45s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
           status code: 403, request id: b8d22508-6a88-4c03-bcf3-4200a7b34c50
  Warning  FailedBuildModel  2m45s  ingress  Failed build model due to WebIdentityErr: failed to retrieve credentials

My role policy is:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::*****:oidc-provider/oidc.eks.ap-southeast-1.amazonaws.com/id/*********"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.ap-southeast-1.amazonaws.com/id/*****": "sts.amazonaws.com",
                    "oidc.eks.ap-southeast-1.amazonaws.com/id/*****": "system:serviceaccount:kube-system:aws-load-balancer-controller"
                }
            }
        }
    ]
}
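
In the trust policy above, both condition keys are the bare issuer path. In IAM trust policies for service accounts, these keys normally end in a claim name: `:aud` for the audience and `:sub` for the service account subject (compare the `:sub` key in the policy posted earlier in this thread). The suffixes may simply have been lost when the values were redacted, so treat the following Python check as a sketch. The function name is hypothetical.

```python
def condition_key_problems(trust_policy: dict) -> list:
    """List StringEquals condition keys that lack a ':sub' or ':aud' claim suffix."""
    problems = []
    for statement in trust_policy.get("Statement", []):
        # "Action" may be a single string or a list of strings.
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        conditions = statement.get("Condition", {}).get("StringEquals", {})
        for key in conditions:
            if not key.endswith((":sub", ":aud")):
                problems.append(f"condition key {key!r} has no ':sub' or ':aud' suffix")
    return problems


# Placeholder issuer ID, for illustration only.
suspect_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.ap-southeast-1.amazonaws.com/id/EXAMPLE":
                        "system:serviceaccount:kube-system:aws-load-balancer-controller"
                }
            },
        }
    ],
}
for problem in condition_key_problems(suspect_policy):
    print(problem)
```

Note that when a JSON object repeats the same key, a parser such as Python's `json.loads` keeps only the last value, so the two conditions in the posted policy would collapse into one if the keys really are identical.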