kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

Overly restrictive permissions in v2.4.3 policy AWSLoadBalancerControllerIAMPolicy #2785

Closed: timharsch closed this issue 1 year ago

timharsch commented 2 years ago

Describe the bug

When attempting to create my ingress resource with the aws-load-balancer-controller, I saw the following error when describing the resource:

Warning  FailedDeployModel  54m   ingress  Failed deploy model due to UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message:  cfUle6L---REDACTED---cF9vVm9Nq6XZICy8Glpi 

Which I then decoded like so:

aws sts decode-authorization-message --encoded-message cfUle6L---REDACTED---cF9vVm9Nq6XZICy8Glpi | jq | sed 's/[\\]"/"/g'

and then copied the DecodedMessage into an editor and formatted it for reading. Once I could read the message, I deduced that AmazonEKSLoadBalancerControllerRole was failing on a long set of ec2:CreateTags operations it was trying to perform.
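
As an aside, the DecodedMessage field can also be extracted and pretty-printed in a single step, which avoids the sed cleanup; something along these lines should work (ENCODED_MSG is just a placeholder for the encoded message):

ENCODED_MSG="cfUle6L---REDACTED---cF9vVm9Nq6XZICy8Glpi"
aws sts decode-authorization-message --encoded-message "$ENCODED_MSG" \
    --query DecodedMessage --output text | jq .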

I solved this by updating the AWSLoadBalancerControllerIAMPolicy, changing the overly restrictive permissions section, which looks like this:

        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:*:security-group/*",
            "Condition": {
                "StringEquals": {
                    "ec2:CreateAction": "CreateSecurityGroup"
                },
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags",
                "ec2:DeleteTags"
            ],
            "Resource": "arn:aws:ec2:*:*:security-group/*",
            "Condition": {
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "true",
                    "aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },

After noticing that those statements somewhat duplicated each other, I simplified them to:

        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags",
                "ec2:DeleteTags"
            ],
            "Resource": "*"
        },

and waited for the next k8s reconcile loop to occur (every 15 minutes). I then hit the next permissions issue on the ingress resource ingress.networking.k8s.io/alb-ingress:

Warning  FailedDeployModel  2m36s  ingress  Failed deploy model due to AccessDenied: User: arn:aws:sts::0123456789:assumed-role/AmazonEKSLoadBalancerControllerRole/1661879092752206807 is not authorized to perform: elasticloadbalancing:AddTags on resource: arn:aws:elasticloadbalancing:us-east-1:0123456789:targetgroup/ebc2b01a-42a75cf07e6b68b008e/8eac39c029d24cb2 because no identity-based policy allows the elasticloadbalancing:AddTags action:

I solved this by changing the following portion of the policy file from:

        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:SetWebAcl",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:AddListenerCertificates",
                "elasticloadbalancing:RemoveListenerCertificates",
                "elasticloadbalancing:ModifyRule"
            ],
            "Resource": "*"
        }

to:

        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:SetWebAcl",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:AddListenerCertificates",
                "elasticloadbalancing:RemoveListenerCertificates",
                "elasticloadbalancing:ModifyRule",
                "elasticloadbalancing:AddTags"
            ],
            "Resource": "*"
        }

This adds the necessary elasticloadbalancing:AddTags permission. I then waited the 15 minutes and saw the controller set up a working load balancer.

My suggestion for the policy file would be to simplify the CreateTags permissions as I did; it's just tags, after all, and there is precedent in the file for wildcarding permissions for operations that are more of a security concern than tagging.

Environment

Additional Context:

M00nF1sh commented 2 years ago

@timharsch, by default the permissions should be sufficient for the LB controller to create AWS API objects. Did you have any non-trivial setup, such as reusing resources that already existed or upgrading from a version prior to v2.0.0? It would be good if you could share the CloudTrail events for the denied requests.

We want to provide minimal permissions by default; tags on AWS resources are indeed a security concern, as AWS supports tag-based authorization.
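
If CloudTrail is enabled in your account, the denied calls can usually be located with something along these lines (the region, result limit, and jq filter are only illustrative):

aws cloudtrail lookup-events --region us-east-1 \
    --lookup-attributes AttributeKey=EventName,AttributeValue=CreateTags \
    --max-results 50 \
    --query 'Events[].CloudTrailEvent' --output text \
    | jq 'select(.errorCode != null) | {eventTime, eventName, errorCode, errorMessage}'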

kishorj commented 2 years ago

@timharsch, I'm closing the issue. Feel free to reach out to us if you have further concerns.

micksabox commented 2 years ago

I also experienced a similar error to @timharsch and tried the posted solution.

My setup was a migration from a previous version of the ALB controller (v1.x). I migrated straight to v2.4.3 using the installation instructions.

The failure occurred right after this log line: {"level":"info","ts":1663256153.7654002,"logger":"controllers.ingress","msg":"adding resource tags","resourceID":"sg-09cbf32ace9d38570","change":{"elbv2.k8s.aws/cluster":"osd-staging"}}. Note that it is attempting to add exactly the tag that the ec2:CreateTags Condition requires to be present (non-null).

I had an existing load balancer I was using. I also got stuck on one additional step: I had to remove this Condition

"Condition": {
                "Null": {
                    "aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
                }
            }

from this statement.

        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:ModifyLoadBalancerAttributes",
                "elasticloadbalancing:SetIpAddressType",
                "elasticloadbalancing:SetSecurityGroups",
                "elasticloadbalancing:SetSubnets",
                "elasticloadbalancing:DeleteLoadBalancer",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:ModifyTargetGroupAttributes",
                "elasticloadbalancing:DeleteTargetGroup"
            ],
            "Resource": "*"
        },

kishorj commented 2 years ago

@micksabox, which 1.x version were you on previously?

micksabox commented 2 years ago

@kishorj I was using the latest 1.x version, v1.1.9.

timharsch commented 2 years ago

Sorry for the delay. Here is a redacted version of the decoded message. I shortened it to the pertinent parts and removed IDs from my environment. You can see the load balancer controller is tagging resources. I did not do a careful comparison of the resources here to those in the conditions, but I think I can see it is attempting to tag resources not covered by the permissions.

{
    "allowed": false,
    "explicitDeny": false,
    "matchedStatements":
    {
        "items":
        []
    },
    "failures":
    {
        "items":
        []
    },
    "context":
    {
        "principal":
        {
            "id": "AROA2C4-REDACTED-66374036",
            "arn": "arn:aws:sts::0123456789:assumed-role/AmazonEKSLoadBalancerControllerRole/166187REDACTED74036"
        },
        "action": "ec2:CreateTags",
        "resource": "arn:aws:ec2:us-east-1:0123456789:security-group/sg-094REDACTED0243a",
        "conditions":
        {
            "items":
            [
                {
                    "key": "ec2:Vpc",
                    "key": "0123456789:ingress.k8s.aws/cluster",
                    "key": "aws:Resource",
                    "key": "ec2:ResourceTag/kubernetes.io/ingress-name",
                    "key": "ec2:ResourceTag/kubernetes.io/cluster-name",
                    "key": "aws:Account",
                    "key": "ec2:ResourceTag/kubernetes.io/namespace",
                    "key": "ec2:ResourceTag/ingress.k8s.aws/cluster",
                    "key": "ec2:SecurityGroupID",
                    "key": "0123456789:ingress.k8s.aws/stack",
                    "key": "aws:Region",
                    "key": "aws:Service",
                    "key": "0123456789:kubernetes.io/ingress-name",
                    "key": "ec2:ResourceTag/ingress.k8s.aws/stack",
                    "key": "aws:Type",
                    "key": "ec2:Region",
                    "key": "ec2:ResourceTag/ingress.k8s.aws/resource",
                    "key": "0123456789:kubernetes.io/namespace",
                    "key": "aws:ARN",
                    "key": "0123456789:ingress.k8s.aws/resource",
                    "key": "0123456789:kubernetes.io/cluster-name",

I was doing a from-scratch build, not an upgrade. I build my clusters using the following eksctl template:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: CLUSTER_NAME
  region: REGION

vpc:
  id: "VPCID"
  cidr: "VPC_CIDR"
  subnets:
    public:
      AZA:
        id: "SUBA_ID"
        cidr: "SUBA_CIDR"
      AZB:
        id: "SUBB_ID"
        cidr: "SUBB_CIDR"

nodeGroups:
  - name: ng-1
    instanceType: INSTANCE_TYPE
    desiredCapacity: 2
    ssh: # use existing EC2 key
      publicKeyName: KEYNAME
timharsch commented 2 years ago

@micksabox can you reopen this issue? Or should I file another?

micksabox commented 2 years ago

> @micksabox can you reopen this issue? Or should I file another?

I'm not able to reopen; maybe you meant @kishorj.

kishorj commented 2 years ago

/reopen

k8s-ci-robot commented 2 years ago

@kishorj: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2785#issuecomment-1255386888):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

timharsch commented 1 year ago

As far as I know, this is still an issue and should remain open until it can be addressed.

agconti commented 1 year ago

I'm encountering the same issue as @timharsch. Like him, I'm doing a fresh install on a new cluster.

My error message:

{
  "allowed": false,
  "explicitDeny": false,
  "matchedStatements": {
    "items": []
  },
  "failures": {
    "items": []
  },
  "context": {
    "principal": {
      "id": "AROA2WTHL5ZLXXGJ4XOEO:REDACTED:security",
      "arn": "arn:aws:sts::REDACTED:security:assumed-role/load-balancer-controller-qa/REDACTED:security"
    },
    "action": "ec2:CreateTags",
    "resource": "arn:aws:ec2:us-east-1:REDACTED:security-group/sg-0908332267840f3de",

   "//": "More omitted",
}

My terraform:

resource "helm_release" "aws_load_balancer_controller" {
  depends_on = [
    var.deployment_dependency,
  ]
  name       = "aws-load-balancer-controller"
  namespace  = "kube-system"
  chart      = "aws-load-balancer-controller"
  version          = "1.4.6"
  repository       = "https://aws.github.io/eks-charts"
  create_namespace = false

  set {
    name  = "clusterName"
    value = module.config.cluster_name
  }

  set {
    name  = "serviceAccount.create"
    value = true
  }

  set {
    name  = "serviceAccount.name"
    value = local.load_balancer_controller_irsa_service_account_name
  }

  set {
    name = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.load_balancer_controller_irsa_role.iam_role_arn
  }
}

resource "kubernetes_ingress_v1" "main" {
  depends_on = [
    var.deployment_dependency,
    helm_release.aws_load_balancer_controller,
    module.load_balancer_controller_irsa_role
  ]
  wait_for_load_balancer = true

  metadata {
    name = "main"
    annotations = {
      "kubernetes.io/ingress.class"                                  = "alb"
      "alb.ingress.kubernetes.io/scheme"                             = "internet-facing"
      "alb.ingress.kubernetes.io/target-type"                        = "ip"
      "alb.ingress.kubernetes.io/tags"                               = "Environment=${var.environment}"
      "alb.ingress.kubernetes.io/certificate-arn"                    = join(",", var.ssl_cert_arns)
      "external-dns.alpha.kubernetes.io/hostname"                    = "www.${module.config.tech_domain_name}"
      "alb.ingress.kubernetes.io/listen-ports"                       = jsonencode([{ HTTP = 80 }, { HTTPS = 443 }])
      "alb.ingress.kubernetes.io/actions.ssl-redirect"               = "443"
      "alb.ingress.kubernetes.io/load-balancer-attributes"           = "idle_timeout.timeout_seconds=4000,routing.http2.enabled=true"
    }
  }

  spec {
    dynamic "rule" {
      for_each = toset(var.ingress_services)
      content {
        host = "${rule.value}.${module.config.tech_domain_name}"
        http {
          path {
            path = "/*"
            backend {
              service {
                name = rule.value
                port {
                  number = 80
                }
              }
            }
          }
        }
      }
    }
  }
}

module "load_balancer_controller_irsa_role" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.9.2"
  role_name                              = local.load_balancer_controller_irsa_role_name
  attach_load_balancer_controller_policy = true

  oidc_providers = {
    main = {
      provider_arn               = var.oidc_provider_arn
      namespace_service_accounts = [
        "kube-system:${local.load_balancer_controller_irsa_service_account_name}",
      ]
    }
  }
}

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "~> 19.0"
  cluster_version = local.cluster_version
  cluster_name    = var.cluster_name
  subnet_ids      = var.private_subnets
  vpc_id          = var.vpc_id
  enable_irsa     = true
  tags            = local.tags
  cluster_endpoint_public_access = true
  cluster_endpoint_private_access = false
  node_security_group_enable_recommended_rules = true # <-- implements the correct 9443 sg

  # More omitted

}

Opening up the permissions like @timharsch suggested solved my issue. For others encountering this, here's how I did that.

# Temporary fix until this is solved: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2785
resource "aws_iam_policy" "aws_load_balancer_controller_temp_policy" {
  name        = "aws_load_balancer_controller_temp_policy"
  description = "Reduces the overly restrictive policy so the controller can operate effectively"

  policy = <<EOF
{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags",
                "ec2:DeleteTags"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:SetWebAcl",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:AddListenerCertificates",
                "elasticloadbalancing:RemoveListenerCertificates",
                "elasticloadbalancing:ModifyRule",
                "elasticloadbalancing:AddTags"
            ],
            "Resource": "*"
        }
    ],
    "Version": "2012-10-17"
}
EOF
}

resource "aws_iam_role_policy_attachment" "aws_load_balancer_controller_temp_policy" {
  role       = module.load_balancer_controller_irsa_role.iam_role_name
  policy_arn = aws_iam_policy.aws_load_balancer_controller_temp_policy.arn
}

/remove-lifecycle stale

kishorj commented 1 year ago

@agconti, @timharsch could you check whether your security groups have the following tags?

ingress.k8s.aws/resource: ManagedLBSecurityGroup
elbv2.k8s.aws/cluster: <cluster_name>
ingress.k8s.aws/stack: <namespace/name>

If these tags are not present on the security groups, then the SG was not created by the v2 release of this controller. The cluster tag gets added during SG creation, and the reference IAM policy allows tagging operations on the security groups that carry it.
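
One way to check from the CLI (the security group ID below is a placeholder):

aws ec2 describe-tags \
    --filters Name=resource-id,Values=sg-0123456789abcdef0 \
    --query 'Tags[].[Key,Value]' --output table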

If you had resources created by the v1 version of the controller, you need to grant the additional IAM permissions listed in the upgrade instructions (https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/install/iam_policy_v1_to_v2_additional.json). The v1 version of the controller uses the ingress.k8s.aws/cluster tag while the v2 version uses elbv2.k8s.aws/cluster, hence the additional permissions.
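
The additional policy can be created and attached roughly like this (the policy name, role name, and account ID are placeholders; adjust them to your setup):

curl -o iam_policy_v1_to_v2_additional.json https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/install/iam_policy_v1_to_v2_additional.json
aws iam create-policy \
    --policy-name AWSLoadBalancerControllerAdditionalIAMPolicy \
    --policy-document file://iam_policy_v1_to_v2_additional.json
aws iam attach-role-policy \
    --role-name AmazonEKSLoadBalancerControllerRole \
    --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/AWSLoadBalancerControllerAdditionalIAMPolicy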

agconti commented 1 year ago

@kishorj Yes, my security group doesn't have those tags. I'm deleting the old v1 controller and creating the v2 controller from scratch, so I'm not sure why it would be missing them. I'll try adding the additional IAM permissions like you suggested.

agconti commented 1 year ago

@kishorj Thanks for your help! I tried adding the additional permissions you linked. I'm able to create the ingress now, and it has the correct security group tags:

ingress.k8s.aws/resource: ManagedLBSecurityGroup
elbv2.k8s.aws/cluster: <cluster_name>
ingress.k8s.aws/stack: <namespace/name>

However, I'm still running into iam permissions issues, specifically with elasticloadbalancing:SetSecurityGroups:

Failed deploy model due to AccessDenied: User: arn:aws:sts::REDACTED:assumed-role/load-balancer-controller-qa/REDACTED is not authorized to perform: elasticloadbalancing:SetSecurityGroups on resource: arn:aws:elasticloadbalancing:us-east-1:REDACTED:loadbalancer/app/9ec9d36b-default-main-ebd4/80b73ad69b0b69e7 because no identity-based policy allows the elasticloadbalancing:SetSecurityGroups action.

I'm surprised this is happening, given that my SG now has the tags needed by the IAM policy to use elasticloadbalancing:SetSecurityGroups. Is this a consequence of the Null condition on the policy? I.e., when the role is first assumed by the controller, the tags are missing on the security groups until it adds them, so the condition resolves such that the role does not have permission to modify the security group. If so, my guess would be that re-assuming the role would solve this issue. I tried this by restarting the deployment, but the pods got stuck in the Terminating state.
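
For reference, I restarted it with something like the following (the deployment name assumes the default chart install):

kubectl -n kube-system rollout restart deployment/aws-load-balancer-controller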

This all seems to stem from an initially incorrectly tagged SG, so I tried adding the expected SG rule description, per the v1-to-v2 migration instructions, before deleting the v1 controller and creating the v2 controller from scratch.

aws --region $REGION ec2 update-security-group-rule-descriptions-ingress --cli-input-json "$(aws --region $REGION ec2 describe-security-groups --group-ids $SG_ID | jq '.SecurityGroups[0] | {DryRun: false, GroupId: .GroupId ,IpPermissions: (.IpPermissions | map(select(.FromPort==0 and .ToPort==65535) | .UserIdGroupPairs |= map(.Description="elbv2.k8s.aws/targetGroupBinding=shared"))) }' -M)"

But the command fails with:

An error occurred (MissingParameter) when calling the UpdateSecurityGroupRuleDescriptionsIngress operation: Either 'ipPermissions' or 'securityGroupRuleDescriptions' should be provided.

For reference, the output of the subcommand does contain IpPermissions; it's just an empty array:

aws --region $REGION ec2 describe-security-groups --group-ids $SG_ID | jq '.SecurityGroups[0] | {DryRun: false, GroupId: .GroupId ,IpPermissions: (.IpPermissions | map(select(.FromPort==0 and .ToPort==65535) | .UserIdGroupPairs |= map(.Description="elbv2.k8s.aws/targetGroupBinding=shared"))) }' 
{
  "DryRun": false,
  "GroupId": "sg-0073604c5e64ea780",
  "IpPermissions": []
}
kishorj commented 1 year ago

@agconti, the AWS CLI command is for updating the existing SG ingress rules on your EC2 SG that were added by the v1 controller. If the permissions list is empty, either the SG is not the one attached to your EC2 instance, or it doesn't contain rules added by the v1 version of the controller.

The SetSecurityGroups errors imply the underlying AWS ALB resource doesn't have the expected tags. If you use the reference IAM policy, the ALB must have the tag elbv2.k8s.aws/cluster.

If you used the v1 controller, you must be on v1.1.3 or later before upgrading to the v2 controller. If not, the AWS resource tags will not be updated accordingly.

In your case, you either need to update the tags on the underlying AWS resources or grant the controller permissions to access your existing resources.
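
For example, the cluster tag can be added to an existing ALB or target group from the CLI; something along these lines should work (the ARN and cluster name are placeholders):

aws elbv2 add-tags \
    --resource-arns arn:aws:elasticloadbalancing:us-east-1:<ACCOUNT_ID>:loadbalancer/app/my-alb/0123456789abcdef \
    --tags Key=elbv2.k8s.aws/cluster,Value=<cluster_name>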

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

timharsch commented 1 year ago

Just a quick note to say that I recently upgraded the controller in our environment to v2.5.1. I didn't see any changes to the IAM policy that would address the first problem I described, but I did not have the problem again after the upgrade. I noticed that the v2.5.1 policy file did contain fixes that addressed the second problem I described.
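
For anyone following the same path, the upgrade itself was nothing unusual; roughly along these lines (the release name and values assume the standard eks-charts install):

helm repo update
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
    -n kube-system \
    --set clusterName=<cluster_name> \
    --set serviceAccount.create=false \
    --set serviceAccount.name=aws-load-balancer-controller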

I think it is safe to close this ticket.