zalando-incubator / kube-ingress-aws-controller

Configures AWS Load Balancers according to Kubernetes Ingress resources
MIT License

elasticloadbalancing:DescribeLoadBalancerAttributes missing in IAM policy #632

Open spr-mweber3 opened 1 year ago

spr-mweber3 commented 1 year ago

Hey,

I'm referring to the documented prerequisites for the IAM policy. Something seems to have changed somewhere, but I was not able to find out where exactly. We haven't updated the controller recently, so I think that can be ruled out.

It now seems that the controller's policy needs the elasticloadbalancing:DescribeLoadBalancerAttributes permission.

Otherwise the controller can no longer provision the resources through CloudFormation:

Unable to retrieve DNSName attribute for AWS::ElasticLoadBalancingV2::LoadBalancer, with error message User: arn:aws:sts::account_id:assumed-role/foo/1692193986540068745 is not authorized to perform: elasticloadbalancing:DescribeLoadBalancerAttributes because no identity-based policy allows the elasticloadbalancing:DescribeLoadBalancerAttributes action (Service: ElasticLoadBalancingV2, Status Code: 403, Request ID: e65ecdc0-2c7d-4df2-8a62-b33b8ac3cddf). Delete requested by user.
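
For anyone hitting the same error, here is a minimal Python sketch of one way to pull the denied action out of such a message and append it to a policy document. The helper names, the `ExtraELBRead` Sid and the `"Resource": "*"` scope are assumptions made for illustration; they are not taken from the controller's documentation.

```python
import copy
import re

# Matches the "is not authorized to perform: <service>:<Action>" phrase
# found in access-denied messages like the one quoted above.
DENIED_ACTION = re.compile(
    r"is not authorized to perform: ([A-Za-z0-9-]+:[A-Za-z0-9]+)"
)


def denied_actions(message: str) -> list:
    """Return the distinct IAM actions reported as denied, in order."""
    seen = []
    for action in DENIED_ACTION.findall(message):
        if action not in seen:
            seen.append(action)
    return seen


def with_extra_actions(policy: dict, actions: list, sid: str = "ExtraELBRead") -> dict:
    """Return a copy of a policy document with one added Allow statement.

    The Sid and the "*" resource are illustrative assumptions; narrow
    them to whatever your own setup requires.
    """
    updated = copy.deepcopy(policy)
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):
        # A policy may hold a single statement object instead of a list.
        statements = [statements]
    statements.append(
        {"Sid": sid, "Effect": "Allow", "Action": sorted(actions), "Resource": "*"}
    )
    updated["Statement"] = statements
    return updated
```

Passing the error text to `denied_actions` and the result to `with_extra_actions` yields a new policy dictionary that can be serialized with `json.dumps` and reviewed before it is applied to the role.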

mikkeloscar commented 1 year ago

@spr-mweber3 Can you share the ingress resources that trigger a CF stack where this happens?

I can't find this permission set anywhere in our production system, and we haven't seen the problem you describe. I wonder if you use some of the more unusual annotations from the controller?

spr-mweber3 commented 1 year ago

There is nothing special about the annotations on those ingresses. In fact, we're applying the same ingresses with the same annotations in new clusters.

We're using the same v0.14.24 of the controller everywhere.

The additional permission doesn't seem to be needed for already deployed load balancers. We have a lot of them managed by the controller and don't see any issue, but if you try to create a new one, you'll see that something changed.

On top of that, the naming scheme of the created load balancers has changed. Earlier, the load balancers had names like kube-ing-LB-Z0LLWRJNY0IS, but new ones look different, e.g. LB-9MiEZ2OqtiIy. Note the lowercase characters and the missing kube-ing prefix.

I tried to figure out what changed, but didn't succeed. To me it looks like it's something at AWS on their API, which maybe causes different behavior inside the controller as a consequence.

szuecs commented 1 year ago

What? Are you sure that you run the same image? Did anything change in your AWS IAM setup?

spr-mweber3 commented 1 year ago

Yep, it's true. I'm not kidding. It's the same image. We're using v0.14.24 everywhere. I just tried again to force create a new CF stack with zalando.org/aws-load-balancer-shared: "false" on a new ingress. It succeeds only if the missing permission is added to the IAM role and the name of the LB is not consistent with what it was before.

I mean, did you try it yourself as well?

I'm not aware of any changes on our side that could cause this. We're provisioning and deprovisioning identical setups, and we only recently started to observe the issues described here.

Could this be related to changes at AWS to support SGs on NLBs? Or to changes in the CloudFormation API? After all, it's not the controller itself that creates the resources; it's the CloudFormation stack, which now seems to behave differently than before.

mikkeloscar commented 1 year ago

@spr-mweber3 Can you share the ingress you use? Then we can try with the same one (of course, hide internal details like hostnames and so on).

spr-mweber3 commented 1 year ago

Sure thing.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    zalando.org/aws-load-balancer-shared: "false"
  name: a-name
  namespace: default
spec:
  rules:
  - host: a-name.foo.bar
    http:
      paths:
      - backend:
          service:
            name: a-service
            port:
              number: 8080
        path: /
        pathType: Prefix

mikkeloscar commented 1 year ago

Ok, that is very basic

szuecs commented 1 year ago

@spr-mweber3 If you diff the CF stack of a "good" cluster and the "bad" cluster, do they differ? I guess we need to ask AWS support, because I have no idea how this can happen. Maybe it's something new they internally do for new CF stacks?

I tried it with our controller at version v0.14.30 and everything works fine without the permission. v0.14.24 uses a different version of the AWS SDK, but I don't see anything relevant in https://github.com/aws/aws-sdk-go/compare/v1.44.273...v1.44.294 because the listing doesn't provide much information. Maybe try the latest version?
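
The stack comparison suggested above could be sketched roughly as follows in Python. It assumes the two template bodies have already been fetched (for example through the CloudFormation GetTemplate API) and that they are JSON; a YAML template would need a different parser. The function names are hypothetical.

```python
import difflib
import json


def normalize_template(template) -> list:
    """Render a JSON template (string or parsed dict) with sorted keys,
    so that key ordering alone never shows up as a difference."""
    parsed = json.loads(template) if isinstance(template, str) else template
    return json.dumps(parsed, indent=2, sort_keys=True).splitlines()


def diff_templates(good, bad) -> list:
    """Return unified-diff lines between the 'good' and 'bad' templates."""
    return list(
        difflib.unified_diff(
            normalize_template(good),
            normalize_template(bad),
            fromfile="good",
            tofile="bad",
            lineterm="",
        )
    )
```

An empty result would mean the templates are structurally identical, which would point towards a change in how CloudFormation processes the stack rather than in what the controller submits.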