kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0

Allow multiple controller deployment per cluster #2185

Open M00nF1sh opened 2 years ago

M00nF1sh commented 2 years ago

Is your feature request related to a problem? Currently we only support a single controller deployment per cluster if worker node SG rules are managed by the controller. Since the controller assumes it is the sole owner of worker node security group rules carrying the elbv2.k8s.aws/targetGroupBinding=shared description, running multiple controller deployments will cause these controllers to compete with each other when updating worker node security group rules.

See also:

  1. https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2178
  2. https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2183

Describe the solution you'd like We could leverage the new tagging support for security groups and add a flag (e.g. --controller-instance=instance-a) to identify the controller deployment, so that each controller would do the following (a rough illustration follows the list):

  1. When a worker node SG rule is needed:
     a. If the rule doesn't exist, create it and tag it with a controller-specific tag such as "elbv2.k8s.aws/targetGroupBinding/instance-a": "shared".
     b. If the rule already exists, add the controller-specific tag to it, e.g. "elbv2.k8s.aws/targetGroupBinding/instance-a": "shared".
  2. When a worker node SG rule is no longer needed:
     a. If the rule exists and carries multiple tags with the prefix "elbv2.k8s.aws/targetGroupBinding", remove only this controller's tag, e.g. "elbv2.k8s.aws/targetGroupBinding/instance-b".
     b. If the rule exists and carries exactly one tag with the prefix "elbv2.k8s.aws/targetGroupBinding", and that tag is this controller's, remove the rule itself.
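
To make the proposal concrete, here is a rough illustration of the intended end state for a rule that two controller deployments both need; the flag name and tag keys come from the proposal above and are not an existing feature:

```yaml
# Illustration only: proposed per-controller tags on a shared worker node SG rule.
# Controller A runs with --controller-instance=instance-a
# Controller B runs with --controller-instance=instance-b
workerNodeSecurityGroupRule:
  description: "elbv2.k8s.aws/targetGroupBinding=shared"
  tags:
    elbv2.k8s.aws/targetGroupBinding/instance-a: shared   # added by controller A
    elbv2.k8s.aws/targetGroupBinding/instance-b: shared   # added by controller B
# If controller B no longer needs the rule, it removes only its own tag; the rule
# itself is deleted only by the controller owning the last remaining tag.
```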


M00nF1sh commented 2 years ago

/kind feature

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ghost commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

FernandoMiguel commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ghost commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

ghost commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

visit1985 commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2185#issuecomment-1523812325):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

visit1985 commented 1 year ago

This issue is mentioned in the docs as being in the roadmap

/reopen

k8s-ci-robot commented 1 year ago

@visit1985: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2185#issuecomment-1525850419):

> This issue is mentioned in the docs as being in the roadmap
>
> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

dlmather commented 1 year ago

I ran into some of the issues this feature request would address because I'm using aws-load-balancer-controller across multiple EKS clusters that are within the same VPC. To keep things otherwise simple, we re-use the service security groups that the ALBs need to create an ingress rule for, and this led to the contention issue across controllers in separate clusters. I was able to address it in a fork: because there is no potential contention on the rules themselves (they will always come from different ALBs), we can just use the cluster name in the permission selector as a way to take ownership of the rules. See https://github.com/dlmather/aws-load-balancer-controller/commit/58b5ff356086b4920d6e3e2f5d705bf29cb3c27f

This, however, doesn't address the entirety of the feature request, which is more open-ended and would support multiple LB controllers owning the same rule (which would require read/update functionality over the rule descriptions).

@M00nF1sh would you be open to a PR that only solves this more limited use-case of avoiding contention on multiple clusters (with non-overlapping rules)? Is there any other action around addressing this issue otherwise? In spite of being closed, I can confirm this is still very much a problem for my use-cases.

acjohnson commented 9 months ago

/reopen

k8s-ci-robot commented 9 months ago

@acjohnson: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2185#issuecomment-1716773679):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

acjohnson commented 9 months ago

This is still definitely a much needed feature +1

farazsiddiqui2010 commented 9 months ago

This is a much-needed feature for us. In our case, we have two AWS Load Balancer Controllers that are used for separate AWS accounts: Controller A is used for AWS account A, and Controller B is used for AWS account B.

Is there any possibility of creating multiple ingress classes with one AWS ingress controller?

acjohnson commented 9 months ago

@farazsiddiqui2010 you can definitely create multiple ingress classes; we've done it and use it, but that won't solve the problem completely.
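
For reference, multiple ingress classes served by a single controller installation look roughly like this; a minimal sketch assuming the controller's `ingress.k8s.aws/alb` controller name and the IngressClassParams CRD, with illustrative resource names:

```yaml
# One of two ingress classes backed by the same controller, differing only in scheme.
apiVersion: elbv2.k8s.aws/v1beta1
kind: IngressClassParams
metadata:
  name: alb-internal
spec:
  scheme: internal        # the external class would use internet-facing
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb-internal
spec:
  controller: ingress.k8s.aws/alb
  parameters:
    apiGroup: elbv2.k8s.aws
    kind: IngressClassParams
    name: alb-internal
```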

Once running, the secondary aws-load-balancer-controller deployment will create MutatingWebhookConfiguration and ValidatingWebhookConfiguration objects that will cause the validation of TargetGroupBindings to fail. See my comment in the following adjacent issue: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2233#issuecomment-1716768905

Also, even if you do get both ingress classes installed, you'll still have to deal with Reconciler errors coming from both aws-load-balancer-controller deployments, as they will both try to reconcile target groups that are invalid for account A and B... This is mostly cosmetic, and there is likely an easy workaround if you look. That said, it is totally doable and should actually work if you patch the deployment and leader-election role for your secondary aws-load-balancer-controller deployment.

It just would be nice if the official package and even the helm chart supported this type of installation.
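
For what it's worth, values for such a secondary release might look roughly like the sketch below. This assumes the chart's clusterName, ingressClass, fullnameOverride, and serviceAccount values (check your chart version), and, as noted above, the webhook configurations and leader-election role still have to be patched to unique names by hand:

```yaml
# Hypothetical values for a secondary aws-load-balancer-controller Helm release.
# Names are illustrative; the chart does not officially support this layout.
clusterName: my-cluster
fullnameOverride: aws-load-balancer-controller-account-b
ingressClass: alb-account-b            # distinct ingress class for this installation
serviceAccount:
  create: true
  name: aws-load-balancer-controller-account-b   # account B's IAM role would be bound here
```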

acjohnson commented 9 months ago

I think the TargetGroupBinding issue could be solved if the controller supported some sort of required annotation or label on the TargetGroupBinding objects. That way it could easily determine which TargetGroupBindings go with which aws-load-balancer-controller installation.
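
A rough sketch of what that could look like, assuming a hypothetical ownership label; the label key and the selection mechanism are illustrative, not an existing feature:

```yaml
# Hypothetical: a TargetGroupBinding labelled for one controller installation.
# A controller started as "instance-a" would only reconcile objects carrying its label.
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: my-service-tgb
  namespace: my-namespace
  labels:
    elbv2.k8s.aws/controller-instance: instance-a   # hypothetical ownership label
spec:
  serviceRef:
    name: my-service
    port: 80
  targetType: ip
  targetGroupARN: arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/my-tg/0123456789abcdef
```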

ForbiddenEra commented 9 months ago

I just want to remind/point out that this issue is referenced in the documentation as "We will remove this limitation in future versions"...

So, I follow the link, hoping to see the issue is closed/solved/being worked on at least, feeling a hint of positivity when I see 'Closed' at the top..

But then, when scrolling down for more details to see how it's been addressed or what should be done to implement this use case, I see that this almost seems to have been forgotten, with no direction for anyone coming from the roadmap @visit1985 mentioned or the docs I've pointed out, which state it "will" happen.

If this feature is something that is going to require external contributions, as the bot's response seemingly implies, then perhaps that should be stated somewhere as well. If the docs said "We want to/have plans to remove this limitation in future versions and are seeking/open to contributions/PRs", then those who need it might be more willing to commit.

As the language is now, I think the common response would be to wait and hope it's worked out sooner rather than later. If the language were more along the lines of "we want to, but lack resources and can't promise when it might be available, or even if, but we could use a hand", then some might be more willing to spend the time on making a PR. The language in the docs seems to imply it's being worked on, so especially no one external will want to volunteer their time toward making this happen if they think someone's already on the job.

The docs say "The LBC is supported by AWS." and "The AWS Load Balancer Controller should be installed instead.", implying it's "the proper way to use EKS", but the implication I get from the bot when it says "The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs." is that "we're relying on the community to support one of our paid offerings".

And, reading that again now, it seems a bit confusing and a bit of an oxymoron/catch-22: you need contributors to make PRs, but you don't have enough active contributors to respond to PRs?!

As an AWS customer who uses EKS quite thoroughly, I consider this controller a fairly integral part of a proper deployment; I was a bit surprised that it isn't also an AWS-maintained EKS add-on. I see it as no less integral than something like the aws-ebs-csi-controller: it's even listed in the AWS EKS docs under 'Add-ons', but it isn't actually an "Add-on" in the way that term is mostly used otherwise for EKS, and it feels like the exact same kind of 'core service' as the aws-ebs-csi-controller, just for ALBs/NLBs. Sure, one can get some stuff working without either in some cases, but the docs even say "The LBC is supported by AWS." and "The AWS Load Balancer Controller should be installed instead." [of using the built-in controller]

I don't think any of us has any expectations here about features/fixes being completed in any respect for anything we use, especially anything that's an open source project, even if it is corporately owned/supported (other than maybe the expectations the maintainers set themselves in a roadmap/changelog/docs); I think we all just want some clear direction on what the plan is so that we can decide whether we need to wait, make some sort of work around or even fix it ourselves and make a PR.

I was just a bit surprised to see language like "We will" do something in a big warning box, accompanied by a link to a tracking issue that seemingly advertises some sort of accountability related to that statement, then to read the issue and find out it's also listed on a roadmap, that others are also waiting/curious/don't know what's going on, and that the issue has otherwise been collecting dust - for over two years.

> I think the TargetGroupBinding issue could be solved if the controller supported some sort of required annotation or label on the TargetGroupBinding objects. That way it could easily determine which TargetGroupBindings go with which aws-load-balancer-controller installation.

This sounds like a very reasonable solution, and/or some sort of namespace limiting: put the controller and its resources in one namespace and tell it another namespace to manage (e.g. dev-controllers and dev-apps for one LBC, and prod-controllers and prod-apps for another, or something).

flaviomoringa commented 8 months ago

Hi,

we have EXACTLY this need: an internal-lb and an external-lb controller running, and then allowing our users to choose the correct ingress class pointing to the correct LB.

We are using the nginx-ingress controller and were evaluating moving to the AWS controller, which we thought would be really easy, until we found this issue :-(

This is a total deal-breaker for us, and we will not be able to move forward with replacing the nginx-controller because of it.

Even worse is that the official documentation mentions that AWS is working on this: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.6/deploy/configurations/ (see the limitation warning, last line on top)

But it then points to this issue, which is closed.

How is this closed if it's not resolved? Making this work should be a priority; it makes no sense not to be able to have multiple LB configurations running.

Hope we get news about this soon.

This is also being mentioned on: https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2233

flaviomoringa commented 8 months ago

> I think the TargetGroupBinding issue could be solved if the controller supported some sort of required annotation or label on the TargetGroupBinding objects. That way it could easily determine which TargetGroupBindings go with which aws-load-balancer-controller installation.

This seems like a simple and nice approach

ForbiddenEra commented 8 months ago

> we have EXACTLY this need: an internal-lb and an external-lb controller running, and then allowing our users to choose the correct ingress class pointing to the correct LB.

I'm able to do this with only one lbc..?

Using two ingress-nginx deployments, each with its own ingressClass and controller, and with different annotations on each LoadBalancer specifying external or internal?

e.g. a values.yaml for the ingress-nginx Helm chart, something like this:

```yaml
controller:
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-name: nginx-ext-ingress
      service.beta.kubernetes.io/aws-load-balancer-type: external
      service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz
      # annotation values must be strings, so the port is quoted
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "10254"
```

and another for internal (you also have to add config to make each one use a specific ingressClass, or monitor a specific namespace, or something)

Then users just set their ingressClass to internal or external or however you've defined them..?

But I still think this needs to be re-opened and fixed, as I'm sure there are cases where it's needed or even just easier; you can seemingly only tell an LBC to monitor one namespace or all namespaces, so even with annotations it can probably be a pain.
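
For reference, a minimal sketch of the single-namespace option mentioned above, assuming the controller's --cluster-name, --ingress-class, and --watch-namespace flags (verify against the flag reference for your version); the names are illustrative:

```yaml
# Sketch (pod template fragment): args restricting one aws-load-balancer-controller
# deployment to a single namespace and a dedicated ingress class.
spec:
  containers:
    - name: aws-load-balancer-controller
      args:
        - --cluster-name=my-cluster
        - --ingress-class=alb-internal      # distinct class for this installation
        - --watch-namespace=dev-apps        # only watch this namespace
```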

flaviomoringa commented 8 months ago

@ForbiddenEra sure you can.

we use Argo CD, and install the nginx-controller Helm chart twice with different names, each with its own configuration and class names (the main config difference is that one controller creates an internal LB and the other creates an external LB)

This lets us, the admins, manage the controllers' configuration while exposing 2 classes/LBs to our developers so they can choose how to expose their applications.

Using just one controller, I don't see how you can do it as easily, or at all; this way each controller's config is totally isolated and easy to change.

With the AWS controller, although you can actually install 2 controllers the same way, in the end they don't work, because both will try to use their webhooks to make the health check work and end up competing in a way that simply makes it not work… a clear design flaw.

ForbiddenEra commented 8 months ago

> @ForbiddenEra sure you can.

> we use Argo CD, and install the nginx-controller Helm chart twice with different names, each with its own configuration and class names (the main config difference is that one controller creates an internal LB and the other creates an external LB)

sounds insanely similar to my setup, argo included :)

> With the AWS controller, although you can actually install 2 controllers the same way, in the end they don't work, because both will try to use their webhooks to make the health check work and end up competing in a way that simply makes it not work… a clear design flaw.

Yeah, like I said, regardless of whether some desired setups can be worked around, this should still just be a thing you can easily configure. Any controller-type thing should be able to be isolated by namespace or otherwise and run several times; nothing should have to take over a whole cluster.

Plus, the docs said they're working on it for sure, but here we are in a closed issue thread with no response from anyone at AWS or even a maintainer in months, with the bot saying "we don't have enough help"; even if someone here had a PR, it seems it'd be just as ignored :'(

AAAAAAAAAND we all pay to use this stuff, with AWS docs calling this controller the "proper way" even though it's open source; as I said, it should really be an AWS-managed add-on for EKS like coredns/vpc-cni/aws-ebs-controller etc.

dims commented 8 months ago

/reopen

k8s-ci-robot commented 8 months ago

@dims: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2185#issuecomment-1763256584):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

dims commented 8 months ago

/lifecycle frozen

ForbiddenEra commented 8 months ago

> /lifecycle frozen

Not sure exactly what lifecycle frozen means but happy to at least see the issue re-opened and some attention given!

I hope none of my comments have come off as harsh in any way; ASD here, no negative intentions, and I do appreciate the hard work that has already been done. At least for myself (and I'm sure for others), we generally don't have expectations beyond what's on the roadmap/documented etc. But, like I said, the lack of clarity around this issue, and the statements elsewhere about it (like in the docs), likely deter potential contributors from working on it: they don't know whether someone else is already working on it (duplicated work) and/or whether there's any chance a PR would be accepted (given the little interaction from maintainers, plus if an AWS dev/core maintainer submitted a PR at the same time as an outside contributor, I think in most cases the core team member's work would be merged rather than somerandomguyonthenet's). Again, for myself personally, the note in the docs alone would be enough for me not to spend time on any fixes, since it implies it's already being worked on or at least on someone's TODO list.

Re-opening is a step in the right direction at least! Cheers!

PS: If no one else has begun work on this task and it is open for community PRs, please let us know; I would even consider trying to make this work myself, depending on my free time and how hard it would be to develop/test properly.

dims commented 8 months ago

@ForbiddenEra this is a Kubernetes community repository, so contributions from everyone are welcome.

ForbiddenEra commented 5 months ago

> @ForbiddenEra this is a Kubernetes community repository, so contributions from everyone are welcome.

Sure; that may be true. But your response comes across as disingenuous (though, to be fair, I'm not sure if you're with AWS or not). As I pointed out above, this has been listed as "on the roadmap" for several years and documented as the "proper" way to make use of a service I pay dearly for, and, while not directly said, that sort of response and this entire issue thread come across as "fix it yourself" - when, again, it's listed as "on the roadmap", which says to me it's been on the TODO list of an AWS developer for several years with no movement.

While I appreciate it when companies have open repos and accept contributions, it's another story when they do so and then, instead of just being open to contributions, push the work off to the community, expecting devs to do it for free - in this case, to improve a service that we're paying for.

This wouldn't be so egregious if this were truly just a third-party value-add and something provider-agnostic, such as externaldns or similar; here, however, the AWS docs themselves state "this is the way".

I admit I shouldn't be surprised, given AWS' history of abusing the free software community, but yeah, as a paying customer who's following the official docs, this isn't a great look.

flaviomoringa commented 4 months ago

I cannot even say how much @ForbiddenEra's reply resonates with me... it's totally on point, and AWS (whether or not they wrote the other comment) should be ashamed of treating customers like that.

ForbiddenEra commented 4 months ago

> I cannot even say how much @ForbiddenEra's reply resonates with me... it's totally on point, and AWS (whether or not they wrote the other comment) should be ashamed of treating customers like that.

Glad to hear that; I was a bit worried about coming off as harsh. I definitely get open source, but, yeah.

yuvraj9 commented 3 weeks ago

We are also blocked on this, and our AWS controller takes around 40 minutes to reconcile, which is a major issue for us.

dims commented 3 weeks ago

> I cannot even say how much @ForbiddenEra's reply resonates with me... it's totally on point, and AWS (whether or not they wrote the other comment) should be ashamed of treating customers like that.

@flaviomoringa please use the escalation paths you have as an AWS customer if you are not happy. Thanks.