🐛 Fix creation of target groups and listeners in the reconcile loop

kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.

http://cluster-api-aws.sigs.k8s.io/

Apache License 2.0


🐛 Fix creation of target groups and listeners in the reconcile loop #5017

Closed r4f4 closed 2 weeks ago

r4f4 commented 2 weeks ago

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fixes a bug where CAPA creates duplicate target groups and listeners.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #5015

Special notes for your reviewer:

Checklist:

[ ] squashed commits
[ ] includes documentation
[X] includes emojis
[ ] adds unit tests
[ ] adds or updates e2e tests

Release note:

Fixes target group and listener creation for v2 Load Balancers.

k8s-ci-robot commented 2 weeks ago

Hi @r4f4. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

nrb commented 2 weeks ago

/ok-to-test

r4f4 commented 2 weeks ago

I'm currently running the OpenShift e2e tests on this fix. Will report back when I have results.

nrb commented 2 weeks ago

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-clusterclass

nrb commented 2 weeks ago

/retest

patrickdillon commented 2 weeks ago

LGTM

k8s-ci-robot commented 2 weeks ago

@mtulio: changing LGTM is restricted to collaborators

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5017#pullrequestreview-2114316716):

>/lgtm

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

r4f4 commented 2 weeks ago

All but one of the OpenShift tests in this PR reached the running-cluster stage.

In the job that failed, no duplicate target groups or listeners were created, so the issue seems to be fixed.

nrb commented 2 weeks ago

/retest

nrb commented 2 weeks ago

/approve
/hold

Holding until it passes e2e. Failures appear to be unrelated to this change, though.

/assign @damdo for second review.

k8s-ci-robot commented 2 weeks ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/OWNERS)~~ [nrb]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

damdo commented 2 weeks ago

/test pull-cluster-api-provider-aws-e2e

r4f4 commented 2 weeks ago

> Matching the groups based on naming sounds like it could become a little fragile over time, for example if we ever decided to change the names.

Yes. I tried to move the prefixes in use to consts to help with that.

> Is there any reason not to go straight to checking the port and the type rather than relying on names?

I was discussing this with @mtulio. Ideally we would check the owned tag, but I'm reluctant to add even more API calls (one of the failure reasons we observed was API rate limiting). If it's acceptable that we might match target groups that were not created by CAPA, then we can check just port/type.

> What would happen if a manually created target group matched the spec but wasn't named correctly, would the cluster still function?

Then CAPA would create a new target group. As long as an associated listener doesn't exist for that port, it should work. If there is no listener, then the target group was not associated with the LB and CAPA wouldn't have discovered it in the first place. I haven't tested this though.
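
To make the trade-off above concrete, here is a minimal sketch of the two matching strategies. It is not the code in this PR: the `TargetGroup` and `TargetGroupSpec` types, the function names, and the name prefixes are all hypothetical simplifications, and nothing here calls AWS.

```go
package main

import "strings"

// TargetGroupSpec is a hypothetical, simplified description of a target group
// that the controller wants to exist.
type TargetGroupSpec struct {
	NamePrefix string // assumed naming convention, e.g. "apiserver-target-"
	Port       int64
	Protocol   string
}

// TargetGroup is a hypothetical, simplified view of a target group that is
// already associated with the load balancer.
type TargetGroup struct {
	Name     string
	Port     int64
	Protocol string
}

// matchByName treats an existing group as satisfying the spec only when its
// name starts with the expected prefix.
func matchByName(tg TargetGroup, spec TargetGroupSpec) bool {
	return strings.HasPrefix(tg.Name, spec.NamePrefix)
}

// matchByPortAndProtocol ignores names and compares port and protocol only,
// so it can also match groups that the controller did not create.
func matchByPortAndProtocol(tg TargetGroup, spec TargetGroupSpec) bool {
	return tg.Port == spec.Port && tg.Protocol == spec.Protocol
}

// missingSpecs returns the desired specs that no existing group satisfies
// under the given matcher. Only these would need to be created, which is what
// keeps repeated reconcile passes from producing duplicates.
func missingSpecs(
	existing []TargetGroup,
	desired []TargetGroupSpec,
	match func(TargetGroup, TargetGroupSpec) bool,
) []TargetGroupSpec {
	var missing []TargetGroupSpec
	for _, spec := range desired {
		found := false
		for _, tg := range existing {
			if match(tg, spec) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, spec)
		}
	}
	return missing
}
```

With a manually created group named `custom-tg` on a port that a desired spec also uses, `missingSpecs(existing, desired, matchByName)` still reports that spec as missing (so a second group would be created), while `matchByPortAndProtocol` treats the manual group as satisfying it.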

damdo commented 2 weeks ago

/test pull-cluster-api-provider-aws-e2e

JoelSpeed commented 2 weeks ago

Having discussed out of band, the current approach is better than what we have in main today, which is broken and has potential for leaking resources.

I would like to see some more testing of the scenarios where users bring their own target groups or start modifying the target groups after the LB is created, but that needn't block this PR.

r4f4 commented 2 weeks ago

/hold cancel

e2e passed.