kubernetes-sigs / cluster-api-provider-aws

Kubernetes Cluster API Provider AWS provides consistent deployment and day 2 operations of "self-managed" and EKS Kubernetes clusters on AWS.
http://cluster-api-aws.sigs.k8s.io/
Apache License 2.0
636 stars 561 forks source link

🐛: securitygroup: fix comparison of ingress rules sets. #5024

Closed r4f4 closed 2 months ago

r4f4 commented 3 months ago

What type of PR is this? /kind bug

What this PR does / why we need it:

We are comparing sets that are not compatible, so rules will always be revoked and authorized during reconciliation. We need to expand the rules from the spec so there is one rule for each item in cidrBlock/sourceSecurityGroupIds.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #5023

Special notes for your reviewer:

Checklist:

Release note:

Fix comparison of ingress rules during Security Group reconciliation to avoid unnecessary revokes and authorizes.
k8s-ci-robot commented 3 months ago

Hi @r4f4. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
r4f4 commented 3 months ago

Without this change, I'd see a constant stream of revoke-authorize ingress rules in the debug logs [1]. Now I see the rules authorized once and no revokes (will add logs once I have more signal from openshift CI).

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/8545/pull-ci-openshift-installer-master-e2e-aws-ovn/1801267394513997824/artifacts/e2e-aws-ovn/ipi-install-install/artifacts/.openshift_install-1718293235.log

r4f4 commented 3 months ago

Do you think we are covered in the ReconcileSecurityGroups or would be hard to reproduce in new case as revoke is in the same function, and/or would be nice to create a test dedicated to exercise expandIngressRules()?

Good call. I can, at a minimum, add unit tests for the expandIngressRules() function. Will do that while I wait for CI results.

r4f4 commented 3 months ago

Updated: add a few fixes and unit tests.

r4f4 commented 3 months ago

Openshift CI results are all green! https://github.com/openshift/installer/pull/8596. Looking at the logs, the only rule being revoked is the one we explicitely removed from the cluster spec:

time="2024-06-13T21:27:14Z" level=debug msg="I0613 21:27:14.705844     356 securitygroups.go:181] \"Revoked ingress rules from security group\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" namespace=\"openshift-cluster-api-guests\" name=\"ci-op-x5n2dn57-2d061-fxccx\" reconcileID=\"1eebbaa2-4a91-478f-91df-6d0cf652d71c\" cluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" revoked-ingress-rules=[{\"description\":\"Bootstrap SSH Access\",\"protocol\":\"tcp\",\"fromPort\":22,\"toPort\":22,\"cidrBlocks\":[\"0.0.0.0/0\"]}] security-group-id=\"sg-0193bb8269722f9e3\""
time="2024-06-13T21:27:14Z" level=debug msg="I0613 21:27:14.705926     356 securitygroups.go:150] \"second pass security group reconciliation\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" namespace=\"openshift-cluster-api-guests\" reconcileID=\"1eebbaa2-4a91-478f-91df-6d0cf652d71c\" cluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" group-id=\"sg-06d4855d6bc549b25\" name=\"ci-op-x5n2dn57-2d061-fxccx-lb\" role=\"lb\""

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/8596/pull-ci-openshift-installer-master-altinfra-e2e-aws-ovn/1801355744365252608/artifacts/e2e-aws-ovn/ipi-install-install/artifacts/.openshift_install-1718314761.log

r4f4 commented 3 months ago

@damdo would you mind enabling tests on this PR?

r4f4 commented 3 months ago

/cc @alexander-demicev I see you've recently changed this code. WDYT of this fix?

r4f4 commented 3 months ago

@mtulio following your suggestion, I've added a unit test to check that we are not revoking ingress rules unnecessarily.

mtulio commented 3 months ago

/lgtm

k8s-ci-robot commented 3 months ago

@mtulio: changing LGTM is restricted to collaborators

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5024#issuecomment-2177118990): >/lgtm Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
nrb commented 3 months ago

/ok-to-test

r4f4 commented 3 months ago

Tests are all green here as well!

r4f4 commented 3 months ago

/test ?

k8s-ci-robot commented 3 months ago

@r4f4: The following commands are available to trigger required jobs:

The following commands are available to trigger optional jobs:

Use /test all to run the following jobs that were automatically triggered:

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5024#issuecomment-2181152453): >/test ? Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
r4f4 commented 3 months ago

/test pull-cluster-api-provider-aws-e2e pull-cluster-api-provider-aws-e2e-blocking

r4f4 commented 3 months ago

aws-e2e failure doesn't seem to be cause by the PR changes:

   Should've eventually succeeded creating an AWS CloudFormation stack
  Expected
      <bool>: false
  to be true
  In [SynchronizedBeforeSuite] at: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/shared/suite.go:147 @ 06/20/24 17:24:37.437 

/test pull-cluster-api-provider-aws-e2e

r4f4 commented 3 months ago

aws-e2e: the 2 e2e failures don't look related to the PR changes but will retest again to be sure.

Test from this PR

In the log of this PR no revokes until the cluster is deleted.

Test from another PR

Looking at an old test log:

I0611 16:19:08.741022       1 securitygroups.go:179] "Revoked ingress rules from security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="self-hosted-yr5v5v/self-hosted-70i0jt-klt6x" namespace="self-hosted-yr5v5v" name="self-hosted-70i0jt-klt6x" reconcileID="df380180-8af8-45d2-a1c0-fd7d1108558e" cluster="self-hosted-yr5v5v/self-hosted-70i0jt" revoked-ingress-rules=[{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]},{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]}] security-group-id="sg-02a4d2faaf7102a28"
I0611 16:19:09.000272       1 securitygroups.go:193] "Authorized ingress rules in security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="self-hosted-yr5v5v/self-hosted-70i0jt-klt6x" namespace="self-hosted-yr5v5v" name="self-hosted-70i0jt-klt6x" reconcileID="df380180-8af8-45d2-a1c0-fd7d1108558e" cluster="self-hosted-yr5v5v/self-hosted-70i0jt" authorized-ingress-rules=[{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":-1,"toPort":65535,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208","sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]}] security-group-id="sg-02a4d2faaf7102a28"

So the following rules are being revoked:

[{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]}

Then being authorized as:

{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208","sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]}
r4f4 commented 3 months ago

/test pull-cluster-api-provider-aws-e2e

r4f4 commented 3 months ago

/test pull-cluster-api-provider-aws-e2e

r4f4 commented 3 months ago

/assign @nrb

Tests look good. This is ready for review.

richardcase commented 2 months ago

Thanks @r4f4

/approve

k8s-ci-robot commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/main/OWNERS)~~ [richardcase] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment