Closed r4f4 closed 2 months ago
Hi @r4f4. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Without this change, I'd see a constant stream of revoke-authorize ingress rules in the debug logs [1]. Now I see the rules authorized once and no revokes (will add logs once I have more signal from openshift CI).
Do you think we are covered in the
ReconcileSecurityGroups
or would be hard to reproduce in new case as revoke is in the same function, and/or would be nice to create a test dedicated to exerciseexpandIngressRules()
?
Good call. I can, at a minimum, add unit tests for the expandIngressRules()
function. Will do that while I wait for CI results.
Updated: add a few fixes and unit tests.
Openshift CI results are all green! https://github.com/openshift/installer/pull/8596. Looking at the logs, the only rule being revoked is the one we explicitely removed from the cluster spec:
time="2024-06-13T21:27:14Z" level=debug msg="I0613 21:27:14.705844 356 securitygroups.go:181] \"Revoked ingress rules from security group\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" namespace=\"openshift-cluster-api-guests\" name=\"ci-op-x5n2dn57-2d061-fxccx\" reconcileID=\"1eebbaa2-4a91-478f-91df-6d0cf652d71c\" cluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" revoked-ingress-rules=[{\"description\":\"Bootstrap SSH Access\",\"protocol\":\"tcp\",\"fromPort\":22,\"toPort\":22,\"cidrBlocks\":[\"0.0.0.0/0\"]}] security-group-id=\"sg-0193bb8269722f9e3\""
time="2024-06-13T21:27:14Z" level=debug msg="I0613 21:27:14.705926 356 securitygroups.go:150] \"second pass security group reconciliation\" controller=\"awscluster\" controllerGroup=\"infrastructure.cluster.x-k8s.io\" controllerKind=\"AWSCluster\" AWSCluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" namespace=\"openshift-cluster-api-guests\" reconcileID=\"1eebbaa2-4a91-478f-91df-6d0cf652d71c\" cluster=\"openshift-cluster-api-guests/ci-op-x5n2dn57-2d061-fxccx\" group-id=\"sg-06d4855d6bc549b25\" name=\"ci-op-x5n2dn57-2d061-fxccx-lb\" role=\"lb\""
@damdo would you mind enabling tests on this PR?
/cc @alexander-demicev I see you've recently changed this code. WDYT of this fix?
@mtulio following your suggestion, I've added a unit test to check that we are not revoking ingress rules unnecessarily.
/lgtm
@mtulio: changing LGTM is restricted to collaborators
/ok-to-test
Tests are all green here as well!
/test ?
@r4f4: The following commands are available to trigger required jobs:
/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-build-docker
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify
The following commands are available to trigger optional jobs:
/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks-gc
/test pull-cluster-api-provider-aws-e2e-eks-testing
Use /test all
to run the following jobs that were automatically triggered:
pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-build-docker
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify
/test pull-cluster-api-provider-aws-e2e pull-cluster-api-provider-aws-e2e-blocking
aws-e2e
failure doesn't seem to be cause by the PR changes:
Should've eventually succeeded creating an AWS CloudFormation stack
Expected
<bool>: false
to be true
In [SynchronizedBeforeSuite] at: /home/prow/go/src/sigs.k8s.io/cluster-api-provider-aws/test/e2e/shared/suite.go:147 @ 06/20/24 17:24:37.437
/test pull-cluster-api-provider-aws-e2e
aws-e2e
: the 2 e2e failures don't look related to the PR changes but will retest again to be sure.
In the log of this PR no revokes until the cluster is deleted.
Looking at an old test log:
I0611 16:19:08.741022 1 securitygroups.go:179] "Revoked ingress rules from security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="self-hosted-yr5v5v/self-hosted-70i0jt-klt6x" namespace="self-hosted-yr5v5v" name="self-hosted-70i0jt-klt6x" reconcileID="df380180-8af8-45d2-a1c0-fd7d1108558e" cluster="self-hosted-yr5v5v/self-hosted-70i0jt" revoked-ingress-rules=[{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]},{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":0,"toPort":0,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]}] security-group-id="sg-02a4d2faaf7102a28"
I0611 16:19:09.000272 1 securitygroups.go:193] "Authorized ingress rules in security group" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="self-hosted-yr5v5v/self-hosted-70i0jt-klt6x" namespace="self-hosted-yr5v5v" name="self-hosted-70i0jt-klt6x" reconcileID="df380180-8af8-45d2-a1c0-fd7d1108558e" cluster="self-hosted-yr5v5v/self-hosted-70i0jt" authorized-ingress-rules=[{"description":"bgp (calico)","protocol":"tcp","fromPort":179,"toPort":179,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]},{"description":"IP-in-IP (calico)","protocol":"4","fromPort":-1,"toPort":65535,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208","sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]}] security-group-id="sg-02a4d2faaf7102a28"
So the following rules are being revoked:
[{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-02a4d2faaf7102a28"]},{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-0799511b5638468a1"]}
Then being authorized as:
{"description":"Kubernetes API","protocol":"tcp","fromPort":6443,"toPort":6443,"sourceSecurityGroupIds":["sg-04c31b37099612208","sg-02a4d2faaf7102a28","sg-0799511b5638468a1"]}
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
/assign @nrb
Tests look good. This is ready for review.
Thanks @r4f4
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: richardcase
The full list of commands accepted by this bot can be found here.
The pull request process is described here
What type of PR is this? /kind bug
What this PR does / why we need it:
We are comparing sets that are not compatible, so rules will always be revoked and authorized during reconciliation. We need to expand the rules from the spec so there is one rule for each item in cidrBlock/sourceSecurityGroupIds.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged): Fixes #5023Special notes for your reviewer:
Checklist:
Release note: