openshift / cluster-etcd-operator

Operator to manage the lifecycle of the etcd members of an OpenShift cluster
Apache License 2.0

NO-JIRA: only read signer/bundles on forced leaf generation #1288

Closed tjungblu closed 4 months ago

tjungblu commented 4 months ago

During vertical scaling we could hit a case where the signers are expired and would trigger a rotation, even though we only wanted to create a new set of peer certs. This can cause bricked clusters, as there is no safeguard for the bundle distribution in this case.

/hold waiting for the test suite to merge first, even though it might not be relevant

openshift-ci-robot commented 4 months ago

@tjungblu: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Fcluster-etcd-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

hasbro17 commented 4 months ago

/lgtm

Good catch, totally missed that the node-cert diff would permit signer rotation as well. Although it doesn't make sense to add new nodes when your signer is expired, better to be safe.

The e2e won't cover this specific case of signers expired during vertical scaling, but it would be good to run it to check that we haven't broken the usual rotation flow.

openshift-ci[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hasbro17, tjungblu

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/cluster-etcd-operator/blob/master/OWNERS)~~ [hasbro17,tjungblu]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

tjungblu commented 4 months ago

/retest

tjungblu commented 4 months ago

/hold cancel

openshift-ci-robot commented 4 months ago

/retest-required

Remaining retests: 0 against base HEAD aabb6d6fd25ca24b4b2e77fb5585ce023aca90fc and 2 for PR HEAD b4ca1ff0c7910631e85cbff9943a781ca790535f in total

tjungblu commented 4 months ago

/retest-required

tjungblu commented 4 months ago

/override ci/prow/e2e-aws-ovn-etcd-scaling

does not seem related

openshift-ci[bot] commented 4 months ago

@tjungblu: Overrode contexts on behalf of tjungblu: ci/prow/e2e-aws-ovn-etcd-scaling

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

openshift-ci[bot] commented 4 months ago

@tjungblu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-qe-no-capabilities | b4ca1ff0c7910631e85cbff9943a781ca790535f | link | false | `/test e2e-gcp-qe-no-capabilities` |
| ci/prow/e2e-aws-etcd-certrotation | b4ca1ff0c7910631e85cbff9943a781ca790535f | link | false | `/test e2e-aws-etcd-certrotation` |
| ci/prow/e2e-operator-fips | b4ca1ff0c7910631e85cbff9943a781ca790535f | link | false | `/test e2e-operator-fips` |
| ci/prow/e2e-metal-ovn-sno-cert-rotation-shutdown | b4ca1ff0c7910631e85cbff9943a781ca790535f | link | false | `/test e2e-metal-ovn-sno-cert-rotation-shutdown` |
| ci/prow/e2e-aws-etcd-recovery | b4ca1ff0c7910631e85cbff9943a781ca790535f | link | false | `/test e2e-aws-etcd-recovery` |
| ci/prow/e2e-metal-ovn-ha-cert-rotation-shutdown | b4ca1ff0c7910631e85cbff9943a781ca790535f | link | false | `/test e2e-metal-ovn-ha-cert-rotation-shutdown` |

openshift-bot commented 4 months ago

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-etcd-operator-container-v4.17.0-202407041049.p0.gd90d8b0.assembly.stream.el9 for distgit cluster-etcd-operator. All builds following this will include this PR.