Closed: tjungblu closed this pull request 2 months ago.
@tjungblu: The following tests failed, say /retest
to rerun all failed tests or /retest-required
to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
ci/prow/e2e-aws-ovn-etcd-scaling | a8c8457f457d45b7703f8345e1e34c5831aaf496 | link | true | /test e2e-aws-ovn-etcd-scaling |
ci/prow/e2e-operator-fips | a8c8457f457d45b7703f8345e1e34c5831aaf496 | link | true | /test e2e-operator-fips |
ci/prow/e2e-gcp-qe-no-capabilities | a8c8457f457d45b7703f8345e1e34c5831aaf496 | link | false | /test e2e-gcp-qe-no-capabilities |
Full PR test history. Your PR dashboard.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dusk125, tjungblu
The full list of commands accepted by this bot can be found here.
The pull request process is described here
/hold
Haven't forgotten about this but still reviewing.
Not blocking this PR, but I just wanted to think ahead about how we actually want to run this cmd automatically once we detect that etcd is down with expired certs. That may affect how we generate them here.
First is the detection of expired certs. I'm thinking this would be a health check or polling probe that can either query etcd locally and look for an `x509: certificate has expired or is not yet valid` error, or just inspect the on-disk cert to check its expiry date.
If this is a sidecar in the operator then we may not have sufficient hostpath permissions to do either of those, right? And it can't be in the etcd pod, as we need to run this from a single place.
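For the on-disk variant, the probe could be as simple as parsing the cert and comparing NotAfter against the current time. A minimal sketch in Go, assuming we can read the cert path from the host at all (the path and where this runs are exactly the open questions above):

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
	"time"
)

// certExpired reports whether the PEM-encoded certificate at path has passed
// its NotAfter date. Whether we can read this path at all depends on the
// hostPath/permissions question above.
func certExpired(path string, now time.Time) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	block, _ := pem.Decode(data)
	if block == nil {
		return false, fmt.Errorf("no PEM block found in %s", path)
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return false, err
	}
	return now.After(cert.NotAfter), nil
}
```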
And second, the distribution step. Since we're generating everything in one place, I'm guessing we have to scp this around to all the other nodes. Not for SNO, though.
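Just to illustrate what that fan-out could look like (a rough sketch only; the node list, SSH user, and target directory below are placeholders, not anything this PR defines):

```go
package main

import (
	"fmt"
	"os/exec"
)

// distributeCerts copies the regenerated cert directory to each of the other
// control-plane nodes via scp. For SNO there is nothing to copy.
func distributeCerts(nodes []string, localDir, remoteDir string) error {
	for _, node := range nodes {
		// core@ assumes an RHCOS host; -r copies the whole directory.
		target := fmt.Sprintf("core@%s:%s", node, remoteDir)
		if out, err := exec.Command("scp", "-r", localDir, target).CombinedOutput(); err != nil {
			return fmt.Errorf("copying certs to %s failed: %v: %s", node, err, out)
		}
	}
	return nil
}
```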
Lastly, since we're only modifying the on-disk cert files, that doesn't change the secrets and bundle configmaps in etcd that are used by the installer for a new revision. So we need to figure out how we update the cert secrets and configmaps in etcd, otherwise the next revision rollout would reuse the expired signer certs in etcd instead of the new ones generated on disk.
Maybe if we relax the constraint and assume that the signers aren't expired when the cluster is offline, then we only need to regenerate the peer/server and client certs on disk, distribute them, bring the cluster up, and then rotate the node cert secrets and configmaps.
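As a strawman for that last rotation step, updating a node cert secret from the regenerated on-disk files would look roughly like this with client-go. The namespace and secret name are placeholders, and whether we update in place like this or let the existing cert controllers recreate the secrets is part of what we'd need to flesh out:

```go
package main

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// syncCertSecret overwrites an existing TLS secret with the regenerated
// on-disk cert/key so the next revision rollout picks up the new material
// instead of the expired one.
func syncCertSecret(ctx context.Context, client kubernetes.Interface, ns, name, certPath, keyPath string) error {
	crt, err := os.ReadFile(certPath)
	if err != nil {
		return err
	}
	key, err := os.ReadFile(keyPath)
	if err != nil {
		return err
	}
	secret, err := client.CoreV1().Secrets(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if secret.Data == nil {
		secret.Data = map[string][]byte{}
	}
	secret.Data[corev1.TLSCertKey] = crt
	secret.Data[corev1.TLSPrivateKeyKey] = key
	_, err = client.CoreV1().Secrets(ns).Update(ctx, secret, metav1.UpdateOptions{})
	return err
}
```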
Anyway, not a blocker for this PR but we can discuss and flesh that out a bit as well.
@tjungblu: This pull request references ETCD-573 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
PR needs rebase.
/remove-lifecycle stale
you can run it with: