Open nasusoba opened 5 months ago
Creating a new machine and deleting the old one seems more disruptive than just restarting the service. Does CAPI have no way to handle renewing certs on existing machines?
For now, what CAPI can support for cert renewal is this create-and-delete flow, because CAPI assumes a machine is immutable after creation (see relevant issue). This is also the standard flow for machine upgrade and remediation when CAPI manages the cluster. There is work on in-place upgrades, but it is still in progress.
Until in-place upgrade is ready, I think .rolloutBefore.certificatesExpiryDays could provide an option for users who find this create-and-delete flow acceptable compared to manual cert rotation.
@brandond I see that k3s emits CertificateExpirationWarning and CACertificateExpirationWarning events on the k8s node when a cert is close to expiring. Could k3s CAPI rely on these warnings to check how soon the cert expires? Thanks!
The events are a good indicator; there are also metrics available but those would require enabling an agent metrics endpoint that is disabled by default.
I think we are not enabling that metrics endpoint. Should we enable it, or is it OK to just read the events?
Events are probably fine.
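To make the event-based check concrete, here is a minimal sketch in Go of how a controller might pick out nodes that have one of the two warning events. The `Event` struct is a hypothetical, trimmed-down stand-in for a Kubernetes event (not the real client-go type), and `nodesWithCertWarnings` is an illustrative name; only the two reason strings come from this thread.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for a Kubernetes event.
// A real implementation would list core/v1 Events through a client instead.
type Event struct {
	NodeName string
	Reason   string
}

// Reasons mentioned in this thread as being emitted by k3s on the node.
var certWarningReasons = map[string]bool{
	"CertificateExpirationWarning":   true,
	"CACertificateExpirationWarning": true,
}

// nodesWithCertWarnings returns each node name (once, in first-seen order)
// that has at least one certificate expiration warning event.
func nodesWithCertWarnings(events []Event) []string {
	seen := map[string]bool{}
	var nodes []string
	for _, e := range events {
		if certWarningReasons[e.Reason] && !seen[e.NodeName] {
			seen[e.NodeName] = true
			nodes = append(nodes, e.NodeName)
		}
	}
	return nodes
}

func main() {
	events := []Event{
		{NodeName: "cp-0", Reason: "CertificateExpirationWarning"},
		{NodeName: "cp-1", Reason: "Starting"},
		{NodeName: "cp-0", Reason: "CACertificateExpirationWarning"},
	}
	fmt.Println(nodesWithCertWarnings(events))
}
```

Note that an event only says a cert is close to expiring; it does not by itself give the exact expiry date that a day-based threshold would need.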
For now, CAPI supports automatic cert rotation by setting .rolloutBefore.certificatesExpiryDays (capi doc). It rolls out a machine by creating a replacement if the old machine has a certificate near expiry. For k3s, leaf certificates expire after 365 days, and a leaf cert is automatically rotated when k3s restarts and the certificate is within 90 days of expiring (ref). But there is no guarantee that this happens in time, and it might result in downtime. It would be good if we also introduced .rolloutBefore.certificatesExpiryDays to provide an automatic cert rotation option.