k3s-io / cluster-api-k3s

Cluster API k3s
Apache License 2.0

Auto cert rotation support #119

Open nasusoba opened 5 months ago

nasusoba commented 5 months ago

Currently, CAPI supports automatic certificate rotation through the .rolloutBefore.certificatesExpiryDays setting (capi doc). When a machine's certificate is close to expiry, CAPI rolls the machine out by creating a replacement for it.

For k3s, leaf certificates expire after 365 days, and a leaf certificate is rotated automatically when k3s restarts while the certificate is within 90 days of expiring (ref). However, nothing guarantees that a restart happens inside that window, so an expired certificate could result in downtime. It would be good to also introduce .rolloutBefore.certificatesExpiryDays here, to offer an automatic certificate rotation option.
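To make the two time windows concrete, here is a minimal sketch of the checks involved. It is not the CAPI or k3s implementation: the function names are hypothetical, and the comparison is only assumed to match the semantics of certificatesExpiryDays (roll out once the remaining validity drops below the configured number of days). The 90-day figure is the k3s renewal window mentioned above.

```python
from datetime import datetime, timedelta, timezone

# From the issue text: k3s renews leaf certificates on restart once they
# are within 90 days of expiring.
K3S_RENEWAL_WINDOW_DAYS = 90


def needs_rollout(not_after, certificates_expiry_days, now=None):
    """Hypothetical check: True if the certificate expires in fewer than
    certificates_expiry_days days, so the machine should be replaced."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now < timedelta(days=certificates_expiry_days)


def renewed_on_restart(not_after, now=None):
    """Hypothetical check: True if a k3s restart at `now` would fall inside
    the 90-day renewal window described in the issue."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now < timedelta(days=K3S_RENEWAL_WINDOW_DAYS)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expiry = now + timedelta(days=30)
    print(needs_rollout(expiry, 45, now))   # certificate has 30 days left, threshold 45
    print(renewed_on_restart(expiry, now))  # 30 days left is inside the 90-day window
```

With a threshold of 45 days, a certificate that has 30 days left would trigger a rollout, while one with 200 days left would not.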

brandond commented 5 months ago

Creating a new machine and deleting the old one seems more disruptive than just restarting the service. Does CAPI have no way to handle renewing certs on existing machines?

nasusoba commented 5 months ago

> Creating a new machine and deleting the old one seems more disruptive than just restarting the service. Does CAPI have no way to handle renewing certs on existing machines?

For now, the only thing CAPI supports for cert renewal is this create-and-delete flow, because CAPI assumes a machine is immutable after creation (see relevant issue). The same flow is the standard way to upgrade or remediate machines when CAPI manages the cluster. Work on in-place upgrades is under way, but it is not finished yet.

Until in-place upgrade is ready, I think .rolloutBefore.certificatesExpiryDays could be offered as an option for users who find this create-and-delete flow acceptable compared to manual cert rotation.

nasusoba commented 5 months ago

@brandond I found that k3s emits CertificateExpirationWarning and CACertificateExpirationWarning events on the k8s node when a cert is close to expiring. Could k3scapi rely on these warnings to check how soon the cert expires? Thanks!

brandond commented 5 months ago

The events are a good indicator; there are also metrics available but those would require enabling an agent metrics endpoint that is disabled by default.

nasusoba commented 5 months ago

> The events are a good indicator; there are also metrics available but those would require enabling an agent metrics endpoint that is disabled by default.

I don't think we enable that metrics endpoint. Should we enable it, or is it OK to just read the events?

brandond commented 5 months ago

Events are probably fine.
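As a rough illustration of the event-based approach discussed above, the sketch below filters a list of event objects for the two warning reasons named in this thread and returns the affected node names. It is only a sketch under assumptions: the events are taken to be plain dicts shaped like the items of `kubectl get events -o json` (using the core/v1 Event fields `reason` and `involvedObject`), the warnings are assumed to be recorded against the Node object, and the function name is hypothetical. It does not show how the provider would actually watch events.

```python
# Event reasons mentioned in this thread as emitted by k3s.
EXPIRY_WARNING_REASONS = {
    "CertificateExpirationWarning",
    "CACertificateExpirationWarning",
}


def nodes_with_expiring_certs(events):
    """Hypothetical helper: return the sorted names of nodes that have at
    least one certificate-expiration warning event."""
    nodes = set()
    for event in events:
        involved = event.get("involvedObject", {})
        # Assumption: the warning is attached to the Node object itself.
        if involved.get("kind") != "Node":
            continue
        if event.get("reason") in EXPIRY_WARNING_REASONS:
            nodes.add(involved.get("name"))
    return sorted(nodes)


if __name__ == "__main__":
    sample = [
        {"reason": "CertificateExpirationWarning",
         "involvedObject": {"kind": "Node", "name": "cp-1"}},
        {"reason": "NodeReady",
         "involvedObject": {"kind": "Node", "name": "cp-2"}},
    ]
    print(nodes_with_expiring_certs(sample))
```

A controller could then treat the returned nodes as candidates for the rollout flow proposed earlier in the thread.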