kyma-project / lifecycle-manager

Controller that manages the lifecycle of Kyma Modules in your cluster.
http://kyma-project.io
Apache License 2.0

No-Downtime CA Certificate Rotation #1428

Closed jeremyharisch closed 7 months ago

jeremyharisch commented 7 months ago

Status

ACCEPTED

Context

With https://github.com/kyma-project/lifecycle-manager/issues/1061 a temporary solution was implemented to work around a missing Cert-Manager feature: rotating all Leaf-Certificates signed by a rotated CA-Certificate. Unfortunately, this implementation causes a certain "downtime", which should be minimised or eliminated entirely. "Downtime" in this case means that once the KCP-Gateway has been set up to use the new CA-Certificate for mTLS, not all SKR Clusters have the newly signed Certificates deployed yet. Runtime-watcher requests sent from such a Cluster to the KCP are therefore rejected until the next reconciliation loop deploys the newly signed Certificate on that Cluster.

In https://github.com/kyma-project/lifecycle-manager/issues/1073 the problem was investigated and multiple solutions were proposed. For more information about the solutions, see: https://github.com/kyma-project/lifecycle-manager/issues/1073#issuecomment-2015413050

Decision

It was decided to go with Intermediate CA-Certificate Bundles. This approach involves a two-phase migration of clients, allowing for a gradual transition with zero downtime. The table below shows the detailed steps of the procedure when the CA-Certificate gets rotated. 'rootA+rootB' signifies that the CA-Certificates have been concatenated and set as the 'ca.crt' value in the certificate secret. Transitioning from 'rootA+rootB' to 'rootB' means truncating the CA-Certificate string by removing the first certificate from the concatenation.
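To illustrate the two bundle operations described above (concatenating to 'rootA+rootB' and later truncating to 'rootB'), here is a minimal Python sketch. It is not the lifecycle-manager implementation: the function names are hypothetical, the certificate bodies are placeholders rather than real certificates, and it assumes the 'ca.crt' value is a plain concatenation of PEM blocks with the oldest root first.

```python
from __future__ import annotations

import re

# Matches one PEM-encoded certificate block, including its BEGIN/END lines.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_bundle(bundle: str) -> list[str]:
    """Return the individual PEM certificate blocks in bundle order."""
    return _PEM_CERT.findall(bundle)


def join_bundle(certs: list[str]) -> str:
    """Concatenate PEM blocks, one trailing newline after each block."""
    return "".join(cert + "\n" for cert in certs)


def add_new_root(current_ca: str, new_root: str) -> str:
    """rootA -> rootA+rootB: append the new root after the existing ones."""
    return join_bundle(split_bundle(current_ca) + split_bundle(new_root))


def drop_oldest_root(current_ca: str) -> str:
    """rootA+rootB -> rootB: remove the first certificate of the bundle."""
    certs = split_bundle(current_ca)
    if len(certs) < 2:
        # Truncating a single-certificate bundle would leave no trust anchor.
        raise ValueError("refusing to remove the only CA certificate")
    return join_bundle(certs[1:])


# Placeholder PEM blocks (hypothetical contents, not valid certificates).
ROOT_A = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
ROOT_B = "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"

bundle = add_new_root(ROOT_A, ROOT_B)   # value for 'ca.crt' during migration
final = drop_oldest_root(bundle)        # value for 'ca.crt' after migration
```

Splitting on PEM boundaries instead of cutting at a fixed offset keeps the truncation independent of certificate length; the guard against emptying the bundle is an extra safety check that the issue text does not mention.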

Pros:

Cons:

Detailed description of the procedure

| Step | Step-Name | Gateway Server Cert | Gateway Accepts Clients (CACert on KCP) | Clients Accept Server (CACert on SKR) | Client Cert | Note |
|------|-----------|---------------------|------------------------------------------|----------------------------------------|-------------|------|
| 01 | Initial setup | rootA | rootA | rootA | rootA | |
| 02 | Generate rootB cert in KCP | rootA | rootA | rootA | rootA | |
| 03 | Reconfigure the Gateway in the KCP | rootA | rootA+rootB | rootA | rootA | All clients with the old Certificates signed by rootA still work |
| 04 | Migrate Clients to Certificates signed by rootB | rootA | rootA+rootB | rootA+rootB | rootB | |
| 05 | After all Clients are migrated, switch Gateway to accept only certs signed by rootB | rootB | rootB | rootB | rootB | |
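The zero-downtime claim can be checked against the steps above with a small model. The following Python sketch is only an illustration under simplifying assumptions: each certificate is represented by the name of the root that signed it, a handshake succeeds when each side trusts the other's signing root, and a client may lag at most one step behind the Gateway. The `Step` type and the function names are hypothetical.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    gateway_server_cert: str   # root that signed the Gateway server cert
    gateway_trusts: frozenset  # CACert on KCP
    client_trusts: frozenset   # CACert on SKR
    client_cert: str           # root that signed the client cert


def handshake_ok(gateway: Step, client: Step) -> bool:
    """mTLS succeeds only if both sides trust the peer's signing root."""
    return (
        gateway.gateway_server_cert in client.client_trusts
        and client.client_cert in gateway.gateway_trusts
    )


def rotation_is_downtime_free(steps: List[Step]) -> bool:
    """Check clients at the current step and clients still one step behind."""
    if not handshake_ok(steps[0], steps[0]):
        return False
    for prev, cur in zip(steps, steps[1:]):
        if not handshake_ok(cur, cur) or not handshake_ok(cur, prev):
            return False
    return True


A, B = "rootA", "rootB"
AB = frozenset({A, B})

# The five steps of the procedure, transcribed from the description above.
STEPS = [
    Step("01 Initial setup", A, frozenset({A}), frozenset({A}), A),
    Step("02 Generate rootB cert in KCP", A, frozenset({A}), frozenset({A}), A),
    Step("03 Reconfigure the Gateway", A, AB, frozenset({A}), A),
    Step("04 Migrate Clients", A, AB, AB, B),
    Step("05 Accept only rootB", B, frozenset({B}), frozenset({B}), B),
]

# For contrast: switching everything to rootB in a single step.
DIRECT = [STEPS[0], STEPS[-1]]

print(rotation_is_downtime_free(STEPS))
print(rotation_is_downtime_free(DIRECT))
```

In this model the bundled procedure passes, while the single-step switch fails for any client that still holds rootA material, which corresponds to the "downtime" described in the Context section. The model also suggests that in step 05 the Gateway should switch before clients truncate their bundle to rootB, since a client trusting only rootB would reject a server certificate still signed by rootA.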

Consequences

jeremyharisch commented 7 months ago

Implementation Issue: https://github.com/kyma-project/lifecycle-manager/issues/1430