Zero-Downtime: Implement the Runtime-Watcher TLS configuration renewal logic.
The logic is based on the POC results, but it is simplified: the additional secret object is removed, and the Istio Gateway secret instead plays the key role in the migration process.
This is based on the following observation:
We must adjust the SKR watcher client TLS configuration if and ONLY IF the Istio Gateway TLS configuration changes.
This has an impact on the design of the zero-downtime certificate rotation solution.
The system is designed with two independent components, running asynchronously to each other:
The first one observes the rotation of the "Root" certificates in KCP and manages the Istio Gateway secret accordingly. It is not related to any particular Kyma or SKR.
The second one manages the SKR watcher client TLS configuration. It generates/updates the relevant secrets in KCP and SKR. It is coupled to the reconciliation of the Kyma CR.
Note: This issue describes the second component.
Responsibility 1: Bootstrap
When no Watcher TLS secret exists in the KCP:
Wait until the Istio Gateway secret is available in the KCP
Then create the Watcher TLS certificate in the KCP (using a Certificate CR; cert-manager creates the secret)
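The bootstrap decision can be sketched as a small pure function. This is a minimal sketch, assuming a simplified secret type where only existence matters; `secretRef`, `bootstrapAction`, `decideBootstrap` and the secret names are hypothetical and not taken from the lifecycle-manager code.

```go
package main

import "fmt"

// secretRef is a hypothetical, simplified stand-in for a Kubernetes Secret;
// only its existence matters for the bootstrap decision.
type secretRef struct {
	Name string
}

// bootstrapAction enumerates the hypothetical outcomes of the bootstrap step.
type bootstrapAction string

const (
	actionNone         bootstrapAction = "none"               // Watcher TLS secret already exists
	actionWaitGateway  bootstrapAction = "wait-for-gateway"   // gateway secret missing: requeue
	actionCreateCertCR bootstrapAction = "create-certificate" // create the Certificate CR
)

// decideBootstrap acts only when no Watcher TLS secret exists, and creates the
// Certificate CR only once the Istio Gateway secret is available.
func decideBootstrap(watcherSecret, gatewaySecret *secretRef) bootstrapAction {
	if watcherSecret != nil {
		return actionNone
	}
	if gatewaySecret == nil {
		return actionWaitGateway
	}
	return actionCreateCertCR
}

func main() {
	gw := &secretRef{Name: "istio-gateway-secret"} // hypothetical name
	fmt.Println(decideBootstrap(nil, nil))
	fmt.Println(decideBootstrap(nil, gw))
	fmt.Println(decideBootstrap(&secretRef{Name: "watcher-tls"}, gw))
}
```

Returning an action instead of performing it keeps the decision easy to unit-test; the caller would map `actionWaitGateway` to a requeue.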
Responsibility 2: Migration
When both of the following happen:
Root certificate is more recent than the Watcher TLS secret in the KCP
Istio-Gateway secret is more recent than the Watcher TLS secret in the KCP
Then re-generate the Watcher TLS certificate in the KCP (already implemented, but triggered differently)
Responsibility 3: Synchronization
When any of the following conditions occur:
Watcher TLS configuration is missing in SKR
Watcher TLS secret in KCP is more recent than the corresponding secret in the SKR
Istio Gateway secret is more recent than the Watcher TLS secret in the SKR
Then generate the Watcher TLS configuration secret in the SKR, taking tls.crt and tls.key from the corresponding secret in the KCP, but the ca.crt data from the Istio Gateway secret
Note: Instead of 1., we could also simply sync (patch) the data on every reconciliation.
Implementation Notes:
This logic is tightly coupled with a given SKR, so it looks like it can be implemented as part of the current Kyma reconciliation loop.
The code for Watcher certificate and secret generation/renewal, and for synchronizing them to the SKR, is already implemented. It must be adjusted to account for the new requirements.
Reasons
We need a robust, zero-downtime solution for the Watcher TLS certificate rotation
Acceptance Criteria
[ ] Create a follow-up issue to provide a metric in the SKR that reports the client certificate used by the watcher, so that we can observe the migration process in Plutono.
[x] Implement the solution along with necessary unit and integration tests
[ ] Update the documentation
[x] Manually test the rotation logic
[x] Update the existing e2e test and add a new one if necessary.
Feature Testing
Testing approach
unit tests, integration tests, e2e test(s)
Existing tests:
Related Issues
https://github.com/kyma-project/lifecycle-manager/issues/1430