We need to enhance our ServiceLevelObjective (SLO) Custom Resource Definition (CRD) to better support multi-cluster deployments with a centralized monitoring solution. The current implementation has limitations when applied across multiple Kubernetes clusters, particularly concerning namespace deletion and recording rule management.
Current Problem:
When the SLO CRD is deployed through Helm charts to multiple Kubernetes clusters, deleting a namespace or group in one cluster can remove the entire group of rules in the centralized monitoring solution. There is currently no way to modify or control the name of the recording rule namespace.
There is no mechanism to append newly reconciled rules from different Kubernetes clusters to the same group without overwriting the existing rules for a given namespace.
Desired Solution:
Add a new CRD attribute for the rule namespace, so that each Kubernetes cluster reconciles into its own namespace within the same group in the centralized monitoring solution. By default, the namespace name stays the same as the group name unless the attribute is set in the CRD.
When an SLO object is deleted in any given cluster, only the namespace-specific rules should be deleted, not the entire group.
Implement a finalizer that deletes the entire group only when there is just one namespace associated with it.
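The behaviour described above can be sketched with a small in-memory model. This is only an illustration, assuming the centralized monitoring solution stores rules as group -> namespace -> rules. The names `SLO`, `RuleStore`, `rule_namespace`, `reconcile`, and `finalize` are hypothetical and are not part of the existing CRD or operator.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SLO:
    """Hypothetical view of an SLO object; `rule_namespace` is the proposed new CRD attribute."""
    group: str
    rules: List[str]
    rule_namespace: Optional[str] = None

    def effective_namespace(self) -> str:
        # Default: the namespace name equals the group name unless set in the CRD.
        return self.rule_namespace or self.group


@dataclass
class RuleStore:
    """Toy stand-in for the centralized monitoring solution: group -> namespace -> rules."""
    groups: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)

    def reconcile(self, slo: SLO) -> None:
        # Replace only this SLO's namespace; namespaces written by other clusters are kept.
        namespaces = self.groups.setdefault(slo.group, {})
        namespaces[slo.effective_namespace()] = list(slo.rules)

    def finalize(self, slo: SLO) -> None:
        # Finalizer logic: remove the namespace-specific rules, then drop the
        # group only if no other namespace is still associated with it.
        namespaces = self.groups.get(slo.group)
        if namespaces is None:
            return
        namespaces.pop(slo.effective_namespace(), None)
        if not namespaces:
            del self.groups[slo.group]
```

In this sketch, two clusters that reconcile into the same group with different `rule_namespace` values do not overwrite each other, and the group disappears only when the last of them is finalized.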
Technical Considerations:
This solution may introduce potential race conditions as multiple operators will be working on the same group.
The operator should be designed to reconcile on failure, mitigating issues from concurrent operations.
Optionally, implement proper locking if required, or use a distributed locking service to manage concurrent access to shared resources. This would need an interface to a global distributed lock.
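One way to reason about "reconcile on failure" without a distributed lock is optimistic concurrency: read the group together with a version, apply the change, and retry if another operator wrote in between. The sketch below is a toy model only. `VersionedGroup`, `ConflictError`, and `reconcile_with_retry` are hypothetical names, and whether the centralized monitoring solution supports a conditional write of this kind is an assumption that would need to be checked.

```python
import threading
from typing import Callable, Dict, List, Tuple

Namespaces = Dict[str, List[str]]


class ConflictError(Exception):
    """Raised when the group changed between read and write."""


class VersionedGroup:
    """Toy shared rule group with a version counter (hypothetical, not a real backend API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for the backend's own atomicity
        self._version = 0
        self._namespaces: Namespaces = {}

    def read(self) -> Tuple[int, Namespaces]:
        # Return the current version and a copy of the namespaces.
        with self._lock:
            return self._version, {k: list(v) for k, v in self._namespaces.items()}

    def write(self, expected_version: int, namespaces: Namespaces) -> None:
        # Conditional write: succeeds only if nobody else wrote since the read.
        with self._lock:
            if expected_version != self._version:
                raise ConflictError("group was modified concurrently")
            self._namespaces = namespaces
            self._version += 1


def reconcile_with_retry(
    group: VersionedGroup,
    mutate: Callable[[Namespaces], None],
    attempts: int = 5,
) -> None:
    """Re-read and re-apply the change until the write succeeds or attempts run out."""
    for _ in range(attempts):
        version, namespaces = group.read()
        mutate(namespaces)
        try:
            group.write(version, namespaces)
            return
        except ConflictError:
            continue  # another operator won the race; reconcile again
    raise ConflictError(f"gave up after {attempts} attempts")
```

Each operator would pass a `mutate` function that touches only its own namespace, so a retry after a conflict re-applies the change on top of whatever the other clusters have written.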
Acceptance Criteria:
SLO rules from multiple clusters can each create their own namespace in the same group in the centralized monitoring solution.
Deleting an SLO object in one cluster removes only the rules that belong to the namespace defined in the CRD.
A finalizer is implemented that deletes the entire group when only one namespace remains associated with it.
The solution is resilient to race conditions and can handle concurrent operations from multiple clusters (optional; out of scope for this issue).