kyma-project / lifecycle-manager

Controller that manages the lifecycle of Kyma Modules in your cluster.
http://kyma-project.io
Apache License 2.0

Provide condition and metrics for Manifests stuck in Deleting state due to DefaultCR Warning #1830

Open c-pius opened 2 months ago

c-pius commented 2 months ago

Description

Since Manifest state was separated from Module state, SRE has been alerted on multiple occasions about Manifests stuck in deletion. A Manifest stays in the Deleting state because the related ModuleCR is in the Warning state, which indicates that the end user must perform cleanups before the deletion can proceed.

While we don't want to introduce a dedicated state for this, we want to make it possible to filter for this situation in alerting and when inspecting the Manifest.

Reasons

Have an indicator that deletion is blocked by required user interaction, so that SRE can filter for it in their alerting.

Acceptance Criteria

Feature Testing

When ManifestCR.Status.State is Deleting
And ModuleCR.Status.State is Warning
Then ManifestCR.Status.Conditions includes

     - lastTransitionTime: <time>
       message: "Module CR is in Warning state" 
       observedGeneration: <gen>
       reason: "Warning"
       status: "True"
       type: "ModuleCRWarning"

Then metric lifecycle_mgr_module_condition{module_name="<>", kyma_name="<>", condition="moduleCRWarning"} is written with value 1

When ManifestCR is deleted
Then lifecycle_mgr_module_condition{module_name="<>", kyma_name="<>", condition="moduleCRWarning"} is removed (either set to 0 or deleted completely; check what we do with lifecycle_mgr_module_state and do the same)
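To make the acceptance criteria concrete, here is a minimal Go sketch of the intended behaviour. It is not the lifecycle-manager implementation. The `Condition` struct, the `gauge` map (an in-memory stand-in for a labelled Prometheus gauge), and the helpers `syncModuleCRWarning`, `recordBlockedDeletion`, and `forgetManifest` are hypothetical names invented for illustration. Two points are assumptions the issue leaves open: the condition is dropped when the ModuleCR leaves Warning, and the metric series is deleted rather than set to 0.

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors the fields listed in the acceptance criteria.
// It is a local stand-in, not the Kubernetes metav1.Condition type.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	ObservedGeneration int64
	LastTransitionTime time.Time
}

// seriesKey holds the label set of one lifecycle_mgr_module_condition series.
type seriesKey struct{ ModuleName, KymaName, Condition string }

// gauge is an in-memory stand-in for a labelled gauge (not concurrency-safe).
type gauge map[seriesKey]float64

const (
	conditionType   = "ModuleCRWarning" // condition type on the ManifestCR
	metricCondition = "moduleCRWarning" // value of the "condition" metric label
)

// syncModuleCRWarning (hypothetical) returns the condition list with
// ModuleCRWarning present only while the Manifest is Deleting and the
// ModuleCR is in Warning. Dropping it otherwise is an assumption.
func syncModuleCRWarning(conds []Condition, manifestState, moduleState string, generation int64) []Condition {
	blocked := manifestState == "Deleting" && moduleState == "Warning"
	out := make([]Condition, 0, len(conds)+1)
	found := false
	for _, c := range conds {
		if c.Type != conditionType {
			out = append(out, c) // unrelated conditions pass through untouched
			continue
		}
		if !blocked {
			continue // no longer blocked: drop the condition
		}
		c.ObservedGeneration = generation // keep the original transition time
		out = append(out, c)
		found = true
	}
	if blocked && !found {
		out = append(out, Condition{
			Type:               conditionType,
			Status:             "True",
			Reason:             "Warning",
			Message:            "Module CR is in Warning state",
			ObservedGeneration: generation,
			LastTransitionTime: time.Now(),
		})
	}
	return out
}

// recordBlockedDeletion writes the series with value 1.
func (g gauge) recordBlockedDeletion(module, kyma string) {
	g[seriesKey{module, kyma, metricCondition}] = 1
}

// forgetManifest deletes the series once the ManifestCR is gone.
// Setting it to 0 instead is the alternative mentioned in the issue.
func (g gauge) forgetManifest(module, kyma string) {
	delete(g, seriesKey{module, kyma, metricCondition})
}

func main() {
	metrics := gauge{}
	conds := syncModuleCRWarning(nil, "Deleting", "Warning", 3)
	if len(conds) == 1 {
		metrics.recordBlockedDeletion("example-module", "example-kyma")
	}
	fmt.Println("conditions:", len(conds), "series:", len(metrics))

	metrics.forgetManifest("example-module", "example-kyma")
	fmt.Println("series after Manifest deletion:", len(metrics))
}
```

Whichever removal variant is chosen should match how lifecycle_mgr_module_state is handled, as noted above.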


c-pius commented 2 months ago

Can the ModuleCR transition to a state other than Warning?

c-pius commented 2 weeks ago

Once this is implemented, please let SRE know in issue 5941 that they can adapt their alerts.