As a critical backend service of Kyma, the Infrastructure Manager (KIM) must be monitored for availability so that we can react in time to service degradations.
The goal is to set up an end-to-end test case for the Infrastructure Manager that verifies the correct functionality of this service on KCP. The test should run at regular intervals (e.g. hourly), create a full-fledged Gardener cluster, and destroy it afterwards.
If the cluster cannot be created, an alert should be fired (e.g. via the SRE monitoring system) to inform the Framefrog team about the service degradation.
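To make the intended flow concrete, here is a minimal Go sketch of a single probe cycle. It is only an illustration: the `ClusterProvisioner` interface, `RunProbe`, and the in-memory `fakeProvisioner` are hypothetical names, not existing KIM or Gardener APIs. A real implementation would back the interface with calls to KIM on KCP and to Gardener.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ClusterProvisioner is a hypothetical abstraction over the calls the probe
// would make against KIM and Gardener. It is not an existing KIM API.
type ClusterProvisioner interface {
	Create(ctx context.Context, name string) error
	WaitUntilReady(ctx context.Context, name string) error
	CheckAccess(ctx context.Context, name string) error
	Delete(ctx context.Context, name string) error
}

// RunProbe performs one probe cycle: create, wait until ready, check access,
// delete. Once the create request has succeeded, deletion is always attempted,
// so a failed check does not leave the test cluster behind.
func RunProbe(ctx context.Context, p ClusterProvisioner, name string) (err error) {
	if err = p.Create(ctx, name); err != nil {
		return fmt.Errorf("create %s: %w", name, err)
	}
	defer func() {
		if delErr := p.Delete(ctx, name); delErr != nil {
			// Report the deletion failure together with any earlier error.
			err = errors.Join(err, fmt.Errorf("delete %s: %w", name, delErr))
		}
	}()
	if err = p.WaitUntilReady(ctx, name); err != nil {
		return fmt.Errorf("wait for %s: %w", name, err)
	}
	if err = p.CheckAccess(ctx, name); err != nil {
		return fmt.Errorf("access %s: %w", name, err)
	}
	return nil
}

// fakeProvisioner is an in-memory stand-in used only to exercise RunProbe.
type fakeProvisioner struct {
	failReady bool     // simulate a cluster that never becomes ready
	calls     []string // records the order of calls
}

func (f *fakeProvisioner) Create(_ context.Context, _ string) error {
	f.calls = append(f.calls, "create")
	return nil
}

func (f *fakeProvisioner) WaitUntilReady(_ context.Context, _ string) error {
	f.calls = append(f.calls, "wait")
	if f.failReady {
		return errors.New("cluster not ready")
	}
	return nil
}

func (f *fakeProvisioner) CheckAccess(_ context.Context, _ string) error {
	f.calls = append(f.calls, "access")
	return nil
}

func (f *fakeProvisioner) Delete(_ context.Context, _ string) error {
	f.calls = append(f.calls, "delete")
	return nil
}
```

A non-nil error from `RunProbe` is what the scheduled job would turn into an alert. How that signal reaches the SRE monitoring system is left open here, since it depends on the guidance from the SREs.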
AC:
[ ] Get in touch with SREs and verify how a full-fledged test case could be integrated into the existing monitoring solution in Kyma
[ ] Implement a test case that requests KIM to create a Gardener cluster and finally deletes it:
  [ ] The test has to verify that the cluster was successfully created in Gardener
  [ ] Check whether the cluster is accessible using the kubeconfig received from Gardener
  [ ] Finally destroy the created Kyma cluster
[ ] Ensure a cleanup mechanism is in place that removes orphaned clusters in case the test run itself was not able to clean up.
[ ] Integrate the test case into the monitoring system (based on the guidance from SREs, see step 1) and ensure alerts are fired in case of KIM service degradation
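
For the orphan cleanup, one possible approach is to give every test cluster a recognizable name prefix and to treat any such cluster older than a threshold as orphaned. The Go sketch below shows only the selection logic. The `Cluster` type, the `FindOrphans` function, and the prefix convention are assumptions for illustration; listing and deleting clusters in Gardener is out of scope here.

```go
package main

import (
	"strings"
	"time"
)

// Cluster is a hypothetical, minimal view of a cluster as the cleanup job
// would see it. It does not mirror any Gardener or KIM type.
type Cluster struct {
	Name      string
	CreatedAt time.Time
}

// FindOrphans returns the clusters whose name starts with the probe prefix
// and that are older than maxAge at the given point in time. Clusters
// without the prefix are never selected.
func FindOrphans(clusters []Cluster, prefix string, maxAge time.Duration, now time.Time) []Cluster {
	var orphans []Cluster
	for _, c := range clusters {
		if strings.HasPrefix(c.Name, prefix) && now.Sub(c.CreatedAt) > maxAge {
			orphans = append(orphans, c)
		}
	}
	return orphans
}
```

With an hourly test interval, a `maxAge` somewhat larger than the expected duration of one test run would avoid deleting a cluster that a running test still uses. The exact value is an open point.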
Reasons
Ensure high quality and proactive service monitoring.