kyma-incubator / reconciler

Kyma reconciler
Apache License 2.0

Improve Monitoring solution for Load-Test #837

Closed jeremyharisch closed 2 years ago

jeremyharisch commented 2 years ago

Description In https://github.com/kyma-incubator/reconciler/issues/823 metrics will be provided for the load test of the mothership reconciler. Unfortunately, to analyse this data the test cluster needs to be up and running; in addition, you need to port-forward to the Grafana instance to actually look at the gathered metrics.

I talked to different teams regarding a solution; here is what we came up with: We thought about using an already running and managed Prometheus/Grafana solution. The one that could have been used is https://monitoring.build.kyma-project.io/dashboards , but this cluster runs inside the Prow project and is only configured to gather metrics from its immediate surroundings, meaning from inside the Prow project. They already thought about implementing a gateway to also gather metrics from outside the project, but due to its overhead it was never pursued. The same goes for the other running Prometheus/Grafana solutions (dev, stage, prod); in addition, we should keep testing and rollout landscapes separate from each other.

In our Jellyfish GCP project we have a built-in solution, called Stackdriver, in the monitoring stack of GCP which we can use. For this we need to configure it so that metrics can be pushed to an exposed endpoint. We need to stick to the push method, since the metrics to gather come from a weekly cluster which may be re-created at some point with a different public IP address.

To gather metrics using the push-approach we could implement it in two different ways:

  1. Having an in-cluster Prometheus instance running, collecting the metrics from the /metrics endpoint. This Prometheus instance then pushes the gathered metrics at a specific time interval to the exposed endpoint of the Stackdriver instance. The push mechanism of Prometheus is still in alpha stage. Thus, it could be a bit unstable and we cannot rely on it.
  2. We implement the Pusher from the Prometheus pkg in our metrics pkg. This can be used to send the gathered metrics directly from the mothership to the Stackdriver instance. Thus, we do not need an in-cluster Prometheus anymore. For this, a new flag for the mothership binary is needed. This flag then activates the Pusher, which pushes to the given endpoint. Here is an example of how a call could look:
    ./bin/mothership-darwin mothership start --server-port 8080 --send-metrics <stackdriver endpoint IP>
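
To make the second approach more concrete, here is a rough sketch of what the flag-driven push loop could look like. It is not the actual mothership code: it uses only the Go standard library as a stand-in for the Pusher from the Prometheus client library (github.com/prometheus/client_golang/prometheus/push), and it assumes the exposed endpoint accepts Pushgateway-style requests (PUT to /metrics/job/<job> with metrics in the text exposition format), which is not confirmed for Stackdriver. The helper names (pushURL, pushOnce, runPusher), the job name "mothership", the 15 second interval and the demo metric are all made up for illustration.

```go
package main

import (
	"bytes"
	"context"
	"flag"
	"fmt"
	"log"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// pushURL builds a Pushgateway-style target URL for the given job.
// Assumption: the endpoint follows the /metrics/job/<job> convention.
func pushURL(endpoint, job string) string {
	if !strings.Contains(endpoint, "://") {
		endpoint = "http://" + endpoint // a bare IP or host:port was given
	}
	return strings.TrimRight(endpoint, "/") + "/metrics/job/" + url.PathEscape(job)
}

// pushOnce sends one snapshot of metrics (text exposition format) via PUT.
func pushOnce(ctx context.Context, client *http.Client, target string, body []byte) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPut, target, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/plain; version=0.0.4")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("push to %s failed: %s", target, resp.Status)
	}
	return nil
}

// runPusher pushes the output of gather every interval until ctx is done.
// Failed pushes are only logged, so a missing endpoint never stops the caller.
func runPusher(ctx context.Context, target string, interval time.Duration, gather func() []byte) {
	client := &http.Client{Timeout: 10 * time.Second}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := pushOnce(ctx, client, target, gather()); err != nil {
				log.Printf("metrics push failed: %v", err)
			}
		}
	}
}

func main() {
	// Hypothetical stand-in for the proposed --send-metrics flag.
	endpoint := flag.String("send-metrics", "", "endpoint to push metrics to (empty disables pushing)")
	flag.Parse()
	if *endpoint == "" {
		return // pushing stays off unless the flag is set
	}

	// Demo only: push a fixed, made-up metric for one minute.
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	gather := func() []byte {
		return []byte("# TYPE demo_pushes_total counter\ndemo_pushes_total 1\n")
	}
	runPusher(ctx, pushURL(*endpoint, "mothership"), 15*time.Second, gather)
}
```

In the real implementation the gather function would presumably be replaced by the metrics registry of the metrics pkg, and the hand-written HTTP call by the Pusher of the Prometheus client library. Since the mothership initiates every connection, this keeps working when the weekly cluster is re-created with a different public IP address.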

Reasons Having a monitoring solution for our load-test scenario which is easily reachable at all times.

ACs (for the second approach)

tobiscr commented 2 years ago

Obsolete, since we now use Prometheus in the KCP clusters.