Closed: philippemnoel closed 1 week ago
Note that I think we probably want Prometheus, but I'm not 100% sure we want Grafana. Perhaps just piping Prometheus metrics directly to us is easier, and we can have a single unified dashboard? Maybe starting with just Prometheus is lower overhead and a good first step.
taking this up.
This is already configured in the paradedb/byoc repo as well, so we can probably just take it from there.
From Mauricio:
[20:49, 14/8/2024] Mauricio Araujo: Hey Phil, they are not included; those need to be installed as separate charts. The cnpg chart only installs the operator and the necessary definitions for it. The operator has a metrics exporter, which essentially exposes the metrics to Prometheus, but Prometheus itself has to be installed and configured to scrape the cnpg metrics; it won't do that by default. Also, for Grafana you need to configure the cnpg dashboard.
[20:49, 14/8/2024] Mauricio Araujo: That is all explained here: https://cloudnative-pg.io/documentation/1.23/quickstart/#part-4-monitor-clusters-with-prometheus-and-grafana
https://github.com/cloudnative-pg/grafana-dashboards/blob/main/charts/cluster/grafana-dashboard.json
^ We might need to bring this back, why was it moved to a dedicated repository @itay-grudev?
It makes maintenance of the dashboard easier. We were also planning to expand the dashboards.
That being said, the Grafana dashboard is part of the operator chart. I strongly discourage you from maintaining a copy of it. Users should be encouraged to use the official operator chart.
As for monitoring and installing Prometheus and/or Grafana: that is up to the user and disabled by default in both charts. I recommend using the kube-prometheus-stack Helm chart, which can be configured separately.
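As a rough illustration of configuring kube-prometheus-stack separately, a values override along these lines is one way to let Prometheus discover PodMonitors that were not created by its own Helm release. This is a sketch, not a tested configuration for this project; the file name is hypothetical, and whether Grafana stays enabled is a choice left open in this thread.

```yaml
# values-monitoring.yaml (hypothetical file name) for the kube-prometheus-stack chart.
# Sketch only: adjust to your environment before use.
prometheus:
  prometheusSpec:
    # By default the chart restricts discovery to monitors labelled with its own
    # Helm release. Setting these to false lets Prometheus also select monitors
    # created elsewhere, such as a PodMonitor for a CloudNativePG cluster.
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
grafana:
  # Set to false for a Prometheus-only setup.
  enabled: true
```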
What
I've disabled PodMonitor, as it requires enabling the Grafana and Prometheus charts: https://cloudnative-pg.io/documentation/1.23/quickstart/#part-4-monitor-clusters-with-prometheus-and-grafana
In order to properly assist customers in a BYOC environment, we should re-enable it so that we can export monitoring data (at least Prometheus metrics) that we can plot to see what's wrong.
Why
^
How
You can see the tutorial for adding Prometheus/Grafana here: https://cloudnative-pg.io/documentation/current/quickstart/
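For reference, the per-cluster switch described in that tutorial looks roughly like the following. This is a minimal sketch based on the CloudNativePG quickstart, not the actual values used in the ParadeDB chart. The cluster name and sizes are placeholders. It assumes the Prometheus Operator CRDs (which define the PodMonitor resource) are already installed, for example through kube-prometheus-stack.

```yaml
# Sketch only: a CloudNativePG Cluster with its PodMonitor enabled.
# The name, instance count, and storage size are placeholders.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-with-metrics
spec:
  instances: 3
  storage:
    size: 1Gi
  monitoring:
    # Asks the operator to create a PodMonitor for this cluster, so that a
    # Prometheus instance that selects it can scrape the metrics exporter.
    enablePodMonitor: true
```

Without the PodMonitor CRD present in the cluster, this setting cannot take effect, which is presumably why it had to be disabled while the Prometheus chart was not installed.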