Open thomas-goulet opened 2 months ago
I'm not sure I follow your issue with the `cluster` label, since that works for me, but your issue with the `job` label is valid and has been (partly) fixed in the source libsonnet. See https://github.com/grafana/loki/issues/13631
I can confirm the same problem exists.
EKS 1.29
[kube-prometheus-stack 62.7.0](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml)
Loki 6.12.0
We are having the same issue. The dashboards are all broken.
They look for job names that just do not exist, such as `loki/loki-write`. What exists is below.
{job="loki/distributor"}
{job="loki/index-gateway"}
{job="loki/query-scheduler"}
{job="loki/compactor"}
{job="loki/querier"}
{job="loki/ingester-zone-c-headless"}
{job="loki/ingester-zone-a-headless"}
{job="loki/ingester-zone-b-headless"}
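To make the mismatch concrete, here is a minimal Python sketch that tests the job names listed above against the `-read` job selector quoted in the issue description. The namespace value `loki` is an assumption inferred from the job names, and `matching_jobs` is a hypothetical helper, not part of Loki or Prometheus. Prometheus anchors `=~` matchers at both ends, which `re.fullmatch` approximates here.

```python
import re

# Job selector quoted in the issue description, with $namespace filled in.
# ASSUMPTION: the namespace is "loki", inferred from the reported job names.
NAMESPACE = "loki"
JOB_PATTERN = rf"({NAMESPACE})/(loki|enterprise-logs)-read"

# Job label values reported in this thread.
REPORTED_JOBS = [
    "loki/distributor",
    "loki/index-gateway",
    "loki/query-scheduler",
    "loki/compactor",
    "loki/querier",
    "loki/ingester-zone-c-headless",
    "loki/ingester-zone-a-headless",
    "loki/ingester-zone-b-headless",
]


def matching_jobs(pattern, jobs):
    """Return the jobs that a fully anchored regex matcher would select."""
    compiled = re.compile(pattern)
    return [job for job in jobs if compiled.fullmatch(job)]


if __name__ == "__main__":
    # None of the reported jobs end in "-read", so nothing is selected.
    print(matching_jobs(JOB_PATTERN, REPORTED_JOBS))
```

With these inputs the selector picks none of the reported jobs, which is consistent with the empty panels described in this thread.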
My fear is that this will not be fixed, since they decided to deprecate the monitoring in these charts and then created a separate meta-monitoring chart: https://github.com/grafana/meta-monitoring-chart
So I assume we need to actually remove the dashboard installation being done by this helm chart, and just sideload the libsonnet generated dashboards.
In my case I use the single binary mode and just one service monitor with one job called `loki/loki`, so all dashboards are broken as well.
Describe the bug
Dashboards published using the official helm chart are not functional because of the `cluster` label expected on all metrics.
To Reproduce
1. (… `kube-prometheus-stack` chart)
2. Set `monitoring.dashboards.enabled` to `true`.
3. Set `clusterLabelOverride` to whatever you want. It doesn't work with or without.
Expected behavior
I expect dashboards to work without any modification. They should work in as many cases as possible and not be restrictive about the use case.
Instead, all dashboards expect a `cluster` label on every metric they use, and that label is not present on most metrics. Some dashboards will not show anything even with the cluster filter removed, because some label values don't follow the regex patterns the dashboards specify.
For example, this query:
sum by (status) ( label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~"$cluster",job=~"($namespace)/(loki|enterprise-logs)-read", route=~"loki_api_v1_series|api_prom_series|api_prom_query|api_prom_label|api_prom_label_name_values|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_labels|loki_api_v1_label_name_values"}[$__rate_interval]), "status", "${1}xx", "status_code", "([0-9]).."), "status", "${1}", "status_code", "([a-z]+)"))
does not work with the distributed version of the chart because of `job=~"($namespace)/(loki|enterprise-logs)-read"`.
Screenshots, Promtail config, or terminal output
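For readers unfamiliar with the nested `label_replace` calls in the example query, the following Python sketch approximates what they do to a single label set. It is a simplification under stated assumptions: the `label_replace` function below is a hypothetical re-implementation that handles only the `${N}` reference form and ignores other PromQL details (such as dropping a label when the replacement is empty), and `status_label` is an illustrative helper.

```python
import re


def label_replace(labels, dst, replacement, src, regex):
    """Approximate PromQL label_replace for one label set (simplified).

    If `regex` fully matches the value of `src`, set `dst` to the expanded
    replacement; otherwise return the labels unchanged.
    """
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)
    # PromQL writes capture groups as ${1}; Python's expand() expects \g<1>.
    template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    result = dict(labels)
    result[dst] = match.expand(template)
    return result


def status_label(status_code):
    """Apply the two replacements from the example query, inner call first."""
    labels = {"status_code": status_code}
    # Inner call: three-character numeric codes become "2xx", "5xx", ...
    labels = label_replace(labels, "status", "${1}xx", "status_code", "([0-9])..")
    # Outer call: lowercase alphabetic codes are copied through as-is.
    labels = label_replace(labels, "status", "${1}", "status_code", "([a-z]+)")
    return labels.get("status")


if __name__ == "__main__":
    for code in ["200", "503", "error"]:
        print(code, "->", status_label(code))
```

These replacements only shape the `status` label. If the `job` matcher selects no series in the first place, there is nothing for them to rewrite and the panel stays empty.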