grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Helm chart v6 dashboards do not work properly (issues with `cluster` and distributed mode) #13964

Status: Open (opened by thomas-goulet 2 months ago)

thomas-goulet commented 2 months ago

Describe the bug

The dashboards published by the official Helm chart are not functional, because they expect a `cluster` label on every metric.

To Reproduce

  1. Have Prometheus & Grafana installed in your cluster (we're using the kube-prometheus-stack chart)
  2. Install Loki using the Helm chart (v6.10.0):
    • Set `monitoring.dashboards.enabled` to `true`.
    • Configure the dashboard labels so that Grafana picks the dashboards up.
    • (Optional) Set `clusterLabelOverride` to any value. The dashboards fail with or without it.

Expected behavior

I expect the dashboards to work without any modification. They should cover as many setups as possible rather than being restricted to one particular use case.

Instead, every dashboard query filters on a `cluster` label, which most of the metrics do not carry. Some dashboards still show nothing even with the `cluster` filter removed, because other label values (such as `job`) don't follow the regex patterns the queries expect.
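The `cluster` problem follows from how Prometheus evaluates label matchers: a series without the label is treated as if the label were the empty string, and regex matchers must match the whole value. The minimal Python sketch below emulates that rule with `re.fullmatch`; Python's `re` is not RE2, so exotic patterns may differ, and the series and the `prod` cluster value are hypothetical.

```python
import re


def matches(labels: dict[str, str], name: str, pattern: str) -> bool:
    """Emulate a PromQL `name=~"pattern"` label matcher (sketch only).

    Prometheus treats a missing label as the empty string and anchors regex
    matchers at both ends, so re.fullmatch on labels.get(name, "") is a close
    analogue. Python's `re` is not RE2, so unusual patterns may behave differently.
    """
    return re.fullmatch(pattern, labels.get(name, "")) is not None


# Hypothetical series: one carries a `cluster` label, the other does not.
labelled = {"job": "loki/querier", "cluster": "prod"}
unlabelled = {"job": "loki/querier"}

# With a concrete cluster selected, only the labelled series survives.
print(matches(labelled, "cluster", "prod"))    # True
print(matches(unlabelled, "cluster", "prod"))  # False

# A pattern that also matches the empty string keeps the unlabelled series.
print(matches(unlabelled, "cluster", ".*"))    # True
```

So as soon as `$cluster` resolves to a concrete value, every metric scraped without a `cluster` label drops out of the panel.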

For example, this query:

sum by (status) ( label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~"$cluster",job=~"($namespace)/(loki|enterprise-logs)-read", route=~"loki_api_v1_series|api_prom_series|api_prom_query|api_prom_label|api_prom_label_name_values|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_labels|loki_api_v1_label_name_values"}[$__rate_interval]), "status", "${1}xx", "status_code", "([0-9]).."), "status", "${1}", "status_code", "([a-z]+)"))

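does not work with the distributed deployment mode of the chart, because of the `job=~"($namespace)/(loki|enterprise-logs)-read"` matcher: it expects a `<namespace>/loki-read` (or `enterprise-logs-read`) job, which does not exist in distributed mode.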
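A quick way to check which jobs that matcher can select is to evaluate it as an anchored regex, which is how PromQL treats `=~`. This is a minimal Python sketch, assuming `$namespace` resolves to `loki`; `loki/loki-read` is a hypothetical simple scalable job name, while the other names are the distributed-mode and single-binary jobs reported later in this thread.

```python
import re

# Job matcher from the dashboard query above, with the Grafana variable
# $namespace substituted by hand (ASSUMED here to be "loki").
namespace = "loki"
job_pattern = f"({namespace})/(loki|enterprise-logs)-read"

# "loki/loki-read" is a HYPOTHETICAL simple scalable job; the rest are job
# labels reported in this thread for distributed and single-binary installs.
jobs = [
    "loki/loki-read",
    "loki/querier",
    "loki/distributor",
    "loki/index-gateway",
    "loki/loki",
]


def job_matches(job: str) -> bool:
    # PromQL regex matchers are fully anchored, hence fullmatch.
    return re.fullmatch(job_pattern, job) is not None


matching = [job for job in jobs if job_matches(job)]
print(matching)  # ['loki/loki-read']
```

Only the simple scalable `read` job is selected; none of the distributed components, and not the single-binary `loki/loki` job, can ever match.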

Screenshots, Promtail config, or terminal output


TheRealNoob commented 1 month ago

I'm not sure I follow your issue with the `cluster` label, since that works for me, but your issue with the `job` label is valid and has been (partly) fixed in the source libsonnet. See https://github.com/grafana/loki/issues/13631

jseiser commented 1 month ago

I can confirm the same problem exists.

EKS 1.29
[kube-prometheus-stack 62.7.0](https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml)
Loki 6.12.0

We are having the same issue. The dashboards are all broken.

They look for job names that just do not exist, such as loki/loki-write. These are the jobs that do exist:

{job="loki/distributor"}
{job="loki/index-gateway"}
{job="loki/query-scheduler"}
{job="loki/compactor"}
{job="loki/querier"}
{job="loki/ingester-zone-c-headless"}
{job="loki/ingester-zone-a-headless"}
{job="loki/ingester-zone-b-headless"}
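To see what a dashboard written against `loki-read`/`loki-write` would have to select instead, each distributed job can be reduced to its component name and grouped by the simple scalable target that component would run in. This is a hypothetical Python sketch: the component-to-target table is an assumption based on Loki's simple scalable deployment mode and only covers the jobs listed above, and `group_by_target` is an illustrative helper, not part of any Loki tooling.

```python
import re
from collections import defaultdict

# Distributed-mode job labels, as reported above.
jobs = [
    "loki/distributor",
    "loki/index-gateway",
    "loki/query-scheduler",
    "loki/compactor",
    "loki/querier",
    "loki/ingester-zone-c-headless",
    "loki/ingester-zone-a-headless",
    "loki/ingester-zone-b-headless",
]

# ASSUMPTION: which simple scalable target each distributed component belongs to.
TARGET_OF = {
    "distributor": "write",
    "ingester": "write",
    "querier": "read",
    "index-gateway": "backend",
    "query-scheduler": "backend",
    "compactor": "backend",
}

# "<namespace>/<component>", optionally suffixed with "-zone-<x>" and/or
# "-headless". The lazy +? keeps those suffixes out of the component name.
JOB_RE = re.compile(r"[^/]+/([a-z-]+?)(?:-zone-[a-z])?(?:-headless)?")


def group_by_target(job_names: list[str]) -> dict[str, list[str]]:
    """Map each target to the sorted component names whose jobs fall under it."""
    groups = defaultdict(set)
    for job in job_names:
        m = JOB_RE.fullmatch(job)
        if m is None:
            continue  # not in "<namespace>/<component>" form; skip it
        component = m.group(1)
        groups[TARGET_OF.get(component, "unknown")].add(component)
    return {target: sorted(names) for target, names in groups.items()}


print(group_by_target(jobs))
```

Under this assumed mapping, a panel that expects `loki/loki-write` would need to select both the `distributor` job and all three `ingester-zone-*-headless` jobs in a distributed install.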

My fear is that this will not be fixed, since they decided to deprecate the monitoring in these charts and created a separate meta-monitoring chart instead: https://github.com/grafana/meta-monitoring-chart

So I assume we need to disable the dashboard installation done by this Helm chart and sideload the libsonnet-generated dashboards instead.

xakaitetoia commented 1 month ago

In my case, I use the single binary deployment mode with just one ServiceMonitor and a single job called loki/loki, so all the dashboards are broken as well.