grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Loki mixin: dashboard job selectors don't match loki deployed via helm chart when using provided servicemonitor #13631

Open jmichalek132 opened 4 months ago

jmichalek132 commented 4 months ago

Describe the bug: The job label selectors used in the dashboards from the Loki mixin (https://github.com/grafana/loki/blob/main/production/loki-mixin/dashboards/loki-writes.libsonnet#L21) don't match the job label value on Loki's metrics when Loki runs in distributed mode, deployed with the Helm chart from the Loki repo and with the service monitor enabled for scraping metrics.

The job label selector snippet from one of the dashboards:

distributor: if $._config.meta_monitoring.enabled
then [utils.selector.re('job', '($namespace)/(distributor|%s-write|loki-single-binary)' % $._config.ssd.pod_prefix_matcher)]
else [utils.selector.re('job', '($namespace)/%s' % (if $._config.ssd.enabled then '%s-write' % $._config.ssd.pod_prefix_matcher else 'distributor'))],

This ends up producing the label selector job=~"(loki)/distributor". In our case, however, the actual label value is job="loki/loki-distributor", which the selector does not match. This is due to the relabeling in the service monitor:

    path: /metrics
    port: http-metrics
    relabelings:
    - action: replace
      replacement: loki/$1
      sourceLabels:
      - job
      targetLabel: job

The loki/ prefix is added by the relabeling, and loki-distributor is the name of the service.
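
To make the mismatch concrete, here is a minimal Python sketch of what happens to the label. It assumes the job label starts out as the service name (loki-distributor), that the relabel rule uses Prometheus's default regex (.*), and that both relabel regexes and =~ selectors are fully anchored. The helper names relabel_replace and selector_matches are made up for illustration, and Python's re module only approximates the RE2 syntax Prometheus uses.

```python
import re


def relabel_replace(value, replacement, regex="(.*)"):
    """Approximate a Prometheus 'replace' relabel rule on one source label."""
    m = re.fullmatch(regex, value)  # relabel regexes are anchored at both ends
    if m is None:
        return value  # no match: the label is left unchanged
    # Translate Prometheus-style $1 references into Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return m.expand(template)


def selector_matches(pattern, value):
    """Approximate a job=~"pattern" matcher, which is also fully anchored."""
    return re.fullmatch(pattern, value) is not None


# Service name as the initial job label (assumption), then the chart's rule.
job = relabel_replace("loki-distributor", "loki/$1")
dashboard_selector = "(loki)/distributor"

print(job)                                        # loki/loki-distributor
print(selector_matches(dashboard_selector, job))  # False
```

Because the match is anchored, the extra loki- prefix in the service name is enough to make the dashboard selector miss the series.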

To Reproduce: Steps to reproduce the behavior:

  1. Start Loki 3.0.0 using Helm chart version 6.6.0
  2. Enable the service monitor
  3. Deploy the dashboards from the mixin

Expected behavior: The dashboards in the Loki mixin work out of the box with Loki deployed in distributed mode using the Helm chart.

jmichalek132 commented 3 months ago

For now I went with a workaround that got at least some of the panels working: adding a relabeling rule to the service monitor in the Helm chart so that the job label matches what the dashboards expect.

monitoring:
  serviceMonitor:
    enabled: true
    interval: 60s
    relabelings:
      - action: replace
        replacement: loki/$1
        regex: loki/loki-(.*)
        sourceLabels:
        - job
        targetLabel: job
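
As a rough check of the rule above, the following Python sketch applies it to the label value reported in this issue. It assumes the rule runs after the chart's built-in relabeling, so the job label already reads loki/loki-distributor by the time it is evaluated. The function name apply_workaround is hypothetical, and Python's re module stands in for Prometheus's RE2 engine.

```python
import re


def apply_workaround(job):
    """Approximate the 'replace' rule: regex loki/loki-(.*), replacement loki/$1."""
    m = re.fullmatch(r"loki/loki-(.*)", job)  # relabel regexes are fully anchored
    if m is None:
        return job  # values that do not match are left untouched
    return "loki/" + m.group(1)


fixed = apply_workaround("loki/loki-distributor")
print(fixed)  # loki/distributor

# The dashboard selector job=~"(loki)/distributor" is anchored as well.
print(re.fullmatch(r"(loki)/distributor", fixed) is not None)  # True
```

The rule only strips the loki- service prefix, so it helps panels whose selectors expect the bare component name. Panels that use other selectors may still come up empty, which would be consistent with only some of the panels working.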