grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Helm podAntiAffinity matchLabels should include instance and name #13349

Open Skaronator opened 3 days ago

Skaronator commented 3 days ago

Describe the bug I'm deploying the Loki Helm chart into my existing monitoring namespace. The monitoring namespace already contains the loki-distributed and mimir-distributed Helm charts, so there are already a fair number (50) of pods running.

After deploying the Loki Helm chart, I noticed that the gateway pods won't schedule:

0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.

After some quick debugging, I noticed that the anti-affinity rule only matches the component label:

https://github.com/grafana/loki/blob/91a34868db61f2cf4299d618c2e48885ff0a705e/production/helm/loki/values.yaml#L946-L952

The issue is that this is not specific enough: the mimir-distributed chart labels its gateway pods with the same component label, resulting in "no free nodes".
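For context, the chart's default gateway anti-affinity renders roughly like this (a sketch based on the linked values.yaml; the exact rendered output may differ):

```yaml
# Sketch of the current default: only the component label is matched,
# so any pod in the namespace labeled with component=gateway
# (e.g. the Mimir gateway) counts as a conflict.
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/component: gateway
      topologyKey: kubernetes.io/hostname
```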

To Reproduce Steps to reproduce the behavior:

  1. Use a single-node cluster, or a 3-node cluster with 3 replicas for the Mimir gateway.
  2. Deploy mimir-distributed in a namespace.
  3. Deploy the Loki Helm chart in the same namespace.
  4. Observe that the Loki gateway pods cannot be scheduled.

Expected behavior

The default matchLabels in podAntiAffinity should be more restrictive, e.g. similar to the serviceMatchLabels.

The gateway Service has stricter labels, as you can see:

  selector:
    app.kubernetes.io/component: gateway
    app.kubernetes.io/instance: loki
    app.kubernetes.io/name: loki
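
A sketch of what the more restrictive default could look like, reusing the instance and name labels from the Service selector above (label values assumed from a release named loki; not the chart's actual template):

```yaml
# Proposed default (sketch): matching on instance and name as well
# scopes the anti-affinity to this release's own gateway pods,
# so other charts' gateway pods no longer block scheduling.
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/component: gateway
          app.kubernetes.io/instance: loki
          app.kubernetes.io/name: loki
      topologyKey: kubernetes.io/hostname
```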

Environment:

Screenshots, Promtail config, or terminal output If applicable, add any output to help explain your problem.