saltstack-formulas / prometheus-formula

Manage a Prometheus installation

[BUG] Changes to alerts require reload of prometheus #64

Open B1ue-W01f opened 3 years ago

B1ue-W01f commented 3 years ago

Your setup

Formula commit hash / release tag

n/a

Versions reports (master & minion)

n/a

Pillar / config used

prometheus:
  extra_files:
    apache_rules:
      file: service_rules/apache
      component: alertmanager
      config:
        groups:
          - name: 'apache.rules'
            rules:
              - alert: ApacheDown
                expr: apache_up == 0
                for: 0m
                labels:
                  severity: critical
                annotations:
                  summary: {% raw %} Apache down (instance {{ $labels.instance }}) {% endraw %}
                  description: {% raw %} "Apache down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}" {% endraw %}
              - alert: ApacheWorkersLoad
                expr: (sum by (instance) (apache_workers{state="busy"}) / sum by (instance) (apache_scoreboard) ) * 100 > 80
                for: 2m
                labels:
                  severity: warning
                annotations:
                  summary: {% raw %} Apache workers load (instance {{ $labels.instance }}) {% endraw %}
                  description: {% raw %} "Apache workers in busy state approach the max workers count 80% workers busy on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}" {% endraw %}
              - alert: ApacheRestart
                expr: apache_uptime_seconds_total / 60 < 1
                for: 0m
                labels:
                  severity: warning
                annotations:
                  summary: {% raw %} Apache restart (instance {{ $labels.instance }}) {% endraw %}
                  description: {% raw %} "Apache has just been restarted.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}" {% endraw %}

Bug details

Describe the bug

Changing the alerts under extra_files appears to restart the alertmanager service, but the prometheus process needs to be restarted too; otherwise the changes aren't picked up by Prometheus.

Steps to reproduce the bug

  1. Run a highstate with the alerts in the pillar.
  2. Remove an alert from the pillar.
  3. Run the highstate again.
  4. Note that the alert has not been removed from Prometheus.
  5. Restart Prometheus.
  6. Note that the alert has now been removed.

Expected behaviour

The Prometheus service should be restarted when extra_files / alerts change.

Attempts to fix the bug

None yet.

mdschmitt commented 2 years ago

I think the problem here is just misconfiguration.

Your pillar has component: alertmanager set for apache_rules. Thing is, rules aren't handled by Alertmanager; they're handled by Prometheus itself, and Alertmanager is just used to fire off alerts. Remove the component key (so that the default value, prometheus, is used), or set it to component: prometheus. That will make Prometheus reload instead of Alertmanager, and you should be good to go.
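
For illustration, here is a trimmed version of the pillar from the report with that change applied. It is only a sketch based on the suggestion above and has not been tested against the formula. Only the first alert is kept and its annotations are left out for brevity.

```yaml
prometheus:
  extra_files:
    apache_rules:
      file: service_rules/apache
      # Alerting rules are evaluated by Prometheus, not Alertmanager,
      # so the file is attached to the prometheus component. Per the
      # comment above, omitting this key should have the same effect.
      component: prometheus
      config:
        groups:
          - name: 'apache.rules'
            rules:
              - alert: ApacheDown
                expr: apache_up == 0
                for: 0m
                labels:
                  severity: critical
```

With the rules file attached to the prometheus component, a change to it should trigger Prometheus rather than Alertmanager, which is what the steps in the report need.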

mdschmitt commented 2 years ago

It looks like pillar.example is misleading in this regard.