prometheus-community / helm-charts

[kube-prometheus-stack] How to add/update prometheus alerting rules using helm? #4009

Open rodionovid opened 1 year ago

rodionovid commented 1 year ago

Hello. I want to add some extra Prometheus alerting rules using helm. I can add these rules manually via the Prometheus or Grafana UI, but that method doesn't suit me. So the question is: how can I add/update Prometheus alerting rules in a Kubernetes cluster using local YAML files?

I tried to upgrade the prometheus release using the helm upgrade command. For this purpose I created a local configuration file, prometheus.yaml, and copied into it the Prometheus configuration from the Prometheus UI (the Status/Configuration section in the navigation panel). I also added the path to my local alert_config.yaml to the rule_files section:

global:
...
alerting:
...
rule_files:
- ./alert_config.yaml
...

File alert_config.yaml is pretty simple and contains single additional rule:

groups:
  - name: Host
    rules:
      - alert: HostOutOfMemory
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 50) * on(instance) group_left (nodename) node_uname_info{ nodename=~".+" }
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of memory (instance {{ $labels.instance }})
          description: "Node memory is filling up (< 50% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
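
Regardless of how the rules are eventually delivered, a rule file like this can be validated locally with promtool, which ships with Prometheus releases:

promtool check rules alert_config.yaml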

Then I executed helm upgrade:

helm upgrade -f prometheus.yaml prometheus prometheus-community/kube-prometheus-stack -n prometheus-community

And got successful output:

Release "prometheus" has been upgraded. Happy Helming!
NAME: prometheus
LAST DEPLOYED: Mon Nov 13 14:28:55 2023
NAMESPACE: prometheus-community
STATUS: deployed
REVISION: 2
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace prometheus-community get pods -l "release=prometheus"

But after that the new Prometheus alert rule didn't appear. Recreating the pods, either with the deprecated helm flag --recreate-pods or by scaling manually, didn't help either. I would appreciate any help with this question.
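
For context on why this approach has no effect: kube-prometheus-stack runs Prometheus through the Prometheus Operator, which generates the live configuration (including rule_files) from PrometheusRule custom resources, so a prometheus.yaml copied back from the UI is never read. The same rule can instead be applied directly as a PrometheusRule manifest; a minimal sketch, assuming the chart's default rule selector matches the release: prometheus label:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: host-out-of-memory
  namespace: prometheus-community
  labels:
    release: prometheus  # assumed to match the Prometheus CR's ruleSelector
spec:
  groups:
    - name: Host
      rules:
        - alert: HostOutOfMemory
          expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 50
          for: 2m
          labels:
            severity: warning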

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

harisrg commented 9 months ago

Did you manage to figure it out @rodionovid?

I'm facing the exact same issue here.

rodionovid commented 9 months ago

@harisrg, actually I found a solution. Thank you for reminding me about this post. I will use my example from above with the single test rule. In order to apply this rule to Prometheus, you need to create a YAML file (in my case values.yaml) with the following content:

additionalPrometheusRulesMap:
  rule-name:
    groups:
      - name: Node
        rules:
          - alert: HostOutOfMemory
            expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 50) * on(instance) group_left (nodename) node_uname_info{ nodename=~".+" }
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: Host out of memory (instance {{ $labels.instance }})
              description: "Node memory is filling up (< 50% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

After that you need to upgrade your chart using this file:

helm upgrade prometheus -f values.yaml prometheus-community/kube-prometheus-stack -n prometheus-community
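
To confirm the rule actually landed, one option (a sketch; additionalPrometheusRulesMap entries are rendered as PrometheusRule resources whose names are derived from the release name and the map key, here rule-name):

kubectl get prometheusrules -n prometheus-community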

As a result, a single group of rules called "Node", containing a single alert rule called "HostOutOfMemory", will be created. One remark: each time you upgrade the helm chart, other parameters (admin password, SMTP configs, ...) may be reset to their default values. To avoid that, you need to specify these parameters in values.yaml or another file that you apply while upgrading your chart. To find out how to specify these parameters, you need the full list of overridable parameters for the chart. To get it, execute the following command:

helm show values prometheus-community/kube-prometheus-stack > default-values.yaml

It generates a default-values.yaml file that contains all overridable parameters. After that you need to search this file for the configs you want to change. It may take a lot of effort because there are tons of these configs. Also, there is no naming standard; it depends entirely on the chart developers and what they called a config.
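
A lighter-weight alternative, assuming Helm 3: dump only the values that were explicitly set on the release, or let helm keep the release's existing values and merge the new file on top of them:

helm get values prometheus -n prometheus-community > current-values.yaml
helm upgrade prometheus prometheus-community/kube-prometheus-stack -n prometheus-community --reuse-values -f values.yaml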

bd-spl commented 6 months ago

That should work for creating a new rule, but how about changing the existing ones?

UPDATE: I see that there is a flag to disable rules on a case-by-case basis. Since helm does not merge rules, there aren't many options for updating an existing rule other than disabling the "stock" one and re-adding it as needed, the way it is noted above, but under another group name (entries cannot be merged into existing groups, I believe). See the sketch below.
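
The disable flag mentioned above appears to be the chart's defaultRules.disabled map; a minimal sketch of switching off one stock rule so a modified copy can be re-added under a new group:

defaultRules:
  disabled:
    NodeHighNumberConntrackEntriesUsed: true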

zerg-su commented 6 months ago

In the latest versions of the chart, I found the customRules section in the configuration.

I assume that to override a rule, it's now easier to simply rewrite the rule and set the default rule's severity to "none".

customRules:
  NodeHighNumberConntrackEntriesUsed:
    severity: "none"

additionalPrometheusRulesMap:
  # rewrite basic rules
  rewrite-rules:
    groups:
      - name: node-exporter
        rules:
        - alert: NodeHighNumberConntrackEntriesUsed
          annotations:
            description: "{{ $value | humanizePercentage }} of conntrack entries are used."
            runbook_url: "https://runbooks.prometheus-operator.dev/runbooks/node/nodehighnumberconntrackentriesused"
            summary: "Number of conntrack are getting close to the limit."
          expr: (node_nf_conntrack_entries{job="node-exporter"} / node_nf_conntrack_entries_limit) > 0.80
          labels:
            severity: warning
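
A quick way to check such an override before deploying is helm's client-side rendering (a sketch; the grep context width is arbitrary):

helm template prometheus prometheus-community/kube-prometheus-stack -n prometheus-community -f values.yaml | grep -A 10 'alert: NodeHighNumberConntrackEntriesUsed'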