prometheus-operator / prometheus-operator

Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes
https://prometheus-operator.dev
Apache License 2.0

AlertmanagerConfig: Matching on `namespace` OR `exported_namespace`? #6870

Open m3adow opened 1 month ago

m3adow commented 1 month ago

What happened?

Description

I'm currently working on bringing our Google Managed Prometheus (GMP) alerts into our Alertmanager instance, which is deployed with Prometheus Operator. GMP uses the exported_namespace label to identify the namespace to alert for. At the same time, we have other metrics which still use the namespace label for that.

Therefore, I need to find a way to match either of those labels in each AlertmanagerConfig our teams deploy in their namespaces, preferably with only one AlertmanagerConfig CRD.
If I understood correctly, it's not possible to OR the spec.route.matchers. Additionally, a namespace matcher is automatically added to each AlertmanagerConfig as long as the Alertmanager instance is configured with alertmanagerConfigMatcherStrategy.type: OnNamespace, which is also the default. Everything I have tried so far either didn't work or had some very annoying side effects.
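
To illustrate why the injected matcher gets in the way, here is a rough sketch of the Alertmanager configuration the operator generates for the AlertmanagerConfig my-test/my-test under OnNamespace. The exact output depends on the operator version, and the root "null" receiver is an assumption about the base configuration, so treat this as an approximation rather than real operator output.

```yaml
# Approximate generated Alertmanager configuration (illustrative only).
route:
  receiver: "null"                        # assumed root receiver of the base config
  routes:
    - receiver: my-test/my-test/my-test   # <namespace>/<name>/<receiver>
      matchers:
        - namespace="my-test"             # injected by the operator (OnNamespace)
      continue: true
      routes:
        - receiver: my-test/my-test/my-test
          matchers:
            - exported_namespace="my-test"
receivers:
  - name: "null"
  - name: my-test/my-test/my-test
```

An alert labelled namespace="monitoring" and exported_namespace="my-test" never passes the injected namespace matcher, so the child route matching on exported_namespace is unreachable for it.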

My best approach right now is to:

The problem is that the .spec.route.receiver is triggered for every matching alert, which is a lot, of course. Although the CRD description makes it sound like .spec.route.receiver could be omitted, that's not possible. Therefore, I have to configure a "null" receiver for each AlertmanagerConfig CRD, which is still shown in the Alertmanager UI, cluttering the Overview with useless information.

Is there any good way to achieve what I want?

Prometheus Operator Version

v0.76.0

Kubernetes Version

clientVersion:
  buildDate: "2024-07-16T23:54:40Z"
  compiler: gc
  gitCommit: 6fc0a69044f1ac4c13841ec4391224a2df241460
  gitTreeState: clean
  gitVersion: v1.30.3
  goVersion: go1.22.5
  major: "1"
  minor: "30"
  platform: linux/amd64
kustomizeVersion: v5.0.4-0.20230601165947-6ce0bf390ce3
serverVersion:
  buildDate: "2024-07-19T09:16:41Z"
  compiler: gc
  gitCommit: 0858698e672612745ea0150984b899843b9f0bba
  gitTreeState: clean
  gitVersion: v1.30.3-gke.1225000
  goVersion: go1.22.5 X:boringcrypto
  major: "1"
  minor: "30"
  platform: linux/amd64

Kubernetes Cluster Type

GKE

How did you deploy Prometheus-Operator?

helm chart: prometheus-community/kube-prometheus-stack

Manifests

# My current approach
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: my-test
  namespace: my-test
spec:
  receivers:
  - name: my-test
  - name: "null"
  route:
    receiver: "null"
    matchers:
      - name: namespace
        matchType: "!="
        value: ""
    routes:
      - matchers:
        - name: exported_namespace
          matchType: "="
          value: my-test
        receiver: my-test
      - matchers:
        - name: namespace
          matchType: "="
          value: my-test
        receiver: my-test

dosubot[bot] commented 1 month ago

@m3adow Hello there! I'm here to help you with any bugs, questions, or contributions. Let's solve this together!

I found a similar unsolved discussion and several closed issues that might help you with your problem.

  1. Similar Discussion:

  2. Closed Issues:

Suggested Approach

To achieve your goal of matching either namespace or exported_namespace labels, you can follow these steps:

  1. Set alertmanagerConfigMatcherStrategy.type to None:

    apiVersion: monitoring.coreos.com/v1
    kind: Alertmanager
    metadata:
      name: alertmanager
    spec:
      alertmanagerConfigMatcherStrategy:
        type: None

  2. Configure AlertmanagerConfig CRD:

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: my-test
      namespace: my-test
    spec:
      receivers:
        - name: my-test
        - name: "null"
      route:
        receiver: "null"
        matchers:
          - name: namespace
            matchType: "!="
            value: ""
        routes:
          - matchers:
              - name: exported_namespace
                matchType: "="
                value: my-test
            receiver: my-test
          - matchers:
              - name: namespace
                matchType: "="
                value: my-test
            receiver: my-test

This configuration should help you match alerts based on either namespace or exported_namespace without cluttering the Alertmanager UI with unnecessary receivers.

To continue talking to Dosu, mention @dosu.

m3adow commented 1 month ago

My bad, I made a copy & paste error in my manifest. I corrected the spec.route.receiver to use my-test. Apart from that, @dosu's manifest is identical to mine.
But that approach generates one entry in the Alertmanager UI per alert and AlertmanagerConfig, so seven AlertmanagerConfigs and five alerts generate 35 entries. That's a lot of clutter.

dosubot[bot] commented 1 month ago

Could you provide more details on the specific side effects or issues you encountered when trying other approaches? This might help in suggesting a more refined solution.


simonpasquier commented 1 month ago

> GMP uses the exported_namespace label for the identification of the correct namespace to alert for. At the same time, we have other metrics which still use the namespace label for that.

Can't you fix GMP to use the namespace label instead?

m3adow commented 1 month ago

> Can't you fix GMP to use the namespace label instead?

No, that's fixed and can't be changed. As I still couldn't find a sensible solution, I guess I will create two AlertmanagerConfig CRDs per namespace, one for namespace and one for exported_namespace. Combined with an alertmanagerConfigMatcherStrategy.type: None setting and thorough use of Kustomize and YAML anchors to keep things as DRY as possible, this is probably the least annoying solution.
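
A minimal sketch of that two-resource setup could look like the following. It assumes the Alertmanager resource sets alertmanagerConfigMatcherStrategy.type: None, so the operator injects no namespace matcher of its own. The metadata names are hypothetical, and the receiver is left as empty as in the manifest above.

```yaml
# Sketch only: one AlertmanagerConfig per label, both routing to the same team receiver.
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: my-test-namespace             # hypothetical name
  namespace: my-test
spec:
  receivers:
    - name: my-test
  route:
    receiver: my-test
    matchers:
      - name: namespace
        matchType: "="
        value: my-test
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: my-test-exported-namespace    # hypothetical name
  namespace: my-test
spec:
  receivers:
    - name: my-test
  route:
    receiver: my-test
    matchers:
      - name: exported_namespace
        matchType: "="
        value: my-test
```

Two caveats with this approach: with the None strategy nothing prevents an AlertmanagerConfig from matching alerts of other namespaces, and an alert carrying both labels with the value my-test may be routed through both resources.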

simonpasquier commented 1 month ago

So it means that for example, metrics from kube-state-metrics always have namespace="<kube-state-metrics namespace>" and exported_namespace="<namespace of the resource>"?

m3adow commented 1 month ago

Correct. And that's the exact problem. While kube-state-metrics runs in one fixed namespace my platform team controls, an alert for a long-running CronJob should go to the application team responsible for the namespace.
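
To make the two cases concrete, here are two hypothetical label sets. The alert names and the "monitoring" namespace are assumptions for illustration and are not taken from a real cluster.

```yaml
# Hypothetical alert label sets (illustration only).
alerts:
  - labels:
      alertname: CronJobRunningTooLong   # based on kube-state-metrics metrics
      namespace: monitoring              # where kube-state-metrics runs (assumed)
      exported_namespace: my-test        # namespace of the CronJob itself
    # namespace="my-test" does not match; exported_namespace="my-test" does
  - labels:
      alertname: AppErrorRateHigh        # based on metrics scraped from the app
      namespace: my-test
    # namespace="my-test" matches; exported_namespace is absent
```

Both alerts belong to the my-test team, but no single equality matcher on one label selects both of them.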