prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

opsgenie_config using api_key_file not working #3764

Open zoezhangmattr opened 6 months ago

zoezhangmattr commented 6 months ago

What did you do?

Used the Vault injector to inject the API key, which fails. The file is /vault/secrets/opsgenie_api_key and its content is the API key. The file owner is nobody, the same as the Alertmanager user/group. Its mode is 644 or 777; I tried both.

The same alert can be routed to Slack, but not to Opsgenie.

Using the plain-text API key value works.

What did you expect to see?

I thought it should work, but so far no luck. I need some guidance, please.

What did you see instead? Under which circumstances?

ts=2024-03-13T02:42:59.388Z caller=notify.go:848 level=warn component=dispatcher receiver=opsgenie integration=opsgenie[0] aggrGroup="{}/{severity=~\"^(?:critical|error)$\"}:{}" msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://api.opsgenie.com/v2/alerts\": net/http: invalid header field value for \"Authorization\""

Environment

TheMeier commented 6 months ago

This looks like something is wrong with the api key and not alertmanager. Have you verified, eg in a test pod, that /vault/secrets/opsgenie_api_key really contains the correct key?

zoezhangmattr commented 6 months ago

> This looks like something is wrong with the api key and not alertmanager. Have you verified, eg in a test pod, that /vault/secrets/opsgenie_api_key really contains the correct key?

Thanks for the reply. Yes, the file has the correct key ID. The funny thing is that the same approach, with the same key, works for the Opsgenie heartbeat used as a deadman switch:

- name: prometheus-deadman-switch
  webhook_configs:
  - url: https://api.opsgenie.com/v2/heartbeats/xxxxxx/ping
    send_resolved: false
    http_config:
      basic_auth:
        username: ':'
        password_file: /vault/secrets/opsgenie_api_key

TheMeier commented 6 months ago

One is an opsgenie_configs entry, the other is an http_config. You are using /vault/secrets/opsgenie_api_key as a password in the latter, which indicates to me that it contains a password and not an API key.

TheMeier commented 6 months ago

@zoezhangmattr any feedback?

zoezhangmattr commented 6 months ago

> One is an opsgenie_configs entry, the other is an http_config. You are using /vault/secrets/opsgenie_api_key as a password in the latter, which indicates to me that it contains a password and not an API key.

No. As I mentioned before, the API key works when using a k8s secret. It is the same API key, so the password is correct; it should be the Opsgenie API key in this case.

grobinson-grafana commented 5 months ago

Hi @zoezhangmattr! Does the file exist and contain the secret at the time the Alertmanager is started? It sounds like there might be a race condition between the Alertmanager starting and vault-injector writing the file.

jdegendt commented 3 months ago

I'm also running into this. If I go ahead and put the api key directly as a string into opsgenie_config/api_key of the receiver, it works.

When using opsgenie_config/api_key_file and a secret that's correctly mounted, it breaks with the exact same API key and Alertmanager logs invalid header field value for \"Authorization\".

grobinson-grafana commented 3 months ago

> I'm also running into this. If I go ahead and put the api key directly as a string into opsgenie_config/api_key of the receiver, it works.
>
> When using opsgenie_config/api_key_file and a secret that's correctly mounted, it breaks with the exact same API key and Alertmanager logs invalid header field value for \"Authorization\".

Can you check this?

> Does the file exist and contain the secret at the time the Alertmanager is started? It sounds like there might be a race condition between the Alertmanager starting and vault-injector writing the file.

jdegendt commented 3 months ago

@grobinson-grafana, perhaps to add: I'm not using Vault to inject the file at hand. I'm deploying using Helm and there's no init containers involved (aside from config-reloader).

So given that the secret is deployed beforehand and I'm not injecting using Vault, I'm assuming the file is present before Alertmanager starts, given standard Kubernetes pod lifecycle management, right?

Let me see if I can figure out how to add a short magic sleep before the Alertmanager process starts. In the meantime, here are the Alertmanager values for reference:

...
alertmanager:
  enabled: true

  alertmanagerSpec:
    image:
      registry: quay.io
      repository: prometheus/alertmanager
      tag: v0.27.0
      sha: ""

    secrets:
    - opsgenie-api-key

  config:
    global:
      resolve_timeout: 5m

    route:
      group_by: ['namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'

      routes:
      - receiver: 'null'
        matchers:
          - job !~ "fdbmeter.*"
      - receiver: 'opsgenie'
        matchers:
          - job =~ "fdbmeter.*"

    receivers:
    - name: 'null'
    - name: 'opsgenie'
      opsgenie_configs:
        - tags: 'integrities,foundationdb'
          api_key_file: /etc/alertmanager/secrets/opsgenie-api-key/opsgenie
...

jdegendt commented 3 months ago

Went ahead and modified the statefulset as such:

      containers:
      - command: [
        "/bin/sh", "-c"
        ]
        args:
        - cat "/etc/alertmanager/secrets/opsgenie-api-key/opsgenie";
          /bin/alertmanager --config.file=/etc/alertmanager/config_out/alertmanager.env.yaml ...;

And it outputs my API key just fine, which makes me doubt there's a race condition at play here, anything else I can test here?
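A note on the check above: printing a file with cat does not make a trailing newline or space visible, so the key can look fine while still carrying extra bytes. A minimal sketch for inspecting the raw bytes follows; the helper name `trailing_whitespace` and the temporary demo file are illustrative assumptions, not part of Alertmanager or of the setup in this thread.

```python
import os
import tempfile


def trailing_whitespace(path: str) -> bytes:
    """Return the whitespace bytes (if any) at the end of the file at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = data.rstrip(b" \t\r\n")
    return data[len(stripped):]


# Demo on a temporary file standing in for the mounted secret (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"redacted-api-key-foo-bar\n")
    demo_path = tmp.name

tail = trailing_whitespace(demo_path)
# A non-empty result means the file ends in whitespace that cat would not show.
print(repr(tail))
os.remove(demo_path)
```

Against a real deployment, the same function could be pointed at the mounted path, for example /etc/alertmanager/secrets/opsgenie-api-key/opsgenie, from a shell or debug container that has Python available.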

jdegendt commented 3 months ago

I ended up adding some additional logging to the Opsgenie notifier to print the headers before alerting and lo and behold, there's a newline attached to my API key:

ts=2024-06-24T14:32:51.702Z caller=opsgenie.go:296 level=info integration=opsgenie SETAUTHHEADERTO:="GenieKey redacted-api-key-foo-bar\n"

So I'll have a look at how I'm templating my secret file.
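For context on why a newline produces this particular error: HTTP header field values may not contain CR or LF, and Go's net/http refuses to send a request whose header value contains such control characters. The sketch below approximates that rule in Python to show the effect on the logged value; `is_valid_header_value` is a hypothetical helper written for illustration, not the Go implementation.

```python
def is_valid_header_value(value: str) -> bool:
    """Approximate check: reject control characters other than horizontal tab."""
    for ch in value:
        code = ord(ch)
        if (code < 0x20 and ch != "\t") or code == 0x7F:
            return False
    return True


# What reading a key file with a trailing newline yields (placeholder key).
key_from_file = "redacted-api-key-foo-bar\n"

raw_header = "GenieKey " + key_from_file
trimmed_header = "GenieKey " + key_from_file.strip()

print(is_valid_header_value(raw_header))      # False: the newline is a control character
print(is_valid_header_value(trimmed_header))  # True: whitespace removed
```

Under this assumption, the same key works as an inline api_key string and fails via api_key_file simply because the file content carries one extra byte.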

Edit: Also, correct me if I'm wrong here, but from looking at the code, I doubt this'll ever be a race condition, since the API key is read from the file each time an HTTP request to OpsGenie is built and is seemingly not persisted in the notifier config struct. See this routine here.

fralvarop commented 2 months ago

I'm encountering the same issue with this configuration. I'm not using secrets in any way; I'm setting the API key as plain text.

alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: opsgenie
      routes:
        - match: {}
          receiver: opsgenie
    receivers:
      - name: opsgenie
        opsgenie_configs:
          - api_key: <plain-api-key>
            responders:
              - name: <team-name>
                type: team

This is the error message I see in the logs:

ts=2024-07-10T09:23:26.444Z caller=notify.go:745 level=warn component=dispatcher receiver=kube-prometheus-stack/alertmanager-config-management/opsgenie integration=opsgenie[0] aggrGroup="{}/{}:{alertname=\"KubeVersionMismatch\", prometheus=\"kube-prometheus-stack/kube-prometheus-stack-prometheus\", severity=\"warning\"}" msg="Notify attempt failed, will retry later" attempts=1 err="Post \"https://api.opsgenie.com/v2/alerts\": net/http: invalid header field value for \"Authorization\"" 

Any help??

@zoezhangmattr did you manage to resolve this?

armsnyder commented 1 month ago

We had this same issue today, and the cause was that our api key secret ended in a newline character before it was base64 encoded.
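To illustrate the cause described above: values in a Kubernetes Secret's data field are base64-encoded, and the decoded bytes are what end up in the mounted file, so a newline that was present at encoding time survives into the file. A minimal sketch with a placeholder key, assuming the common case where the key was encoded with a command that appends a newline (such as echo without -n):

```python
import base64

api_key = "redacted-api-key-foo-bar"  # placeholder, not a real key

# Encoding the key plus a trailing newline, as piping `echo "$KEY"` into base64 would.
with_newline = base64.b64encode((api_key + "\n").encode()).decode()
# Encoding the key alone, as `printf '%s' "$KEY"` or `echo -n "$KEY"` would.
without_newline = base64.b64encode(api_key.encode()).decode()

# The two encodings differ, and decoding the first one brings the newline back.
print(with_newline == without_newline)
decoded = base64.b64decode(with_newline)
print(decoded.endswith(b"\n"))
```

Decoding the value stored in the Secret and checking its last byte is a quick way to confirm whether this applies to a given setup.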