prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0
6.57k stars 2.14k forks source link

Regression in 0.26.0 when using `amtool config rules` with `webhook_configs` #3505

Open czenker opened 1 year ago

czenker commented 1 year ago

What did you do?

Update from version 0.25.0 to 0.26.0.

What did you expect to see?

We use amtool config routes test alertname=Foobar severity=panic in an automated test to ensure there is no misconfiguration that would not route alerts into on-call. We expected the command to work as before in the sense, that it would tell us which receivers would receive that alert.

What did you see instead? Under which circumstances?

With amtool 0.26.0:

error: scheme required for webhook url

With amtool 0.25.0 (error message is different, but the problem is the same):

Warning: amtool version (0.25.0) and alertmanager version (0.26.0) are different.
amtool: error: unsupported scheme "" for URL

amtool exits with status code 1 regardless and does not print the matched routes.

This is very likely a regression from https://github.com/prometheus/alertmanager/pull/3228 as the API does not output the url anymore, but amtool seems to be unaware of it and detects an error.

It also affects the amtool config routes show command.

Alertmanager configuration file

This probably only happens when using a webhook_configs with a url:

global: { } # ...
route: { } # ...
receivers:
- name: oncall-alerts
  webhook_configs:
  - url: http://example.com/webhook
templates: [] # ...

As a workaround the new url_file could be used as this does not seem to validate any scheme.

grobinson-grafana commented 1 year ago

Looks like this is an issue when using amtool with --alertmanager.url. It does not affect amtool when using --config.file.

grobinson-grafana commented 1 year ago

I think the issue is as @czenker explained.

For example, the command amtool config routes show fails on Line 65 of cli/routing.go. This command seems to use the /api/v2/status API which returns the Alertmanager configuration but with the URL field for each webhook_configs as <secret>. However, if I revert the URL field for WebhookConfig from *SecretURL back to *URL then it works as the URL is now returned in the /api/v2/status API:

./amtool config routes show --alertmanager.url=http://127.0.0.1:9093
Routing tree:
.
└── default-route  receiver: example

That said, I don't think we should revert the change from *URL to *SecretURL. Instead, I think the actual bug is in UnmarshalYAML for WebhookConfig.

The reason for this is that the UnmarshalYAML function for *SecretURL has special code to detect <secret> and return nil, but UnmarshalYAML for WebhookConfig ignores this when instead it should be a special case.

However, looking further into how UnmarshalYAML works for SecretURL, I don't think this check is even needed as there is an existing check in parseURL for the scheme. Perhaps we can delete the duplicate check from UnmarshalYaml then?

diff --git a/config/notifiers.go b/config/notifiers.go
index db86b1a2..2650db5f 100644
--- a/config/notifiers.go
+++ b/config/notifiers.go
@@ -503,11 +503,6 @@ func (c *WebhookConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
        if c.URL != nil && c.URLFile != "" {
                return fmt.Errorf("at most one of url & url_file must be configured")
        }
-       if c.URL != nil {
-               if c.URL.Scheme != "https" && c.URL.Scheme != "http" {
-                       return fmt.Errorf("scheme required for webhook url")
-               }
-       }
        return nil
 }
gotjosh commented 1 year ago

Ah, I get it now -- I think it's because SecretURL is actually an URL

https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L127

and the UnmarshalYAML of SecretURL uses the original unmarshalling function

https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L138-L151

which means, SecretURL actually does:

https://github.com/prometheus/alertmanager/blob/353c0a130485371bfd75a0dd5c4f910701b12f02/config/config.go#L91-L102

and ends up validating the URL on unmarshalling anyway.

I think the patch above makes sense and we should be safe to remove that check from UnmarshalYAML of WebhookConfig. The hard part is probably testing the CLI to cover both scenarios of with a config and with a fake remote Alertmanager.

gotjosh commented 1 year ago

@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.

czenker commented 1 year ago

@gotjosh Thanks for the quick fix. This is a quick reminder that v0.26.1 is not released yet.

rajatvig commented 10 months ago

Is 0.26.1 planned?

xbglowx commented 10 months ago

@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.

Any ETA on this?

KevinGimbel commented 10 months ago

What's missing to get this released? Any way we can support?

k0ste commented 10 months ago

@czenker, we've just merged #3509 and I'll be releasing 0.26.1 tomorrow. Apologies for the disruption.

Can you be more specific about what is meant by tomorrow?

bmgante commented 9 months ago

I am also having this error in grafana when checking alert manager 0.26 contact points. When can we expect this to be fixed?

error: scheme required for webhook url
k0ste commented 9 months ago

@SuperQ please take a look, seems something prevents for a bugfix release .1

timmow commented 7 months ago

I have also just hit this issue when upgrading alertmanager, and it has broken our automated test suite

TheMeier commented 6 months ago

Should be fixed now?