grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Alertmanager error using custom function queryFromGeneratorURL #8914

Open fgouteroux opened 2 months ago

fgouteroux commented 2 months ago

Describe the bug

I'm using the Alertmanager template custom functions grafanaExploreURL and queryFromGeneratorURL to generate the Grafana Explore link in a Mimir Alertmanager receiver.

I got this error:

{
  "caller": "dispatch.go:353",
  "component": "dispatcher",
  "err": "os-related/pagerduty[0]: notify retry canceled due to unrecoverable error after 1 attempts: \"grafana_link\": failed to template \"{{ template \\\"grafana_link\\\" . }}\": template: grafana_link.tmpl:3:233: executing \"grafana_link\" at <queryFromGeneratorURL (index .Alerts 0).GeneratorURL>: error calling queryFromGeneratorURL: failed to URL decode the query: invalid URL escape \"% 2\"",
  "insight": "true",
  "level": "error",
  "msg": "Notify for alerts failed",
  "num_alerts": 1,
  "ts": "2024-08-02T13:12:10.412230982Z",
  "user": "fgx"
}

After some investigation, I found the root cause: if the alert expr contains the % character, as in (consul_raft_peers % 2) < 1, the custom function queryFromGeneratorURL fails at these lines: https://github.com/grafana/mimir/blob/main/pkg/alertmanager/alertmanager_template.go#L75-L78.

url.QueryUnescape fails to decode the expr.

QueryUnescape does the inverse transformation of QueryEscape, converting each 3-byte encoded substring of the form "%AB" into the hex-decoded byte 0xAB. It returns an error if any % is not followed by two hexadecimal digits.

According to the Prometheus ruler code, the alert's GeneratorURL field should be URL-encoded.

expr unescaped

(consul_raft_peers % 2) < 1

expr escaped

%28consul_raft_peers+%25+2%29+%3C+1

If the alert's generatorURL field were really escaped, we would not have this issue.

This is an important issue because the alert notification cannot be sent to the receiver.

To Reproduce

Steps to reproduce the behavior:

  1. Create an alerting rule with expr: consul_raft_peers % 2
  2. Create an Alertmanager configuration that uses the custom template function: (queryFromGeneratorURL (index .Alerts 0).GeneratorURL)
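The failure can also be reproduced outside Alertmanager with a standalone Go sketch. The helper `decodeTwice` is hypothetical: it follows the sequence discussed later in this thread (parse the URL, read `g0.expr` from the query string, then call `url.QueryUnescape` on the result) and is not copied from Mimir's source. The host `prometheus.example.com` is a placeholder.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeTwice is a hypothetical stand-in for the failing code path:
// URL.Query() already decodes the parameter, and QueryUnescape decodes it again.
func decodeTwice(generatorURL string) (string, error) {
	u, err := url.Parse(generatorURL)
	if err != nil {
		return "", err
	}
	expr := u.Query().Get("g0.expr") // first decode, via url.ParseQuery
	return url.QueryUnescape(expr)   // second decode
}

func main() {
	gen := "http://prometheus.example.com/graph?g0.expr=%28consul_raft_peers+%25+2%29+%3C+1&g0.tab=1"
	_, err := decodeTwice(gen)
	fmt.Println(err) // invalid URL escape "% 2"
}
```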

Workaround

I found a dirty workaround that bypasses the custom functions grafanaExploreURL and queryFromGeneratorURL, but it is really ugly and unreadable.

{{ define "grafana_link" -}}
        {{- if eq .CommonLabels.source_stack "Mimir" -}}
                https://telemetry.example.com/explore?left={{ (urlquery (printf "{\"datasource\":\"prometheus\",\"queries\":[{\"datasource\":{\"type\":\"prometheus\",\"uid\":\"prometheus\"},\"expr\":\"_expr_to_insert_\"}],\"range\":{\"from\":\"%d000\",\"to\":\"%d000\"}}" ((index .Alerts 0).StartsAt.Add -900000000000).Unix ((index .Alerts 0).StartsAt.Add 900000000000).Unix)) | reReplaceAll "_expr_to_insert_" ((index .Alerts 0).GeneratorURL | reReplaceAll ".*g0.expr=(.*)&g0.tab=1" "$1" | reReplaceAll "%22" "%5C%22") }}&orgId=2
        {{- else if eq .CommonLabels.source_stack "Loki" -}}
                https://telemetry.example.com{{ (index .Alerts 0).GeneratorURL | reReplaceAll "}" (urlquery (printf ",\"range\":{\"from\":\"%d000\",\"to\":\"%d000\"}}" ((index .Alerts 0).StartsAt.Add -900000000000).Unix ((index .Alerts 0).StartsAt.Add 900000000000).Unix)) | reReplaceAll "\"" "%22" }}&orgId=2
        {{- end -}}
{{- end -}}

danieleandreatta commented 2 weeks ago

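The issue is that the query is URL-decoded twice: once implicitly when the query string of the parsed URL is decoded (url.Parse keeps the raw query, and reading it through URL.Query() decodes it via url.ParseQuery), and once explicitly by the url.QueryUnescape() call in queryFromGeneratorURL.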
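A minimal sketch of a fix under this diagnosis, assuming the expression is read from the `g0.expr` parameter: decode exactly once and return the value from `URL.Query()` as-is. `queryFromGeneratorURLOnce` is a hypothetical name, and its error messages are illustrative, not Mimir's.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// queryFromGeneratorURLOnce is a hypothetical fixed variant: URL.Query()
// performs the only decoding step, and the value is not unescaped again.
func queryFromGeneratorURLOnce(generatorURL string) (string, error) {
	u, err := url.Parse(generatorURL)
	if err != nil {
		return "", fmt.Errorf("failed to parse generator URL: %w", err)
	}
	expr := u.Query().Get("g0.expr") // already decoded by url.ParseQuery
	if expr == "" {
		return "", errors.New("g0.expr not found in generator URL")
	}
	return expr, nil
}

func main() {
	gen := "http://prometheus.example.com/graph?g0.expr=%28consul_raft_peers+%25+2%29+%3C+1&g0.tab=1"
	expr, err := queryFromGeneratorURLOnce(gen)
	fmt.Println(expr, err) // (consul_raft_peers % 2) < 1 <nil>
}
```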

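This means that expressions containing % or + characters are decoded incorrectly: a + is turned into a space, and a % either triggers an error like the one above or is decoded into an unrelated byte.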