oskoperator / osko

OpenSLO Kubernetes Operator
Apache License 2.0
15 stars 2 forks source link

Figure out how to handle severity label in alerting Prometheus Rule #103

Open fourstepper opened 1 month ago

fourstepper commented 1 month ago

For example, in automatically generated rules, we already create two different severities in one alerting PrometheusRule

How to decide on what those should be? Opsgenie expects P1 - P4, PagerDuty might expect something else

weyert commented 1 month ago

I think it would make sense to convert a semantic severity to the Opsgenie or PagerDuty representation of the severity. E.g. always define it in critical, high, low etc. You could then map critical to SEV-1 for PagerDuty and P1 for OpsGenie

fourstepper commented 1 month ago

That's an interesting idea, although I'm a bit worried about keeping OSKO up to date with all available providers and their severity levels.

To make it more clear for an outside perspective - we are trying to implement the multi-window, multi-burn-rate alerting here in an automatic fashion based on an existing SLO. In theory (and in our practice) having this as a config option would be good enough, but other usecases might differ.

Hy3n4 commented 2 days ago

I would use the defaults as critical, high-fast, high-slow, low, and no_slo. This is taken from the SRE book from the alerting chapter. They are using it to sort the SLOs, but we can also use it as a default severities.

Hy3n4 commented 2 days ago

Do we think we'll ever need to use more than one alerting tool at a time? 🤔

fourstepper commented 13 hours ago

The defaults are one thing, but how to configure these is a whole other thing that I think is the bigger issue here 🤔

Regarding your question, that's really hard to say. I would focus on supporting one at a time for now. If we can stay backwards compatible with supporting one, setting the various severities in the config, for example, we can always add an option of multiple ones down the line.

Hy3n4 commented 11 hours ago

something like this?

type AlertSeverities struct {
    Critical string
    HighFast string
    HighSlow string
    Low      string
    NoSlo    string
}

alertingTools := []AlertingTool{
    {
        Name: "opsgenie",
        SeverityMap: map[string]string{
            "critical":  "P1",
            "high-fast": "P2",
            "high-slow": "P3",
            "low":       "P4",
            "no-slo":    "P5",
        },
    },
    {
        Name: "pagerduty",
        SeverityMap: map[string]string{
            "critical":  "SEV-1",
            "high-fast": "SEV-2",
            "high-slow": "SEV-3",
            "low":       "SEV-4",
            "no-slo":    "SEV-5",
        },
    },
}
fourstepper commented 11 hours ago

Eh, I mean I guess we could do that :) Alternatively, we could just set this in the internal/config/config.go

Hy3n4 commented 11 hours ago

Yes, I copy-pasted it from that file :D

fourstepper commented 11 hours ago

Yeah, makes sense. Just not sure if we want to map these out directly.

Alternatively, we could have some "custom" type for those not using either of the ones we support (like opsgenie and pagerduty from the above example)

Hy3n4 commented 11 hours ago

we can allow configuring it through the env vars, but it would not cover the cases where there is a need for multiple alerting tools for one triggered alert.

fourstepper commented 11 hours ago

I wouldn't cover that for now, tbh