pyrra-dev / pyrra

Making SLOs with Prometheus manageable, accessible, and easy to use for everyone!
https://demo.pyrra.dev
Apache License 2.0

0s burnrate generated #1028

Open bck01215 opened 10 months ago

bck01215 commented 10 months ago

When creating a tight SLO on a short window, a 0s burn rate appears to be generated, which causes Prometheus to reject the rule file.

pyrra rules:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: email-service-calendar
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99.999"
  window: "12h"
  indicator:
    ratio:
      errors:
        metric: http_responses_total{host="email-service-mylu2.okd.liberty.edu", path="/microsoft/calendar/", response=~"5.."}
      total:
        metric: http_responses_total{host="email-service-mylu2.okd.liberty.edu", path="/microsoft/calendar/"}

Prometheus logs:

ts=2024-01-12T13:55:51.589Z caller=manager.go:1049 level=error component="rule manager" msg="loading groups failed" err="/etc/prometheus/pyrra/prometheus-http.yaml: 23:11: group \"email-service-calendar\", rule 1, \"http_responses:burnrate0s\": could not parse expression: 1:119: parse error: duration must be greater than 0"

pyrra version: 7.2

metalmatze commented 9 months ago

Interesting. I never anticipated people actually want SLO windows this small. Usually at least a couple of days.

Are you sure you want an SLO in your case for the alerting?

Even if you want such a small window, you would also have to scrape your metrics very frequently, for example every second instead of the usual 15s or more.

It would be great to learn more about your use case.
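
The scrape-interval point above can be illustrated with a rough sketch. Prometheus' `rate()` needs at least two samples inside the range to return a value, so a burn-rate window has to span at least two scrape intervals to be dependable. The function names below are hypothetical, and the model ignores scrape jitter and failed scrapes:

```python
def guaranteed_samples(window_seconds: float, scrape_interval_seconds: float) -> int:
    """Smallest number of evenly spaced scrape samples that can fall in the window."""
    return int(window_seconds // scrape_interval_seconds)


def rate_is_reliable(window_seconds: float, scrape_interval_seconds: float) -> bool:
    """rate() needs at least two samples in the range to produce a value."""
    return guaranteed_samples(window_seconds, scrape_interval_seconds) >= 2


# Illustrative values only: a 5s window with a 15s scrape interval versus
# a 5m window with the same interval.
print(rate_is_reliable(5, 15))    # False
print(rate_is_reliable(300, 15))  # True
```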

bck01215 commented 9 months ago

> Interesting. I never anticipated people actually want SLO windows this small. Usually at least a couple of days.
>
> Are you sure you want an SLO in your case for the alerting?

We were using new metrics that had only just started being generated, as a PoC. Long term, a window of 2w or more is fine, but we were using the short window to test pyrra.

metalmatze commented 9 months ago

In that case, whether it's 2w or 12h shouldn't matter. Let's make sure we at least have 1s. Whether that's more helpful is debatable; it's definitely less broken.
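
A minimal sketch of how a 0s window could arise and how a 1s floor would avoid it. It assumes the short burn-rate window is scaled linearly from 5m for a 28d SLO and then rounded to the nearest minute. That scaling rule, the constants, and the function names are assumptions made for illustration; the thread does not confirm how Pyrra computes its windows.

```python
from datetime import timedelta
from typing import Optional

# Assumed reference values (not confirmed in this thread): 5m short window for a 28d SLO.
REFERENCE_SLO = timedelta(days=28)
REFERENCE_SHORT = timedelta(minutes=5)


def scaled_window(
    slo_window: timedelta,
    round_to: timedelta = timedelta(minutes=1),
    floor: Optional[timedelta] = None,
) -> timedelta:
    """Scale the reference window to slo_window, round it, and optionally clamp it."""
    raw = REFERENCE_SHORT.total_seconds() * slo_window.total_seconds() / REFERENCE_SLO.total_seconds()
    step = round_to.total_seconds()
    rounded = int(raw / step + 0.5) * step  # round half up to a multiple of step
    if floor is not None:
        rounded = max(rounded, floor.total_seconds())
    return timedelta(seconds=rounded)


def prom_duration(d: timedelta) -> str:
    """Format a duration in whole seconds, e.g. '300s'."""
    return f"{int(d.total_seconds())}s"


print(prom_duration(scaled_window(timedelta(days=28))))                               # 300s
print(prom_duration(scaled_window(timedelta(hours=12))))                              # 0s
print(prom_duration(scaled_window(timedelta(hours=12), floor=timedelta(seconds=1))))  # 1s
```

Under these assumptions the 12h window scales to roughly 5.4s, which rounds down to 0s and yields a range selector that Prometheus refuses to parse. Clamping to 1s keeps the rule file loadable, although a window that short is unlikely to contain enough samples at typical scrape intervals.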