pyrra-dev / pyrra

Making SLOs with Prometheus manageable, accessible, and easy to use for everyone!
https://demo.pyrra.dev
Apache License 2.0

bool_gauge SLO's budget is burning continuously #1231

Open mdarii opened 4 months ago

mdarii commented 4 months ago

I'm trying to define an SLO to monitor website availability. The SLO relies on the Prometheus blackbox-exporter metric probe_success. Here is the SLO definition:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  annotations:
    pyrra.dev/description: 'TODO: Define the runbook for this SLO'
    pyrra.dev/summary: Website availability is below 99.999% 
  labels:
    pyrra.dev/team: ops
  name: availability
spec:
  alerting:
    absent: true
    burnrates: true
    disabled: true
  description: Website should have 99.999% availability
  indicator:
    bool_gauge:
      grouping:
      - cluster
      metric: probe_success{instance="https://example.com"}
  target: "99.999"
  window: 1w

After creating the SLO, the error budget decreases continuously, even though there are no errors (all blackbox checks were successful).


Could there be an error in the logic for how the error budget is calculated?
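
For reference, the arithmetic behind that expectation can be sketched as follows. This is a minimal sketch and not Pyrra's actual implementation: the function names are hypothetical, and it assumes availability is simply the mean of the 0/1 probe_success samples over the window.

```python
from typing import Sequence

# Hypothetical helpers for illustration only; NOT Pyrra's actual code.
WEEK_SECONDS = 7 * 24 * 60 * 60  # matches `window: 1w`


def allowed_downtime_seconds(target_percent: float,
                             window_seconds: int = WEEK_SECONDS) -> float:
    """Downtime the error budget permits over the whole window."""
    return (1 - target_percent / 100) * window_seconds


def budget_remaining(samples: Sequence[int], target_percent: float) -> float:
    """Fraction of the error budget left (1.0 means untouched).

    Assumes availability is the mean of 0/1 samples, which is an
    assumption about how a boolean gauge would be evaluated.
    """
    availability = sum(samples) / len(samples)
    error_budget = 1 - target_percent / 100
    return 1 - (1 - availability) / error_budget


if __name__ == "__main__":
    # Budget for a 99.999% target over one week, in seconds.
    print(round(allowed_downtime_seconds(99.999), 3))
    # All probes succeeded, so none of the budget should be consumed.
    print(budget_remaining([1] * 1000, 99.999))
```

Under these assumptions, a window in which every probe succeeded leaves the budget at 100%, which is why a steadily decreasing budget looks wrong. Note also that a 99.999% target over one week allows only about 6 seconds of downtime in total.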