pyrra-dev / pyrra

Making SLOs with Prometheus manageable, accessible, and easy to use for everyone!
https://demo.pyrra.dev
Apache License 2.0

Latency SLOs - Duplicated "p100" percentile in duration graph (UI) #1197

Closed: svenmueller closed this issue 3 days ago

svenmueller commented 2 weeks ago

The duration graph for latency SLOs shows a duplicated "p100" percentile label.

Version 0.7.6

Screenshot 2024-06-17 at 16 49 54

Configuration

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: xyz
  namespace: monitoring
spec:
  alerting: {}
  description: 99.9% of xyz requests in the past 28 days are faster than 100
    ms.
  indicator:
    latency:
      grouping:
      - destination_service_name
      success:
        metric: istio_request_duration_milliseconds_bucket{job="kubernetes-pods",
          le="100"}
      total:
        metric: istio_request_duration_milliseconds_count{job="kubernetes-pods"}
  target: "99.9"
  window: 28d
status:
  type: PrometheusRule

metalmatze commented 4 days ago

Interesting find! It seems that floating-point rounding errors cause both 0.9990000000000001 and 0.999 to be added to the list of percentiles, so the same percentile shows up twice in the graph.

Screenshot 2024-06-28 at 19 51 44
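For readers who want to reproduce the rounding effect described above, here is a minimal Go sketch. It is not Pyrra's actual code: the default percentile list, the helper names (`targetQuantile`, `addExact`, `addApprox`), and the tolerance are all assumptions made for illustration. It parses the target "99.9" at run time, divides by 100, and compares an exact-equality duplicate check with a tolerance-based one.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// targetQuantile parses a target such as "99.9" and converts it to a
// quantile by dividing by 100 at run time (hypothetical helper).
func targetQuantile(target string) (float64, error) {
	t, err := strconv.ParseFloat(target, 64)
	if err != nil {
		return 0, err
	}
	return t / 100, nil
}

// addExact appends q unless an exactly equal value is already present.
// Exact float comparison misses values that differ only by rounding.
func addExact(list []float64, q float64) []float64 {
	out := append([]float64{}, list...)
	for _, x := range out {
		if x == q {
			return out
		}
	}
	return append(out, q)
}

// addApprox appends q unless a value within eps is already present.
func addApprox(list []float64, q, eps float64) []float64 {
	out := append([]float64{}, list...)
	for _, x := range out {
		if math.Abs(x-q) <= eps {
			return out
		}
	}
	return append(out, q)
}

func main() {
	// Assumed default percentiles; the real list in Pyrra may differ.
	defaults := []float64{0.5, 0.9, 0.99, 0.999}

	q, err := targetQuantile("99.9")
	if err != nil {
		panic(err)
	}

	fmt.Println("quantile from target:", q)
	fmt.Println("exact check: ", addExact(defaults, q))
	fmt.Println("approx check:", addApprox(defaults, q, 1e-9))
}
```

With IEEE 754 doubles, the quotient is not bit-identical to the literal 0.999, so the exact check keeps both values while the tolerance-based check keeps one. Whether the released fix uses a tolerance, rounding, or a different approach is not stated in this thread.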
metalmatze commented 3 days ago

This is now fixed, and v0.7.7 has been released with the fix: https://github.com/pyrra-dev/pyrra/releases/tag/v0.7.7