pyrra-dev / pyrra

Making SLOs with Prometheus manageable, accessible, and easy to use for everyone!
https://demo.pyrra.dev
Apache License 2.0
1.24k stars, 112 forks

No data and generic rules not created #1217

Open: idrikay opened this issue 3 months ago

idrikay commented 3 months ago

I have tried to deploy Pyrra with both the manifests and the Helm chart. Both ways fail to create the generic rules, and I also get no data for requests or errors.

[Screenshot, 2024-07-17 12:43:58]
```yaml
genericRules:
  enabled: true
```
[Screenshots, 2024-07-17 12:23:46, 12:42:17 and 12:38:02]
```yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: coredns-response-errors
  namespace: monitoring
spec:
  description: ""
  indicator:
    ratio:
      errors:
        metric: coredns_dns_responses_total{job="coredns",rcode="SERVFAIL"}
      total:
        metric: coredns_dns_responses_total{job="coredns"}
  target: "99.99"
  window: 2w
```

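For a sense of scale, a target of "99.99" over a 2w window leaves an error budget of 0.01% of requests. The following is an illustrative calculation, not Pyrra code; the helper name `error_budget` is hypothetical, and the outage figure assumes traffic is spread evenly across the window.

```python
from datetime import timedelta


def error_budget(target_percent: float, window: timedelta) -> tuple[float, timedelta]:
    """Return the allowed error ratio and the equivalent full-outage time.

    Hypothetical helper for illustration only; assumes uniform traffic
    over the whole window.
    """
    allowed_error_ratio = 1 - target_percent / 100
    return allowed_error_ratio, window * allowed_error_ratio


# Values taken from the ServiceLevelObjective above: target "99.99", window 2w.
ratio, outage = error_budget(99.99, timedelta(weeks=2))
print(f"allowed error ratio: {ratio:.4%}")
print(f"full-outage equivalent: {outage}")
```

With these inputs the budget works out to 0.0001 × 1,209,600 s, about 121 seconds of total failure over two weeks.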
vidomas commented 2 weeks ago

I solved the same issue by adding the "release: prometheus-community" label to my ServiceLevelObjective.

sebastiangaiser commented 2 weeks ago

I'm having a similar problem: in some clusters, metrics are created for each SLO resource, but in others they are not. For me this is independent of the label. I'm also using genericRules.enabled: true in the Helm chart.

vidomas commented 2 weeks ago

In my case, Prometheus is provisioned by the Prometheus Operator (kube-prometheus-stack Helm chart), and the Prometheus custom resource has a rule selector based on labels:

```yaml
spec:
  ruleSelector:
    matchLabels:
      release: prometheus-community
```
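This is why the label matters: with `matchLabels`, a PrometheusRule is only selected if its labels contain every key/value pair in the selector. The sketch below is a simplified model of that rule (it ignores `matchExpressions` and the separate namespace selector); the function name and the example label sets are illustrative, and it assumes Pyrra copies the ServiceLevelObjective's labels onto the PrometheusRule it generates.

```python
def matches_label_selector(match_labels: dict[str, str], object_labels: dict[str, str]) -> bool:
    """Simplified Kubernetes matchLabels check: every selector pair must be present."""
    return all(object_labels.get(key) == value for key, value in match_labels.items())


selector = {"release": "prometheus-community"}

# Assumed labels on the PrometheusRule generated from the original SLO (no release label).
without_release = {"prometheus": "k8s", "role": "alert-rules"}
# The same labels after adding the release label to the ServiceLevelObjective.
with_release = {**without_release, "release": "prometheus-community"}

print(matches_label_selector(selector, without_release))
print(matches_label_selector(selector, with_release))
```

An extra label on the rule never hurts the match; a missing or different value for any selector key excludes the rule entirely.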
sebastiangaiser commented 2 weeks ago

@vidomas, can you check which metrics Pyrra produces? Using matchLabels makes total sense for your kube-prometheus-stack deployment, so that it picks up the generated PrometheusRules. But the original issue is that no metrics are produced for some or all ServiceLevelObjectives.
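One way to answer that question is to list the metric names Prometheus knows about and look for the ones recorded by Pyrra's generic rules. The sketch below uses the Prometheus HTTP API endpoint `/api/v1/label/__name__/values` and only the standard library; it assumes the generic recording rules share a `pyrra_` name prefix, and both helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def pyrra_metric_names(label_values_response: dict, prefix: str = "pyrra_") -> list[str]:
    """Filter metric names from a /api/v1/label/__name__/values response.

    The "pyrra_" prefix is an assumption about how the generic rules are named.
    """
    if label_values_response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {label_values_response}")
    return sorted(name for name in label_values_response["data"] if name.startswith(prefix))


def fetch_metric_names(prometheus_url: str) -> dict:
    """Fetch all metric names from the Prometheus HTTP API (performs a network call).

    The absolute path replaces any path in prometheus_url, so this assumes
    Prometheus is served at the root of that address.
    """
    url = urllib.parse.urljoin(prometheus_url, "/api/v1/label/__name__/values")
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

Usage: `pyrra_metric_names(fetch_metric_names("http://localhost:9090"))`, where the address is a placeholder (for example a `kubectl port-forward` to the Prometheus service). Running it against each cluster makes it easy to compare where the metrics exist and where they do not.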

sebastiangaiser commented 2 weeks ago

Looking through the code, I found my error: bool_gauge combined with grouping is not supposed to work (see https://github.com/pyrra-dev/pyrra/blob/1e0a1ed35837f6acfa37b29a922cffd92d0bf685/slo/rules.go#L1389).
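If you suspect the same cause, one way to find affected objectives is a small lint over your manifests before applying them. The sketch below is hypothetical and not part of Pyrra; it takes manifests already parsed into dicts (for example with a YAML library) and assumes the indicator and grouping fields are spelled `bool_gauge` and `grouping` in the ServiceLevelObjective spec.

```python
def find_grouped_bool_gauges(manifests: list[dict]) -> list[str]:
    """Return "namespace/name" for each SLO that combines bool_gauge with grouping.

    Field names (spec.indicator.bool_gauge.grouping) are assumed, not taken from Pyrra docs.
    """
    flagged = []
    for manifest in manifests:
        if manifest.get("kind") != "ServiceLevelObjective":
            continue  # Only ServiceLevelObjective resources are relevant.
        indicator = manifest.get("spec", {}).get("indicator", {})
        bool_gauge = indicator.get("bool_gauge") or {}
        if bool_gauge.get("grouping"):
            meta = manifest.get("metadata", {})
            namespace = meta.get("namespace", "default")
            flagged.append(f"{namespace}/{meta.get('name', '<unnamed>')}")
    return flagged
```

Any objective it reports is a candidate for the behavior described above; ratio-based objectives like the one in the original report are not flagged.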