slok / sloth

🦥 Easy and simple Prometheus SLO (service level objectives) generator
https://sloth.dev
Apache License 2.0

Some queries not working with Thanos hosted data #364

Open poochwashere opened 2 years ago

poochwashere commented 2 years ago

Thank you for developing this really useful project.

Our metrics migrate to Thanos for long-term storage after living in Prometheus for 24 hours. For some reason, the queries for the remaining error budget are not calculated properly and return NaN. When I change the data source from Thanos to Prometheus, things work as expected.

I can get values from some of the Prometheus queries when I run them individually, but when I run the entire expression it fails.

These two queries work ad hoc against the Thanos data source and return the expected values:

slo:sli_error:ratio_rate1h{sloth_service="mfplaid-api",sloth_slo="requests-availability"}

slo:error_budget:ratio{sloth_service="mfplaid-api",sloth_slo="requests-availability"} *on() group_left() (24 * days_in_month())

But when I execute the entire expression, it returns NaN:

1-(
  sum_over_time(
    (
       slo:sli_error:ratio_rate1h{sloth_service="mfplaid-api",sloth_slo="requests-availability"}
       * on() group_left() (
         month() == bool vector(8)
       )
    )[32d:1h]
  )
  / on(sloth_id)
  (
    slo:error_budget:ratio{sloth_service="mfplaid-api",sloth_slo="requests-availability"} *on() group_left() (24 * days_in_month())
  )
)
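For reference, the arithmetic that the expression above performs can be sketched in Python: sum the hourly error ratios that fall in the current month, divide by the month's total budget, and subtract from 1. This is only an illustration, not Sloth's code. The function name `remaining_budget`, the 0.001 budget (a 99.9% objective), and the sample values are all made up. The second call shows one way a NaN could arise: PromQL samples are float64, and in float arithmetic NaN * 0 is still NaN, so a single NaN sample anywhere in the 32d window would survive the month mask and poison the whole sum. Whether that is what happens with the Thanos data here is not confirmed anywhere in this thread.

```python
import math


def remaining_budget(samples, current_month, error_budget_ratio, days_in_month):
    """Hypothetical mirror of the PromQL expression above.

    samples: list of (month, hourly_error_ratio) pairs, one per 1h step.
    """
    # Equivalent of `* (month() == bool vector(8))`: keep the current month,
    # multiply everything else by 0.
    consumed = sum(
        ratio * (1.0 if month == current_month else 0.0)
        for month, ratio in samples
    )
    # Equivalent of `error_budget * (24 * days_in_month())`.
    total_budget = error_budget_ratio * 24 * days_in_month
    return 1 - consumed / total_budget


# Made-up data: 31 days of August, each hour burning half of its budget share.
august = [(8, 0.0005)] * (24 * 31)
healthy = remaining_budget(august, 8, 0.001, 31)  # about 0.5

# Same data plus one NaN sample from July, which the mask does not zero out.
with_nan = [(7, float("nan"))] + august
poisoned = remaining_budget(with_nan, 8, 0.001, 31)

print(healthy, poisoned)
```

If that assumption applies, running the inner `slo:sli_error:ratio_rate1h{...}[32d:1h]` subquery on its own against Thanos and looking for NaN or missing steps might help narrow things down.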

Any clues that may help me?

Thanks Again!

neitrinoweb commented 1 year ago

Hello! @poochwashere Have you been able to solve this problem?

alexvaque commented 8 months ago

How are you defining the PrometheusServiceLevel? Do you have an example? Thanks