timescale / promscale

[DEPRECATED] Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.
https://www.timescale.com/promscale
Apache License 2.0

split QueryErrorHigh alert by handler and use different threshold for query_range #1802

Closed paulfantom closed 1 year ago

paulfantom commented 1 year ago

Description

By splitting the calculation by handler, we can set finer-grained thresholds for different query types. Right now we aggregate all queries (including the /read endpoint), which in turn skews the results.

As a first step in fine-tuning thresholds, this PR adds a separate threshold (10%) for range queries. This should improve the UX when using Promscale with Grafana by reducing false positives from 503 cancellations.
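
To illustrate the reasoning, here is a minimal Python sketch (not the actual alert expression) of how a single aggregated error ratio can differ from per-handler ratios checked against per-handler thresholds. Only the 10% threshold for range queries comes from this PR; the handler names, the sample counts, and the 5% default threshold are assumptions made up for the example.

```python
from collections import defaultdict

# Assumption: 5% default threshold is hypothetical; only the 10% for
# range queries is taken from the PR description.
DEFAULT_THRESHOLD = 0.05
THRESHOLDS = {"/api/v1/query_range": 0.10}

# Hypothetical samples: (handler, total_requests, failed_requests).
SAMPLES = [
    ("/api/v1/query_range", 100, 8),
    ("/api/v1/query", 100, 1),
    ("/read", 800, 80),
]


def aggregate_ratio(samples):
    """Error ratio over all handlers combined (the pre-split behaviour)."""
    total = sum(t for _, t, _ in samples)
    failed = sum(f for _, _, f in samples)
    return failed / total if total else 0.0


def error_ratios(samples):
    """Error ratio computed separately for each handler."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for handler, total, failed in samples:
        totals[handler] += total
        errors[handler] += failed
    return {h: errors[h] / totals[h] for h in totals if totals[h] > 0}


def firing_handlers(samples):
    """Handlers whose error ratio exceeds their own threshold."""
    ratios = error_ratios(samples)
    return sorted(
        h for h, r in ratios.items()
        if r > THRESHOLDS.get(h, DEFAULT_THRESHOLD)
    )


if __name__ == "__main__":
    print("aggregate:", aggregate_ratio(SAMPLES))
    print("per handler:", error_ratios(SAMPLES))
    print("firing:", firing_handlers(SAMPLES))
```

With these made-up numbers, the aggregated ratio is dominated by the high-volume /read handler, while the per-handler check keeps range queries below their own 10% threshold.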

niksajakovljevic commented 1 year ago

@paulfantom I just opened a PR, https://github.com/timescale/promscale/pull/1806, to update the query metric with an additional label. You may want to wait for that to be merged and then update your PR accordingly.

niksajakovljevic commented 1 year ago

FYI, https://github.com/timescale/promscale/pull/1806 has been merged. Maybe we should close this PR and open a new one that uses the additional label?

paulfantom commented 1 year ago

IMHO https://github.com/timescale/promscale/pull/1806 is orthogonal to this PR and both should be merged.

Plus, #1806 does not fix the issue, as it does not change anything in the alert expression. I added it in this PR.