Open Nuckal777 opened 2 months ago
I implemented the memoization in #3933, which also includes a benchmark.
Hi! š I thought GET /alerts
reads the data in the marker, rather than re-calculate inhibitions, so unless I've misunderstood the code, I'm surprised that GET /alerts
is slow because of inhibitor.Mutes
? Could you share a pprof perhaps?
Hey, alertFilter
calls setAlertStatus
, which is set via Update
here. The anonymous function for setAlertStatus
calculates inhibitions and silences.
Ah. So weird ā I don't understand why Alertmanager needs to re-calculate inhibitions and silences for GET requests.
Anyway, thank you for your contribution. I'll make sure to take a closer look later today or tomorrow.
Friendly reminder, that I would still appreciate some feedback. Likely this got lost in between other things.
Context
Requesting
/alerts
from my organizations alertmanager returns a reply in 4-6 seconds. The alertmanager has consistentlyGiven this scale, I was a bit surprised that fetching the alerts takes multiple seconds. Taking a trace via pprof revealed that at least 4 seconds are spent calculating inhibited alerts. Furthermore, a few hundred milliseconds are spent serializing the json response, which is expected due to the size of ~20 MiB in our case. The structure of my organizations inhibition rules can lead to ~10k alerts matching a target matcher and ~1k alerts matching the source matcher. Currently the benchmarks only consider cases with a single alert satisfying a target matcher.
Calling
/alerts
results basically in the following algorithm being executed at the moment:Mutes
here):Memoize source cache target match evaluations
The
Prevent two-sided match
step evaluatesr.TargetMatchers.Matches(a.Labels)
, wherea
is from the cached source alerts. It's invoked over and over again for each alert that is a valid target (potentially evaluating regular expressions), but it is not dependent on a specific target alert. Assuming I did not miss a side effect, the result ofr.TargetMatchers.Matches(a.Labels)
is only dependent on a specific inhibition rule and that rule's source alert cache. By memoizing that expression, a performance gain can be achieved when many alerts are valid targets.Constructing indices
Checking for label equality between the set of valid source alerts and valid target alerts runs in quadratic time. By maintaining an index per label, which maps that label's values to an "index into a slice of all alerts", alerts that have a label with a specific value can found by a map lookup. For multiple labels that need to be checked for equality, multiple slices of indices can be intersected in linear time. This would be a larger change though.