prometheus / alertmanager

Prometheus Alertmanager
https://prometheus.io
Apache License 2.0

Asynchronously send alerts #1285

Open starsdeep opened 6 years ago

starsdeep commented 6 years ago

@brancz

cc @brian-brazil

We currently send alerts at every eval interval, and we send them synchronously (see https://github.com/prometheus/alertmanager/blob/master/dispatch/dispatch.go#L371-L373). When the reader takes an alert from the channel, it does not read the next one until the current alert has been successfully sent to every kind of client. Sending alert messages to clients involves network I/O, which is slow, so the consumer cannot keep up with the producer and the channel saturates.

I therefore propose sending alerts asynchronously:

// current synchronous code at https://github.com/prometheus/alertmanager/blob/master/dispatch/dispatch.go#L371-L373
ag.flush(func(alerts ...*types.Alert) bool {
    return nf(ctx, alerts...)
})

// asynchronous code I propose:
go func() {
    ag.flush(func(alerts ...*types.Alert) bool {
        return nf(ctx, alerts...)
    })
}()

With this simple change, the share of time spent in runtime.mach_semaphore_signal drops from 33.3% to 16.6%.

Profile for the current synchronous code: [image]

Profile for the proposed asynchronous code: [image]

starsdeep commented 6 years ago

ref https://github.com/prometheus/alertmanager/issues/1201

brian-brazil commented 6 years ago

One thing to watch for is that we currently send at most one notification per group at a time. If this is made asynchronous, it could exacerbate overload on the receiver. I think we should maintain the property that at most one notification attempt is ongoing for a group at any time.