We have a default of `120h` retention. While this default seems fine for silences, it seems far too high for the nflog. Ideally, the nflog should be kept for only ~110% of `x = max(group_wait, group_interval, repeat_interval)`. With a large number of alerts and a low `x`, Alertmanager unnecessarily uses a lot of memory, because the state is broadcast perpetually.

Here is a heap profile of such a case: https://share.polarsignals.com/73d955e/
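To make the ~110% rule of thumb above concrete, here is a minimal sketch of the computation. The function name `suggestedNflogRetention` and the timer values are hypothetical and only for illustration; this is not existing Alertmanager code.

```go
package main

import (
	"fmt"
	"time"
)

// suggestedNflogRetention is a hypothetical helper (not part of Alertmanager).
// It returns roughly 110% of x = max(group_wait, group_interval, repeat_interval).
func suggestedNflogRetention(groupWait, groupInterval, repeatInterval time.Duration) time.Duration {
	x := groupWait
	if groupInterval > x {
		x = groupInterval
	}
	if repeatInterval > x {
		x = repeatInterval
	}
	// Add 10% headroom on top of the largest timer.
	return x + x/10
}

func main() {
	// Assumed example timers: group_wait=30s, group_interval=5m, repeat_interval=4h.
	r := suggestedNflogRetention(30*time.Second, 5*time.Minute, 4*time.Hour)
	fmt.Println(r) // prints 4h24m0s, compared to the 120h default
}
```

With several routes, the same computation would presumably have to take the maximum over all of them, since each route can set its own timers.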
I see multiple ways forward: