aggregate: Expire splay should be configurable

gpiucco commented 3 years ago

With high every/expire values, the aggregator waits too long before producing metrics.

This is because the random splay that gets added scales with the bucket size.

For example, this could take up to 600 seconds before producing metrics:

aggregate ^foo\..+\.bar$
  every 300 seconds
  expire after 301 seconds
    compute sum write to foo.bar
  send to main
  stop
  ;
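As a rough check of the 600-second figure, here is a minimal sketch of the arithmetic. It assumes a simplified model in which the splay is a whole number of seconds drawn from [0, every) and added on top of the expire time; the function name `worst_case_delay` is made up for illustration and is not taken from the relay's code.

```python
def worst_case_delay(every: int, expire: int) -> int:
    """Upper bound, in seconds, before a bucket is emitted.

    Assumed model (not the relay's actual implementation): the bucket is
    emitted after `expire` seconds plus a random splay in [0, every).
    """
    largest_splay = max(every - 1, 0)  # largest integer below `every`
    return expire + largest_splay


# The config above: every 300 seconds, expire after 301 seconds.
print(worst_case_delay(every=300, expire=301))
```

Under that model the worst case grows linearly with the bucket size, which is why large `every` values hurt the most.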

In most setups, a few seconds of splay should already be enough to avoid the "thundering herd of expirations" problem. A simple solution would be to make the value configurable.

grobian commented 3 years ago

I can see your point; 600 seconds does indeed seem too long with this config.

grobian commented 7 months ago

How about we just limit the splay to a few seconds instead of anywhere in its interval?
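A minimal sketch of what such a limit could look like, assuming the splay is picked as a whole number of seconds. The names `pick_splay` and `max_splay`, and the default of 5 seconds, are hypothetical and only illustrate the idea; they are not existing relay options.

```python
import random


def pick_splay(every: int, max_splay: int = 5, rng=random) -> int:
    """Pick a random splay in seconds, capped at a few seconds.

    Hypothetical helper: the splay is drawn from [0, min(every, max_splay))
    rather than from the whole [0, every) interval.
    """
    upper = min(every, max_splay)
    if upper <= 0:
        return 0  # no room for a splay at all
    return rng.randrange(upper)


# With every=300 the splay now stays below 5 seconds instead of below 300.
print(pick_splay(every=300))
```

Making `max_splay` a parameter covers both suggestions in this thread: a fixed small default, and a value that could be exposed in the configuration.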

grobian / carbon-c-relay #435