Closed sinkingpoint closed 2 years ago
As for how this would work with concurrency (without destroying processing time by locking things), maybe something like a leaky bucket per thread, with the rate limit shared evenly between them, would work.
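A minimal sketch of that idea, assuming a token-bucket variant: each worker thread gets its own bucket holding an even share of the global rate, so the hot path touches only thread-local state and needs no cross-thread locking. The class and names here are purely illustrative, not syslog-ng code.

```python
import threading
import time

class PerThreadRateLimiter:
    """Illustrative sketch: one bucket per worker thread, with the global
    rate split evenly so no lock is taken on the hot path."""

    def __init__(self, total_rate: float, num_threads: int):
        # Each thread's bucket refills at total_rate / num_threads tokens/sec.
        self.per_thread_rate = total_rate / num_threads
        self.local = threading.local()

    def allow(self) -> bool:
        state = getattr(self.local, "state", None)
        if state is None:
            # Lazily create this thread's bucket: [tokens, last_refill_time].
            state = self.local.state = [self.per_thread_rate, time.monotonic()]
        now = time.monotonic()
        tokens, last = state
        # Refill proportionally to elapsed time, capped at one second's quota.
        tokens = min(self.per_thread_rate,
                     tokens + (now - last) * self.per_thread_rate)
        state[1] = now
        if tokens >= 1.0:
            state[0] = tokens - 1.0
            return True
        state[0] = tokens
        return False
```

The trade-off is that an idle thread's share goes unused, so the effective global rate can undershoot the configured one; that seems acceptable for a noisy-neighbor limiter.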
Thanks again for contributing; closing the ticket, as it has already been merged.
Description of the problem
As a syslog-ng operator, I run many disparate services on a number of machines, all feeding their logs into syslog-ng via various sources (unix socket, journald). This can lead to a noisy-neighbor problem: one service produces many more logs than the others, which overloads the pipeline and leads to buffering and dropped logs for the affected services.
The existing `throttle` option is too coarse to be useful: it affects the entire log stream, and it only causes buffering rather than allowing control of logs that breach a rate limit.

Proposed solution
What I would like to see is something like the Logstash Throttle Filter, which allows, among other things, rate limiting keyed on an arbitrary field of the event.
Something like:
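A hypothetical configuration sketch, assuming a `rate-limit()`-style filter keyed on a template (the filter name and its options here are illustrative proposed syntax, not an existing syslog-ng API):

```
filter f_ratelimit {
    # Hypothetical syntax: key the limiter on the value of $_SYSTEMD_UNIT
    # and allow at most 500 messages per second per distinct key.
    rate-limit(
        template("$_SYSTEMD_UNIT")
        rate(500)
    );
};

log {
    source(s_local);
    filter(f_ratelimit);
    destination(d_file);
};
```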
which would keep a counter for each distinct _SYSTEMD_UNIT and perform some action on logs whose _SYSTEMD_UNIT occurs more than 500 times a second.

Additional context
I'm aware that patterndb sort of gets us there. I guess what I'm asking for is pretty similar to https://lists.balabit.hu/pipermail/syslog-ng/2011-November/017850.html (from 10 years ago), but there doesn't seem to be a conclusion in that thread. In patterndb the correlation key (what is used to key the rate limit) is the tuple of (pid, program, host), which might work for some use cases, but arbitrary keys would be more useful (e.g. for a forked program whose workers have different pids).
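For the forked-worker case, an arbitrary key could simply drop the pid from the correlation key, e.g. keying on the program name alone. Again, the `rate-limit()` filter and its options below are illustrative proposed syntax, not an existing syslog-ng API:

```
filter f_ratelimit_per_program {
    # Hypothetical: all workers of a forked program share one bucket,
    # because the key is $PROGRAM rather than the (pid, program, host) tuple.
    rate-limit(
        template("$PROGRAM")
        rate(500)
    );
};
```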