In some cases, once a certain problem occurs, it can occur thousands of times in quick succession and flood the log files with identical (or at least similar) messages. The "solution" of lowering the log level of these messages to "trace" is not a real solution, and can hide a very real problem.
The better solution to the shower of messages isn't to hide them but to rate-limit them. Let's add a simple log rate-limiting object which, when used, allows only a given number of messages to be printed per second; messages silenced beyond that limit are counted, and the count is reported with the next message that is printed.
For example:
WARN [shard 0] component - Some message
WARN [shard 0] component - Some message
WARN [shard 0] component - Some message
WARN [shard 0] component - Some message
WARN [shard 0] component - Some message
WARN [shard 0] component - Some message (and 12345 similar messages skipped)
The per-second limit should be shared by all messages, from any shard, which use the same rate-limiting object.
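To make the intended behavior concrete, here is a minimal sketch of such an object in standard C++. It is not the patch mentioned below and not an existing logger API: the names `log_rate_limiter`, `admit` and `format_rate_limited` are hypothetical, and the one-second fixed window and the `std::mutex` (used so that one object can be shared by several threads) are my assumptions. A real implementation for a sharded application would probably want to avoid the lock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical sketch: allows at most max_per_second messages in each
// one-second window and counts the messages it suppresses.
class log_rate_limiter {
public:
    using clock = std::chrono::steady_clock;

    explicit log_rate_limiter(uint64_t max_per_second)
        : _max_per_second(max_per_second) {
        assert(max_per_second > 0);
    }

    // Returns std::nullopt if the message must be suppressed. Otherwise
    // returns how many messages were skipped since the last printed one.
    // "now" is a parameter only so that the behavior can be tested.
    std::optional<uint64_t> admit(clock::time_point now = clock::now()) {
        std::lock_guard<std::mutex> guard(_mutex);
        if (!_have_window || now - _window_start >= std::chrono::seconds(1)) {
            // Start a new one-second window.
            _have_window = true;
            _window_start = now;
            _printed_in_window = 0;
        }
        if (_printed_in_window >= _max_per_second) {
            ++_skipped;
            return std::nullopt;
        }
        ++_printed_in_window;
        uint64_t skipped = _skipped;
        _skipped = 0;
        return skipped;
    }

private:
    const uint64_t _max_per_second;
    std::mutex _mutex;
    bool _have_window = false;
    clock::time_point _window_start{};
    uint64_t _printed_in_window = 0;
    uint64_t _skipped = 0;
};

// Returns the text to print, or std::nullopt if the message is suppressed.
// The skipped count is appended to the first message printed after a
// burst, as in the example above.
std::optional<std::string> format_rate_limited(
        log_rate_limiter& limiter, const std::string& msg,
        log_rate_limiter::clock::time_point now = log_rate_limiter::clock::now()) {
    std::optional<uint64_t> skipped = limiter.admit(now);
    if (!skipped) {
        return std::nullopt;
    }
    if (*skipped == 0) {
        return msg;
    }
    return msg + " (and " + std::to_string(*skipped) + " similar messages skipped)";
}
```

A caller would keep one `log_rate_limiter` per kind of message (for example as a static object next to the log call) and pass the result of `format_rate_limited` to the logger only when it is not empty. Note that with this design the skipped count is reported only when another message arrives after the window has expired; if the flood simply stops, the last count is never printed.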
Six months ago I sent a patch with an implementation (to the scylladb mailing list, where the logger lived at that time). It should be rebased, and Avi's suggestions should be implemented:
https://groups.google.com/d/msg/scylladb-dev/M8k0BQXSFRk/VgDRO_FKEQAJ
https://groups.google.com/d/msg/scylladb-dev/M8k0BQXSFRk/22XhlPFKEQAJ