Closed jonmz closed 3 months ago
I'm not too familiar with JavaScript, but a basic idea could be: after doing the calculation, if a given timestamp is recent, don't simply ignore older values, but actively remove them from Redis. Roughly like this (again, I'm not too familiar with JavaScript, so YMMV):
```diff
--- index.js.orig	2024-07-15 16:37:40.252588366 +0000
+++ index.js	2024-07-15 16:40:40.215693968 +0000
@@ -511,7 +511,10 @@
     let connections_in_ttl_period = 0
     for (const ts of Object.keys(tstamps)) {
-      if (parseInt(ts, 10) < periodStartTs) continue // older than ttl
+      if (parseInt(ts, 10) < periodStartTs) { // older than ttl
+        this.db.hDel(`rate_conn:${key}`, ts);
+        continue
+      }
       connections_in_ttl_period =
         connections_in_ttl_period + parseInt(tstamps[ts], 10)
     }
```
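The patched loop can be sketched as a self-contained function. Note this is an illustration, not the plugin's actual code: `pruneAndCount` and the `hDel` callback are hypothetical names standing in for the plugin's loop and the node-redis client's `hDel` call.

```javascript
// Sketch of the proposed fix: count connections inside the TTL window
// and actively delete stale timestamp fields instead of merely skipping them.
// `hDel` is a callback standing in for `this.db.hDel(`rate_conn:${key}`, ts)`.
function pruneAndCount (tstamps, periodStartTs, hDel) {
  let connections_in_ttl_period = 0
  for (const ts of Object.keys(tstamps)) {
    if (parseInt(ts, 10) < periodStartTs) {
      hDel(ts) // stale: remove the hash field from Redis
      continue
    }
    // recent enough: add this timestamp's counter to the total
    connections_in_ttl_period += parseInt(tstamps[ts], 10)
  }
  return connections_in_ttl_period
}
```

With this shape, a hash field is removed the first time it falls outside the window, so the hash stays bounded by the window size instead of growing until the whole key expires.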
Good sleuthing and excellent description of the problem. Your solution looks just fine. May I look forward to a Pull Request?
Describe the bug
When enabling the rate_conn setting, the limit plugin creates/updates a Redis key named rate_conn:<IP> which has a hash as its value. The hash uses timestamps as field names, with a counter updated by HINCRBY. The TTL of the rate_conn:<IP> key is set/reset on every connection to twice the configured time window, e.g. for a setting of 60m it's 7200 seconds, and every new connection from the same IP resets the TTL. The hash fields, however, have no TTLs of their own, which means they only expire when the whole rate_conn:<IP> key expires. This leads to a situation where, for a client that connects regularly within the configured time window, the hash just keeps growing. One very prominent and common case of a client that connects regularly is: monitoring!
Expected behavior
I expected outdated timestamps within the rate_conn:<IP> Redis key to be expired by some plugin logic, as Redis doesn't have a TTL per hash field (AFAIK), keeping the dataset current and small.
Observed behavior
No expiration of hash fields ever happens. The hash grows forever, as long as the same IP address connects again within 7200 seconds.
In our case, our monitoring does a short SMTP check every minute, each one generating a timestamp hash field under the corresponding rate_conn:<IP> key. That's 60 new hash fields per hour, 1,440 a day, 43,200 a month, over 500,000 per year. After about 2.5 years, we discovered about 1,300,000 timestamped hash fields for our monitoring IP in Redis. This explained why the Haraka processes used immense CPU and memory over time: reading 1.3 million hash fields into memory on every monitoring connection, doing 1.3 million quick calculations of whether each timestamp lies within the configured 60m, and then ignoring basically all of them except the most recent 60, took quite a toll on our servers. The moment we stopped monitoring our Haraka ports, cluster load basically halved and Haraka memory usage dropped by about 2 GB per host.
Steps To Reproduce
Assuming an already configured Redis, configure a limit in config/limit.ini, in our case:
Regularly make SMTP connections from the same IP, e.g. every minute. In our case we set up a blackbox_exporter on localhost:9115, as we're using Prometheus for monitoring, and do a request every minute:
Any other means of SMTP should work as well, e.g. using Swaks.
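For illustration, a rate_conn limit of roughly this shape in config/limit.ini would exhibit the behavior. The exact keys and values here are an assumption based on the limit plugin's documented <count>/<window> syntax, not the reporter's actual configuration:

```ini
[rate_conn]
; illustrative values, not the reporter's real config:
; allow up to 60 connections per remote IP within a 60-minute window
default=60/60m
```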
Check the actual Redis data, replacing 111.222.333.444 with the actual IP you use for testing:
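For instance, assuming the plugin's rate_conn:<IP> key naming described above, something like:

```shell
# Number of timestamp fields in the hash (grows without bound before the fix)
redis-cli HLEN rate_conn:111.222.333.444
# TTL of the whole key (reset to ~7200 on every connection for a 60m window)
redis-cli TTL rate_conn:111.222.333.444
# Dump the timestamp -> counter pairs
redis-cli HGETALL rate_conn:111.222.333.444
```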
Expectation would be to find timestamps no older than the configured 60m, so in this case, about 60. In reality, you'll find timestamps of any age.
System Info
(It's an old CentOS 7 host which is about to be replaced soon, but I don't think the environment is relevant here anyway.)