ladisone opened this issue 1 year ago (status: Open)
I'm seeing the same behavior on the Rspamd instances we updated to 3.5. Going back to 3.4.x and flushing the rate limit Redis entries helped us return to normal operation.
This mailcow issue also seems related to me: https://github.com/mailcow/mailcow-dockerized/issues/5168
Same issue here via Mailcow. After reverting to rspamd 3.4 it works fine again (so far at least). @vstakhov can you please have a look at this? If you need info from me, please let me know (See the mailcow thread as well). Thank you for all the continued hard efforts that you make, it's appreciated!
I just do not understand the issue, to be honest. Is it related to the whitelisted_ip option? From the original issue description I can conclude that the pending (or p) key is not drained like the ordinary bucket value.
For what it's worth, this occurred on a mailcow account that does not have whitelist_ip selected. It appears that a bucket is indeed not drained: I saw a sudden increase around the 11th of May (when I upgraded the local install, which included rspamd 3.5), and it then continued to rise gradually. The ratelimit was never released; it kept on going.
rates {
  # # Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default
  to = "100 / 1s";
  to_ip = "100 / 1s";
  to_ip_from = "100 / 1s";
  bounce_to = "100 / 1h";
  bounce_to_ip = "7 / 1m";
}
whitelisted_rcpts = "postmaster,mailer-daemon";
max_rcpt = 25;
custom_keywords = "/etc/rspamd/lua/ratelimit.lua";
info_symbol = "RATELIMITED";
Those are the settings that mailcow uses, nothing fancy. The Lua file contains:
cat lua/ratelimit.lua
local custom_keywords = {}
custom_keywords.mailcow = function(task)
  local rspamd_logger = require "rspamd_logger"
  local dyn_rl_symbol = task:get_symbol("DYN_RL")
  if dyn_rl_symbol then
    local rl_value = dyn_rl_symbol[1].options[1]
    local rl_object = dyn_rl_symbol[1].options[2]
    if rl_value and rl_object then
      rspamd_logger.infox(rspamd_config, "DYN_RL symbol has value %s for object %s, returning %s...", rl_value, rl_object, "rs_dynrl_" .. rl_object)
      return "rs_dynrl_" .. rl_object, rl_value
    end
  end
end
return custom_keywords
I removed my comment regarding whitelisted_ip. I believe something did change when upgrading from 3.4 to 3.5 because I also see an increase in my logs.
Ok, I think the reason is that p is not cleared. The intention of the pending field was to count messages that are currently being processed. However, if you have short-circuit rules (and they are evil, I've said that many, many times), then p can be increased in the pre-filter but never decreased in the post-filter, because post-filters are skipped.
Or no, this symbol has all guards against it: flags = 'explicit_disable,ignore_passthrough'
I am still having issues. E-mails are being sent from our webmail client, using an IP that I have included in the whitelisted_ip map.
The log:
2023-05-30 13:11:59 #9(normal) <d79c6d>; task; rspamd_task_write_log: id: <41bc017f249d2186cfdbbe934e5521ac@client.nl>, qid: <1q3z8p-0007ZJ-00>, ip: 172.16.20.123, user: training@client.nl, from: <training@client.nl>, (default: F (soft reject): [0.00/15.00] [RATELIMIT(0.00){incoming_ip_limit(RLg4m1d1d86msx3);},TAGGED_RCPT(0.00){}]), len: 2983, time: 6.322ms, dns req: 0, digest: <a50b2b63cf6d76a2433c197eda5a9a41>, rcpts: <...>, mime_rcpts: <>, forced: soft reject "Ratelimit "incoming_ip_limit" exceeded"; score=nan (set by ratelimit)
My ratelimit.conf:
rates {
  user = {
    bucket = [
      {
        burst = 20;
        rate = "100 / 10m";
      }
    ]
  }
  incoming_ip_limit {
    selector = "ip";
    whitelisted_ip = "/etc/rspamd/maps.d/ip-whitelist.map"
    bucket [
      {
        burst = 20;
        rate = "400 / 10m";
      }
    ]
  }
}
info_symbol = "RATELIMIT";
And my ip-whitelist.map:
app@rspamd:/$ cat /etc/rspamd/maps.d/ip-whitelist.map
46.21.123.12
2.58.123.12
172.16.20.0/24
250.0.0.0/8
And the entry from Redis.
127.0.0.1:6379> HGETALL RLg4m1d1d86msx3
1) "l"
2) "1685452319129"
3) "b"
4) "0"
5) "dr"
6) "50444"
7) "db"
8) "101221"
9) "p"
10) "189"
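The hash above is easier to inspect once the alternating field/value reply is paired up. The following is a minimal sketch, assuming only what the thread states: that `p` is the pending counter and should normally sit near zero. The helper names and the threshold of 5 are hypothetical, chosen for illustration, and are not Rspamd values.

```python
# Sketch: pair up a flat HGETALL reply and flag a pending counter that looks stuck.
# parse_hgetall, pending_looks_stuck and the threshold are illustrative assumptions.

def parse_hgetall(flat):
    """Turn [field1, value1, field2, value2, ...] into a dict."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of items in an HGETALL reply")
    return {flat[i]: flat[i + 1] for i in range(0, len(flat), 2)}


def pending_looks_stuck(bucket, threshold=5):
    """Return True when the 'p' (pending) field exceeds the chosen threshold."""
    return int(bucket.get("p", 0)) > threshold


# Values copied from the redis-cli output in this comment.
reply = ["l", "1685452319129", "b", "0", "dr", "50444", "db", "101221", "p", "189"]
bucket = parse_hgetall(reply)
print(bucket["p"], pending_looks_stuck(bucket))
```

With the values shown in this comment, a pending count of 189 is far from "around 0", which is consistent with the counter not being drained.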
You cannot define per-rule whitelist maps; they are defined globally for this module. The main question is why the p bucket is not clearing.
It is increased here: https://github.com/rspamd/rspamd/blob/master/lualib/redis_scripts/ratelimit_check.lua#L69 when scanning of a message starts.
It is decreased here: https://github.com/rspamd/rspamd/blob/master/lualib/redis_scripts/ratelimit_update.lua#L78
So if this postfilter is not called, we are in real trouble. But this postfilter must be called in all cases: https://github.com/rspamd/rspamd/blob/master/src/plugins/lua/reputation.lua#L1332
So normally p must always be around 0.
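The increment/decrement pairing described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not reproduce the real Redis scripts, and the class and function names are hypothetical. It shows why the counter stays at zero when every check is followed by an update, and why it grows without bound if the update step is ever skipped.

```python
# Toy model of the pending counter: check() stands in for the prefilter
# increment, update() for the postfilter decrement. Names are illustrative.

class PendingCounter:
    def __init__(self):
        self.pending = 0

    def check(self):
        """Prefilter stage: a message starts being scanned."""
        self.pending += 1

    def update(self):
        """Postfilter stage: the scan finished, release the pending slot."""
        self.pending = max(0, self.pending - 1)


def scan(counter, messages, postfilter_runs):
    """Scan a number of messages and return the final pending value."""
    for _ in range(messages):
        counter.check()
        if postfilter_runs:
            counter.update()
    return counter.pending


print(scan(PendingCounter(), 189, postfilter_runs=True))
print(scan(PendingCounter(), 189, postfilter_runs=False))
```

In this model, a non-zero value that only ever grows points to messages whose update step never ran, which is the scenario being investigated in this thread.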
@vstakhov I read the previous comments and am unsure if my question is relevant. I understood the problem is that the p bucket is not clearing. Is this issue still considered a bug?
I also reverted to the old rspamd version because I had the described issue and my prefilter didn't work anymore either. I had a whitelist with prefilter to completely whitelist an IP; after the update ratelimit wasn't included in the prefilter whitelist anymore and I had to use the whitelist from the ratelimit module. It might have something to do with the ratelimit not working, but I'm not sure.
Is this issue still considered a bug?
I'm not sure it is an Rspamd bug, as all reports are likely from Mailcow users. I also see no way the p bucket could fail to be cleared if the ratelimit callbacks are called properly. That's the problem.
I had a whitelist with prefilter to completely whitelist an IP; after the update ratelimit wasn't included in the prefilter whitelist anymore and I had to use the whitelist from the ratelimit module.
I'm sorry, but I cannot parse this sentence.
What I tried to say is that the prefilter described here https://rspamd.com/doc/modules/multimap.html#pre-filter-maps is not whitelisting the ratelimit module anymore. I'm not using mailcow, but I also use a custom ratelimit Lua script; maybe that's what I have in common with mailcow.
I'm sorry, but what do you mean by "whitelisting ratelimit by multimap"? If it is what I think it is, it has never worked the way you might expect. It might have worked merely by accident, and that is not an issue. For disabling symbols, you can use many methods: settings, custom Lua code, conditions etc. Multimap is not a proper tool for this task.
I understand your point and I don't disagree; I would just point out that it worked perfectly for years until now. I don't want to mix up issues if that has nothing to do with the ratelimit issue, since this might have something to do with mailcow and me using the custom_keywords feature with a custom Lua script. I also want to mention that I never heard of issues with that before.
@vstakhov I don't use Mailcow. I use only Rspamd with Postfix in my configuration. My only problem is the one I described here https://github.com/rspamd/rspamd/issues/4467#issue-1684388464 with this simple configuration.
For what it is worth, my whitelist problem was indeed resolved by defining the whitelisted_ip config globally. I also checked some of the p values by fetching them from Redis after the change, and they were indeed zero or close to zero. So my problem was something different from the issue others have here.
Hi,
I discovered this ratelimit issue after upgrading two standalone rspamd (not related to mailcow) from 3.4.x to 3.5.x. Many users were getting unusually ratelimited with 3.5, as if the buckets were getting filled but weren't leaking. Clearing the buckets in redis temporarily helped for a few hours, but downgrading to 3.4.1 definitively fixed the problem.
It looks to me like there was a change somewhere in 3.5.x which affected how ratelimit behaves, at least with our config.
While googling around I found out that mailcow users were having the same kind of issues when their rspamd container was updated to 3.5.x too.
Nothing really special in our config, except a ratelimit whitelist based on authenticated user names.
# local.d/ratelimit.conf
#
whitelisted_user = "${LOCAL_CONFDIR}/custom/ratelimit_whitelisted_users.map";
rates {
  # Selector based ratelimit
  some_limit = {
    selector = 'user.lower';
    # You can define more than one bucket, however, you need to use array syntax only
    bucket = [
      {
        burst = 60; # capacity of 60 messages in the bucket
        rate = "12 / 1min"; # leak 12 messages per minute (every 5s)
      }]
  }
  # Predefined ratelimit
  to = {
    bucket = {
      burst = 100;
      rate = 0.01666666666666666666; # leak 1 message per minute
    }
  }
  # or define it with selector
  other_limit_alt = {
    selector = 'rcpts:addr.take_n(5)';
    bucket = {
      burst = 100;
      rate = "1 / 1m"; # leak 1 message per minute
    }
  }
}
Kind regards
Ok, I think I know the reason now: it is again about short-circuit rules indeed. I have added one more workaround to really clean the pending bucket.
In my opinion there is still a problem with the ratelimit module. Basically, what happens to us is as follows. Our local.d/ratelimit.conf:
rates {
  1000_smtp_mail_daily_limit_customerdomain_com = {
    # 1000 mails / 24h for users of the @customerdomain.com domain
    selector = 'user.lower.regexp("^[A-Za-z0-9._%+-]+@customerdomain\.com$")';
    bucket = [
      {
        burst = 1000;
        rate = "1000 / 24h";
      }]
  }
  smtp_mail_daily_limit = {
    # 300 mails / 24h for other authenticated users
    selector = 'user.lower';
    bucket = [
      {
        burst = 300;
        rate = "300 / 24h";
      }]
  }
  web_mail_daily_limit = {
    # 20 mails / 24h for unauthenticated users
    selector = 'digest(header("Subject");header("From"))';
    bucket = [
      {
        burst = 20;
        rate = "20 / 24h";
      }]
  }
}
This works flawlessly for a while (something like 1 or 2 days), and then suddenly the mail of @customerdomain.com no longer matches its selector 1000_smtp_mail_daily_limit_customerdomain_com and therefore goes into smtp_mail_daily_limit.
Checking the log (with the debug module on for ratelimit), I see something like this:
2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:527: check limit 1000_smtp_mail_daily_limit_customerdomain_it:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (1000/0.011574074074074073)
2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:466: got reply for limit xxx@customerdomain.com (1000 / 0.011574074074074073); 1 burst, 1.01:1.02 dyn, 1 leaked
2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:606: updated limit 1000_smtp_mail_daily_limit_customerdomain_it:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (1000/0.011574074074074073), burst: 1, dyn_rate: 1.01, dyn_burst: 1.02
2023-10-03 09:52:09 #1477810(main) <dsrs53>; lua; ratelimit.lua:708: enabled ratelimit: 1000_smtp_mail_daily_limit_customerdomain_it [symbol: nil, 1000 msgs burst, 0.011574074074074 msgs/sec rate]
2023-10-03 10:29:47 #2825400(normal) <5c1a01>; ratelimit; ratelimit.lua:527: check limit smtp_mail_daily_limit:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (300/0.003472222222222222)
2023-10-03 10:29:47 #2825400(normal) <5c1a01>; ratelimit; ratelimit.lua:466: got reply for limit xxx@customerdomain.com (300 / 0.003472222222222222); 1 burst, 1.01:1.02 dyn, 1 leaked
It seems that the limit being checked suddenly changes from the correct one to the default one.
We use rspamd 3.6.2
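The per-second figures in the debug log above follow directly from the configured rate strings. The snippet below is a sketch that checks the arithmetic; `rate_per_second` is a hypothetical, simplified stand-in that only handles the "N / <period><unit>" forms appearing in this thread, not Rspamd's own parser.

```python
import re

# Seconds per unit, limited to the unit spellings that appear in this thread.
UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600}


def rate_per_second(rate):
    """Convert a string such as "1000 / 24h" into messages per second."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*(s|min|m|h)\s*", rate)
    if match is None:
        raise ValueError(f"unsupported rate format: {rate!r}")
    count, period, unit = match.groups()
    return int(count) / (int(period) * UNIT_SECONDS[unit])


for text in ("1000 / 24h", "300 / 24h"):
    print(text, "->", rate_per_second(text))
```

Both configured limits therefore appear in the log with the rate one would expect from the config (1000 and 300 messages per 86400 seconds), so the log shows two different limits being checked rather than one limit whose parameters changed.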
This works flawlessly for a while (something like 1 or 2 days), and then suddenly the mail of @customerdomain.com no longer matches its selector 1000_smtp_mail_daily_limit_customerdomain_com and therefore goes into smtp_mail_daily_limit.
Independent ratelimits are not evaluated in any particular order (not reliably so, anyway); the selector for the catch-all limit should exclude things that are to be handled elsewhere.
That could be improved on but it's an unrelated concern to the matter reported in this issue.
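The overlap described in this reply can be sketched by modelling the two selectors as plain predicates. This is an illustration only: the function names are hypothetical, the regular expression is copied from the config posted above, and the thread does not show how the exclusion would be written in an actual Rspamd selector.

```python
import re

# Regular expression copied from the customer-specific selector in the posted config.
CUSTOMER_RE = re.compile(r"^[A-Za-z0-9._%+-]+@customerdomain\.com$")


def matches_customer(user):
    """Stand-in for the customer-specific selector: lowercase, then regexp."""
    return CUSTOMER_RE.match(user.lower()) is not None


def matching_limits(user, catch_all_excludes_customer):
    """List the limit names whose (modelled) selector matches this user."""
    names = []
    if matches_customer(user):
        names.append("1000_smtp_mail_daily_limit_customerdomain_com")
    # The catch-all matches every authenticated user unless it excludes
    # the users that are meant to be handled by the specific limit.
    if not (catch_all_excludes_customer and matches_customer(user)):
        names.append("smtp_mail_daily_limit")
    return names


print(matching_limits("xxx@customerdomain.com", catch_all_excludes_customer=False))
print(matching_limits("xxx@customerdomain.com", catch_all_excludes_customer=True))
```

In this model, a customer address falls under both limits until the catch-all excludes it, so with no guaranteed evaluation order the stricter 300 / 24h limit can be the one that is hit.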
Thank you. A coincidence (a wrong answer on another forum) and the casual way we happened to order our rules led me to think in terms of a top-down approach.
Describe the bug
When I reach the "burst" limit in ratelimit.conf, ratelimit does not apply a rate; it only blocks sending e-mails until the burst counter expires. The default "expiry" is 2 days, but when I set "expiry" to 1h, the setting is not accepted.
Steps to Reproduce
The e-mail client then says "Ratelimit user exceeded." I now have to wait two days for the counter to expire, or delete the RL* keys from the Redis DB.