[BUG] RateLimit is not working as expected.

ladisone commented 1 year ago

Prerequisites

[x] Put an X between the brackets on this line if you have done all of the following:
- Read about bug reporting in general: https://rspamd.com/doc/faq.html#how-to-report-bugs-found-in-rspamd
- Enabled relevant debugging logs: https://rspamd.com/doc/faq.html#how-to-debug-some-module-in-rspamd
- Checked the FAQs about Core files in case of fatal crash: https://rspamd.com/doc/faq.html#how-to-figure-out-why-rspamd-process-crashed
- Tried ASAN package and obtained the ASAN report (if possible): https://rspamd.com/doc/faq.html#asan-builds
- Checked that your issue isn't already filed: https://github.com/issues?utf8=%E2%9C%93&q=is%3Aissue+user%3Arspamd
- Checked that there is not already an experimental package or master branch

Describe the bug

When I reach the "burst" limit configured in ratelimit.conf, ratelimit does not throttle sending to the configured rate. It blocks all outgoing e-mails until the burst counter expires. The default "expiry" is 2 days, but when I set "expiry" to 1h, the setting is not applied.

Steps to Reproduce

ratelimit.conf:

rates {
user = {
selector = 'user.lower';
bucket = [
{
  burst = 10;
  rate  = "8 / 1min";
},
{
  burst = 20;
  rate  = "10 / 10min";
},
{
  burst = 120;
  rate  = "100 / 1h";
}
]
}
};
info_symbol = "R_RATELIMIT_INFO";
expiry = 1h;

I sent 119 messages:

127.0.0.1:6379> HGETALL RLzni8u6qjhjak
1) "l"
2) "1682405057593"
3) "b"
4) "0"
5) "dr"
6) "10000"
7) "db"
8) "10000"
9) "p"
10) "119"

The e-mail client then reports "Ratelimit user exceeded." Now I have to wait two days for the counter to expire, or delete the RL* keys from the Redis DB.

Expected behavior

After I send 10 messages, I can keep sending, but at no more than 8 messages per minute.
After I send 20 messages, I can keep sending, but at no more than 10 messages per 10 minutes.
After I send 120 messages, I can keep sending, but at no more than 100 messages per hour.
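
The expectation above matches the usual leaky-bucket model. The following is a minimal Python sketch of that model for the first bucket only (burst = 10, rate = "8 / 1min"). The `LeakyBucket` class and its fields are hypothetical names for illustration, and it is a simplification, not Rspamd's actual implementation.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class LeakyBucket:
    """Toy leaky bucket: `burst` is the capacity, `rate` the leak in messages per second."""
    burst: int
    rate: Fraction
    level: Fraction = Fraction(0)
    last: int = 0

    def allow(self, now: int) -> bool:
        # Drain what leaked since the previous call, never going below empty.
        leaked = (now - self.last) * self.rate
        self.level = max(Fraction(0), self.level - leaked)
        self.last = now
        if self.level + 1 > self.burst:
            return False  # bucket is full, the message is refused
        self.level += 1
        return True


# First bucket from the report: burst = 10, rate = "8 / 1min".
bucket = LeakyBucket(burst=10, rate=Fraction(8, 60))

# 15 attempts in the same second: only the burst capacity gets through.
accepted_at_once = sum(bucket.allow(0) for _ in range(15))

# 15 more attempts one minute later: 8 messages have leaked out by then.
accepted_after_minute = sum(bucket.allow(60) for _ in range(15))

print(accepted_at_once, accepted_after_minute)
```

In this model the sender is never locked out for days: 10 messages pass immediately, and 8 more pass a minute later. The report describes the opposite, a counter that stays full until the key expires.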

Versions

Rspamd daemon version 3.5
OS - AlmaLinux release 8.7 (Stone Smilodon)

sriccio commented 1 year ago

I'm seeing the same behavior on the Rspamd instances we updated to 3.5. Going back to 3.4.x and flushing the rate limit Redis entries restored normal operation.

It also seems to me that this mailcow issue is related: https://github.com/mailcow/mailcow-dockerized/issues/5168

remkolodder commented 1 year ago

Same issue here via Mailcow. After reverting to rspamd 3.4 it works fine again (so far at least). @vstakhov can you please have a look at this? If you need info from me, please let me know (See the mailcow thread as well). Thank you for all the continued hard efforts that you make, it's appreciated!

vstakhov commented 1 year ago

I just do not understand the issue tbh. Is it related to the whitelisted_ip option? From the original issue description I can conclude that the pending (or p) key is not drained like the ordinary bucket value.

remkolodder commented 1 year ago

For what it's worth, this occurred on a mailcow account that does not have whitelist_ip selected. It does appear that a bucket is not drained: I saw a sudden increase around the 11th of May (when I upgraded the local install, which included rspamd 3.5), and it then continued gradually. The ratelimit was never released; it kept on going.

rates {
#    # Format: "1 / 1h" or "20 / 1m" etc. - global ratelimits are disabled by default
    to = "100 / 1s";
    to_ip = "100 / 1s";
    to_ip_from = "100 / 1s";
    bounce_to = "100 / 1h";
    bounce_to_ip = "7 / 1m";
}
whitelisted_rcpts = "postmaster,mailer-daemon";
max_rcpt = 25;
custom_keywords = "/etc/rspamd/lua/ratelimit.lua";
info_symbol = "RATELIMITED";

Those are the settings that mailcow uses, nothing fancy.

The Lua file contains:

cat lua/ratelimit.lua 
local custom_keywords = {}

custom_keywords.mailcow = function(task)
  local rspamd_logger = require "rspamd_logger"
  local dyn_rl_symbol = task:get_symbol("DYN_RL")
  if dyn_rl_symbol then
    local rl_value = dyn_rl_symbol[1].options[1]
    local rl_object = dyn_rl_symbol[1].options[2]
    if rl_value and rl_object then
      rspamd_logger.infox(rspamd_config, "DYN_RL symbol has value %s for object %s, returning %s...", rl_value, rl_object, "rs_dynrl_" .. rl_object)
      return "rs_dynrl_" .. rl_object, rl_value
    end
  end
end

return custom_keywords

frederikbosch commented 1 year ago

I removed my comment regarding whitelisted_ip. I believe something did change when upgrading from 3.4 to 3.5 because I also see an increase in my logs.

vstakhov commented 1 year ago

Ok, I think the reason is that p is not cleared. The intention of the pending field was to count messages that are currently being processed. However, if you have short-circuit rules (and they are evil - I've said that many, many times), then p can be increased in the pre-filter but never decreased in the post-filter, as post-filters are skipped.

vstakhov commented 1 year ago

Or no, this symbol has all guards against it: flags = 'explicit_disable,ignore_passthrough'

frederikbosch commented 1 year ago

I am still having issues. E-mails are being sent from our webmail client, using an IP that I have included in the whitelisted_ip map.

The log:

2023-05-30 13:11:59 #9(normal) <d79c6d>; task; rspamd_task_write_log: id: <41bc017f249d2186cfdbbe934e5521ac@client.nl>, qid: <1q3z8p-0007ZJ-00>, ip: 172.16.20.123, user: training@client.nl, from: <training@client.nl>, (default: F (soft reject): [0.00/15.00] [RATELIMIT(0.00){incoming_ip_limit(RLg4m1d1d86msx3);},TAGGED_RCPT(0.00){}]), len: 2983, time: 6.322ms, dns req: 0, digest: <a50b2b63cf6d76a2433c197eda5a9a41>, rcpts: <...>, mime_rcpts: <>, forced: soft reject "Ratelimit "incoming_ip_limit" exceeded"; score=nan (set by ratelimit)

My ratelimit.conf:

rates {
  user = {
    bucket = [
      {
        burst = 20;
        rate = "100 / 10m";
      }
    ]
  }

  incoming_ip_limit {
    selector = "ip";
    whitelisted_ip = "/etc/rspamd/maps.d/ip-whitelist.map"
    bucket [
      {
        burst = 20;
        rate = "400 / 10m";
      }
    ]
  }
}

info_symbol = "RATELIMIT";

And my ip-whitelist.map:

app@rspamd:/$ cat /etc/rspamd/maps.d/ip-whitelist.map
46.21.123.12
2.58.123.12
172.16.20.0/24
250.0.0.0/8

And the entry from Redis:

127.0.0.1:6379> HGETALL RLg4m1d1d86msx3
 1) "l"
 2) "1685452319129"
 3) "b"
 4) "0"
 5) "dr"
 6) "50444"
 7) "db"
 8) "101221"
 9) "p"
10) "189"

vstakhov commented 1 year ago

You cannot define per-rule whitelist maps; they are defined globally for this module. The main question is why the p bucket is not being cleared.

It is increased here, when scanning of a message starts: https://github.com/rspamd/rspamd/blob/master/lualib/redis_scripts/ratelimit_check.lua#L69

It is decreased here: https://github.com/rspamd/rspamd/blob/master/lualib/redis_scripts/ratelimit_update.lua#L78

So if this postfilter is not called, we are in real trouble. But this postfilter must be called in all cases: https://github.com/rspamd/rspamd/blob/master/src/plugins/lua/reputation.lua#L1332

vstakhov commented 1 year ago

So normally p must always be around 0.
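
The check/update pairing described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `check`, `update` and `simulate` functions are hypothetical, the dictionary stands in for the Redis hash fields `b` and `p`, and it assumes that pending messages count toward the burst, as the Redis dumps in this thread suggest. It does not reproduce the real Lua scripts.

```python
def check(bucket: dict, burst: int) -> bool:
    """Prefilter step: refuse if the burst is reached, otherwise mark one message as pending."""
    if bucket["b"] + bucket["p"] >= burst:
        return False
    bucket["p"] += 1
    return True


def update(bucket: dict) -> None:
    """Postfilter step: the message is done, so move it from pending into the bucket."""
    bucket["p"] = max(0, bucket["p"] - 1)
    bucket["b"] += 1


def simulate(messages: int, burst: int, postfilter_runs: bool) -> dict:
    bucket = {"b": 0, "p": 0}
    for _ in range(messages):
        if check(bucket, burst) and postfilter_runs:
            update(bucket)
        # Assume messages are spaced far enough apart for "b" to leak fully.
        bucket["b"] = 0
    return bucket


healthy = simulate(messages=200, burst=120, postfilter_runs=True)
stuck = simulate(messages=200, burst=120, postfilter_runs=False)
print(healthy, stuck)
```

When the postfilter step always runs, `p` returns to 0 after every message. When it never runs, `p` climbs to the burst value and stays there while `b` is 0, which resembles the Redis entries posted earlier in the thread.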

ladisone commented 1 year ago

@vstakhov I read the previous comments and am unsure whether my question is relevant. As I understand it, the problem is that the p bucket is not being cleared. Is this issue still considered a bug?

benschhold commented 1 year ago

I also reverted back to the old rspamd version because I had the described issue and also my prefilter didn't work anymore. I had a whitelist with prefilter to completly whitelist a IP, after the update ratelimit wasnt included in the prefilter whitelist anymore and i had to use the whitelist from the ratelimit module. It might have something to do with the ratelimit not working, but I'm not sure.

vstakhov commented 1 year ago

> Is this issue still considered a bug?

I'm not sure it is an Rspamd bug, as all reports are likely from Mailcow users. I also see no way the p bucket could fail to be cleaned if the ratelimit callbacks are called properly. That's the problem.

vstakhov commented 1 year ago

> I had a whitelist with prefilter to completly whitelist a IP, after the update ratelimit wasnt included in the prefilter whitelist anymore and i had to use the whitelist from the ratelimit module.

I'm sorry, but I cannot parse this sentence.

benschhold commented 1 year ago

What I tried to say is that a prefilter map, as described here https://rspamd.com/doc/modules/multimap.html#pre-filter-maps, is no longer whitelisting the ratelimit module. I'm not using mailcow, but I also use a custom ratelimit Lua script; maybe that's what I have in common with mailcow.

vstakhov commented 1 year ago

I'm sorry, but what do you mean by "whitelisting ratelimit by multimap"? If it is what I think it is, it has never worked the way you might expect. It might have worked merely by accident, and it is not an issue. For disabling symbols, you can use many methods: settings, custom Lua code, conditions etc. Multimap is not a proper tool for this task.

benschhold commented 1 year ago

I understand your point and I don't disagree. I would just point out that it has worked perfectly for years until now. I don't want to mix up issues if that has nothing to do with the ratelimit issue. Might this have something to do with mailcow and me both using the custom_keywords feature with a custom Lua script? I also want to mention that I had never heard of issues with that before.

ladisone commented 1 year ago

> > I had a whitelist with prefilter to completly whitelist a IP, after the update ratelimit wasnt included in the prefilter whitelist anymore and i had to use the whitelist from the ratelimit module.

> I'm sorry, but I cannot parse this sentence.

@vstakhov I don't use Mailcow. I use only Rspamd with Postfix in my configuration. The only problem I have is the one I described here https://github.com/rspamd/rspamd/issues/4467#issue-1684388464 with this simple configuration.

frederikbosch commented 1 year ago

For what it is worth, my whitelist problem was indeed resolved by defining the whitelisted_ip config globally. I also checked some of the p values by fetching them from Redis, after the change, and they were indeed zero or close to zero. So my problem was something different than the issue others have here.

sriccio commented 1 year ago

> > Is this issue still considered a bug?

> I'm not sure it is an Rspamd bug, as all reports are likely from Mailcow users. I also see no way the p bucket could fail to be cleaned if the ratelimit callbacks are called properly. That's the problem.

Hi,

I discovered this ratelimit issue after upgrading two standalone rspamd (not related to mailcow) from 3.4.x to 3.5.x. Many users were getting unusually ratelimited with 3.5, as if the buckets were getting filled but weren't leaking. Clearing the buckets in redis temporarily helped for a few hours, but downgrading to 3.4.1 definitively fixed the problem.

It looks to me as if there was a change somewhere in 3.5.x that affected how ratelimit behaves, at least with our config.

While googling around I found out that mailcow users were having the same kind of issues when their rspamd container was updated to 3.5.x too.

Nothing really special in our config, except a ratelimit whitelist based on authenticated user names.

# local.d/ratelimit.conf
#
  whitelisted_user  = "${LOCAL_CONFDIR}/custom/ratelimit_whitelisted_users.map";

  rates {
    # Selector based ratelimit
    some_limit = {
      selector = 'user.lower';
      # You can define more than one bucket, however, you need to use array syntax only
      bucket = [
      {
        burst = 60; # capacity of 60 messages in the bucket
        rate = "12 / 1min"; # leak 12 messages per minute (every 5s)
      }]
    }
    # Predefined ratelimit
    to = {
      bucket = {
        burst = 100;
        rate = 0.01666666666666666666; # leak 1 message per minute
      }
    }
    # or define it with selector
    other_limit_alt = {
      selector = 'rcpts:addr.take_n(5)';
      bucket = {
        burst = 100;
        rate = "1 / 1m"; # leak 1 message per minute
      }
    }
  }

Kind regards

vstakhov commented 1 year ago

Ok, I think I know the reason now: it is indeed about short-circuit rules again. I have added one more workaround to really clean the pending bucket.

barianiluca commented 9 months ago

In my opinion there is still a problem with the ratelimit module. Basically, what happens to us is as follows. Our local.d/ratelimit.conf:

rates {
    1000_smtp_mail_daily_limit_customerdomain_com = {
      # 1000 mails / 24h for users of the @customerdomain.com domain
      selector = 'user.lower.regexp("^[A-Za-z0-9._%+-]+@customerdomain\.com$")';
      bucket = [
      {
            burst = 1000;
            rate = "1000 / 24h";
      }]
    }
    smtp_mail_daily_limit = {
      # 300 mails / 24h for other authenticated users
      selector = 'user.lower';
      bucket = [
      {
            burst = 300;
            rate = "300 / 24h";
      }]
    }
    web_mail_daily_limit = {
      # 20 mails / 24h for unauthenticated users
      selector = 'digest(header("Subject");header("From"))';
      bucket = [
      {
            burst = 20;
            rate = "20 / 24h";
      }]
    }
}

This works flawlessly for a while (something like 1 or 2 days), and then suddenly mail from @customerdomain.com is no longer matched by its own selector, 1000_smtp_mail_daily_limit_customerdomain_com, and therefore falls into smtp_mail_daily_limit.

Checking the log (with debug enabled for the ratelimit module), I see something like this:

2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:527: check limit 1000_smtp_mail_daily_limit_customerdomain_it:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (1000/0.011574074074074073)
2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:466: got reply for limit xxx@customerdomain.com (1000 / 0.011574074074074073); 1 burst, 1.01:1.02 dyn, 1 leaked
2023-10-03 09:43:07 #1478139(normal) <423ed3>; ratelimit; ratelimit.lua:606: updated limit 1000_smtp_mail_daily_limit_customerdomain_it:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (1000/0.011574074074074073), burst: 1, dyn_rate: 1.01, dyn_burst: 1.02
2023-10-03 09:52:09 #1477810(main) <dsrs53>; lua; ratelimit.lua:708: enabled ratelimit: 1000_smtp_mail_daily_limit_customerdomain_it [symbol: nil, 1000 msgs burst, 0.011574074074074 msgs/sec rate]
2023-10-03 10:29:47 #2825400(normal) <5c1a01>; ratelimit; ratelimit.lua:527: check limit smtp_mail_daily_limit:xxx@customerdomain.com -> RLqmi6erhopgobsnh8qh8ay94b (300/0.003472222222222222)
2023-10-03 10:29:47 #2825400(normal) <5c1a01>; ratelimit; ratelimit.lua:466: got reply for limit xxx@customerdomain.com (300 / 0.003472222222222222); 1 burst, 1.01:1.02 dyn, 1 leaked

It seems that the limit being checked suddenly changes from the correct one to the default one.

We use rspamd 3.6.2.
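
The per-second rates in the log lines above follow directly from the configured rate strings: 1000 messages per 24h is 1000 / 86400 messages per second. A minimal Python sketch of that arithmetic follows. The `rate_per_second` helper and the `UNIT_SECONDS` table are hypothetical and written for this example; they are not Rspamd's own parser and may not accept the same set of units.

```python
import re

# Hypothetical unit table for this sketch; not taken from Rspamd's own parser.
UNIT_SECONDS = {"s": 1, "m": 60, "min": 60, "h": 3600, "d": 86400}


def rate_per_second(spec: str) -> float:
    """Convert a string such as "1000 / 24h" into messages per second."""
    match = re.fullmatch(r"\s*(\d+)\s*/\s*(\d+)\s*([a-z]+)\s*", spec)
    if match is None:
        raise ValueError(f"unrecognised rate: {spec!r}")
    count, period, unit = match.groups()
    return int(count) / (int(period) * UNIT_SECONDS[unit])


daily_1000 = rate_per_second("1000 / 24h")
daily_300 = rate_per_second("300 / 24h")
per_minute = rate_per_second("1 / 1m")
print(daily_1000, daily_300, per_minute)
```

The first two values agree with the 0.0115740... and 0.0034722... figures in the debug log, so the log really does show the 300-per-day limit being applied to an address that was expected to fall under the 1000-per-day limit.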

fatalbanana commented 9 months ago

> This works flawlessly for a while (something like 1 or 2 days), and then suddenly mail from @customerdomain.com is no longer matched by its own selector, 1000_smtp_mail_daily_limit_customerdomain_com, and therefore falls into smtp_mail_daily_limit.

Independent ratelimits are not evaluated in any particular order (not reliably so, anyway); the selector for the catch-all limit should exclude things that are to be handled elsewhere.

That could be improved on but it's an unrelated concern to the matter reported in this issue.
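
The advice above can be modelled in a few lines of Python. This is a sketch only: the selector functions, the limit names and the convention that a selector returns `None` when a limit does not apply are assumptions made for illustration, not Rspamd's selector implementation. How to express the same exclusion in an actual Rspamd selector string is not covered in this thread.

```python
import re
from typing import Callable, Dict, List, Optional

Selector = Callable[[str], Optional[str]]

CUSTOMER = re.compile(r"^[a-z0-9._%+-]+@customerdomain\.com$")


def customer_only(user: str) -> Optional[str]:
    user = user.lower()
    return user if CUSTOMER.match(user) else None


def any_user(user: str) -> Optional[str]:
    return user.lower()


def any_user_except_customer(user: str) -> Optional[str]:
    user = user.lower()
    return None if CUSTOMER.match(user) else user


def applicable(limits: Dict[str, Selector], user: str) -> List[str]:
    """Names of every limit whose selector yields a key for this user."""
    return sorted(name for name, selector in limits.items() if selector(user) is not None)


overlapping = {"customer_limit": customer_only, "default_limit": any_user}
exclusive = {"customer_limit": customer_only, "default_limit": any_user_except_customer}

print(applicable(overlapping, "xxx@customerdomain.com"))
print(applicable(exclusive, "xxx@customerdomain.com"))
print(applicable(exclusive, "someone@example.org"))
```

With overlapping selectors the customer address qualifies for both limits, so which one is enforced depends on evaluation order. With the exclusive catch-all, each address qualifies for exactly one limit regardless of order.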

barianiluca commented 9 months ago

> > This works flawlessly for a while (something like 1 or 2 days), and then suddenly mail from @customerdomain.com is no longer matched by its own selector, 1000_smtp_mail_daily_limit_customerdomain_com, and therefore falls into smtp_mail_daily_limit.

> Independent ratelimits are not evaluated in any particular order (not reliably so, anyway); the selector for the catch-all limit should exclude things that are to be handled elsewhere.

> That could be improved on but it's an unrelated concern to the matter reported in this issue.

Thank you. A coincidence (a wrong answer on another forum) and the way our rules happened to be ordered led me to assume a top-down evaluation.
