rspamd / rspamd.com

rspamd.com website.
https://rspamd.com
Creative Commons Attribution Share Alike 4.0 International

Please document a working regexp example #423

Open: systemcrash opened this issue 5 years ago

systemcrash commented 5 years ago

I checked the regexp module page but could not write a working .conf file.

Specifically, I found this in the code:

reconf['MICROSOFT_SPAM'] = {
  -- https://technet.microsoft.com/en-us/library/dn205071(v=exchg.150).aspx
  re = 'X-Forefront-Antispam-Report=/SFV:SPM/H',
  score = 4.0,
  description = "Microsoft says the message is spam",
  group = 'upstream_spam_filters'
}

I wanted a case-insensitive variant of it, like:

re = 'X-Forefront-Antispam-Report=/SFV:SPM/iH'
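For reference, here is a minimal sketch of how that case-insensitive rule might look in local.d/regexp.conf, written in the same UCL style that is reported to work later in this thread. It assumes the `Header-Name=/pattern/flags` syntax from the Lua rule above, where `H` selects a header and `i` makes the match case-insensitive. The symbol name `MICROSOFT_SPAM_CI` is hypothetical, chosen only so it does not clash with the stock `MICROSOFT_SPAM` rule.

```
# local.d/regexp.conf (sketch; MICROSOFT_SPAM_CI is a hypothetical name)
MICROSOFT_SPAM_CI {
  # Header-Name=/pattern/flags: "H" = match a header value, "i" = ignore case
  re = 'X-Forefront-Antispam-Report=/SFV:SPM/iH';
  score = 4.0;
  description = "Microsoft says the message is spam (case-insensitive)";
  group = "upstream_spam_filters";
}
```

Note that a rule like this only adds a symbol and its score; what happens to the message is decided afterwards from the total score.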

But the regexp page only covers the regexp syntax and internal functions, not how to use them.

Which internal function do we call to say "yup, definitely spam, drop this shit"? Why perform all those binary checks (internal functions) if the regexp itself is the check we need?

Please show (and document) an example that can go in local.d/regexp.conf, ideally one that will immediately either a) learn the message as spam and reject it, or b) drop or discard it.
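On the "reject immediately" part: in Rspamd a regexp rule only contributes a symbol and a score, and the action is normally chosen from the total score (the default `reject` threshold is 15). To force an action whenever a particular symbol fires, the `force_actions` module can be used instead. The sketch below assumes the `rules { NAME { action; expression; message; } }` layout from the force_actions documentation and reuses the stock `MICROSOFT_SPAM` symbol quoted above; the rule name `MICROSOFT_SPAM_REJECT` and the message text are illustrative only. Newer Rspamd releases also list a `discard` action, but it is worth confirming that your version supports it before relying on it.

```
# local.d/force_actions.conf (sketch; rule name and message are illustrative)
rules {
  MICROSOFT_SPAM_REJECT {
    # Force this action whenever the expression below is true,
    # regardless of the message's total score
    action = "reject";
    # Any Rspamd symbol expression; here just the stock MICROSOFT_SPAM symbol
    expression = "MICROSOFT_SPAM";
    # Optional text passed back along with the action
    message = "Rejected: upstream filter classified this message as spam";
  }
}
```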

By comparison, the milter-regex syntax I use today is clear, e.g.:

discard
header /^X-Microsoft-Antispam$/i /BCL:[1-9]/i

discard
header /^X-Forefront-Antispam-Report$/i /SFV:SPM/i

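As a rough, unofficial translation, the first milter-regex rule could be written as an Rspamd regexp rule along these lines. The symbol name `MICROSOFT_BULK_BCL` is hypothetical, and the score of 15.0 is chosen only because it equals Rspamd's default `reject` threshold, so the symbol can reach reject on its own unless other symbols pull the total down. It assumes the same `Header-Name=/pattern/iH` syntax as the Microsoft rule quoted earlier in this issue.

```
# local.d/regexp.conf (sketch; MICROSOFT_BULK_BCL is a hypothetical name)
MICROSOFT_BULK_BCL {
  # Fire when X-Microsoft-Antispam reports a Bulk Complaint Level of 1-9;
  # "H" = header match, "i" = case-insensitive
  re = 'X-Microsoft-Antispam=/BCL:[1-9]/iH';
  # 15.0 equals the default reject threshold (an assumption about your actions config)
  score = 15.0;
  description = "Microsoft reports a non-zero bulk complaint level";
}
```
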
jmptbl commented 4 years ago

@systemcrash I needed to write a regexp rule recently and also struggled to figure out the regexp module, but eventually I got something working. Below is the content of my local.d/regexp.conf file; I hope it helps.

"RE_SEXTORTION" = {
    re = '/your/{words} && /password/{words} && /buy/{words} && /bitcoin/{words}';
    score = 15.0;
}
systemcrash commented 4 years ago

What field were you filtering on and what was the typical content?

The module isn't the easiest to use, I have to admit...


jmptbl commented 4 years ago

It matches against the {words} type, a transformation of the message that the documentation describes as follows:

Unicode normalized (to NFKC) and lowercased words extracted from the text (excluding URLs), subject and From displayed name

The content was sextortion-type emails that I was given as examples. They were sneakily encoded with unusual UTF-8 character sequences, so matching the patterns above against the normalized, lowercased {words} seemed good enough, given the size and type of the user base in question.