mailcow / mailcow-dockerized

mailcow: dockerized - 🐮 + 🐋 = 💕
https://mailcow.email
GNU General Public License v3.0
8.87k stars 1.17k forks source link

rspamd NEURAL module seem to learn SPAM as HAM upon greylisting and soft rejection #4477

Closed ValdikSS closed 2 years ago

ValdikSS commented 2 years ago

Prior to placing the issue, please check following: (fill out each checkbox with an X once done)

Summary

Since somewhere around 3 weeks ago I’ve been receiving much larger amount of SPAM in my inbox. For some reason, rspamd neural module gives -4 with NEURAL_HAM_LONG and -2 with NEURAL_HAM_SHORT rating to most of spam emails, making them into inbox folder. The confidence rate of NEURAL module is always -1.000.

By checking the logs I found that the first submission of spam emails receive proper spam rating (9+, SPAM) upon greylisting, but after being resubmitted by the server they get -4 by NEURAL_HAM_LONG and -2 by NEURAL_HAM_SHORT. I'm assuming that rspamd somehow learns these mails as HAM upon initial submission, which is should do not.

Logs

Fresh example from today. The incoming message comes from mail.ru, it's submitted first by 94.100.177.91 and rejected due to greylisting, not receiving anything from NEURAL_HAM_LONG/SHORT, then 7 seconds later gets submitted by 94.100.177.128, with greylist "too soon" rejection. After ~50 minutes it got resubmitted again by 94.100.177.128, and this time it receives -4 from NEURAL_HAM_LONG and -2 from NEURAL_HAM_SHORT.

rspamd log

Feb 18 12:35:51 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 12:35:51 #41(normal) <dd0d39>; task; rspamd_protocol_http_reply: regexp statistics: 0 pcre regexps scanned, 5 regexps matched, 3174 regexps total, 2976 regexps cached, 0B scanned using pcre, 12.62KiB scanned total
Feb 18 12:36:34 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 12:36:34 #41(normal) <8145c3>; lua; clamav.lua:119: clamav: failed to scan, maximum retransmits exceed
Feb 18 12:36:35 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 12:36:35 #41(normal) <8145c3>; task; rspamd_task_write_log: id: <E1nKzgg-0005ji-Ht.gushchinvp5ep-mail-ru@smtp31.i.mail.ru>, qid: <591CF347C3A>, ip: 94.100.177.91, from: <gushchinvp5ep@mail.ru>, (default: F (soft reject): [9.94/15.00] [RSPAMD_URIBL(4.50){rosstrah.ru:url;},FREEMAIL_REPLYTO_NEQ_FROM_DOM(3.00){},BAD_REP_POLICIES(2.00){},SUBJECT_ENDS_SPACES(0.50){},IP_REPUTATION_HAM(-0.40){asn: 47764(-0.40), country: RU(-0.01), ip: 94.100.177.91(0.00);},R_MIXED_CHARSET(0.25){subject;},MIME_HTML_ONLY(0.20){},MIME_BASE64_TEXT(0.10){},MIME_GOOD(-0.10){multipart/mixed;},RCVD_IN_DNSWL_LOW(-0.10){94.100.177.91:from;},MX_GOOD(-0.01){},XM_UA_NO_VERSION(0.01){},ARC_NA(0.00){},ARC_SIGNED(0.00){valdikss.org.ru:s=msrv:i=1;},ASN(0.00){asn:47764, ipnet:94.100.176.0/20, country:RU;},CLAM_VIRUS_FAIL(0.00){failed to scan and retransmits exceed;},DKIM_TRACE(0.00){mail.ru:+;},DMARC_POLICY_ALLOW(0.00){mail.ru;reject;},DWL_DNSWL_NONE(0.00){mail.ru:dkim;},FREEMAIL_ENVFROM(0.00){mail.ru;},FREEMAIL_FROM(0.00){mail.ru;},FREEMAIL_REPLYTO(0.00){bk.ru;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},GREYLIST(0.00){greylisted;Fri, 18 Feb 2022 09:41:35 GMT;new record;},HAS_ATTACHMENT(0.00){},HAS_REPLYTO(0.00){to98spb@bk.ru;},HAS_X_PRIO_ONE(0.00){1;},MID_RHS_MATCH_FROMTLD(0.00){},MIME_TRACE(0.00){0:+;1:~;2:~;},RCPT_COUNT_ONE(0.00){1;},RCPT_MAILCOW_DOMAIN(0.00){valdikss.org.ru;},RCVD_COUNT_TWO(0.00){2;},RCVD_TLS_LAST(0.00){},RCVD_VIA_SMTP_AUTH(0.00){},REPLYTO_DN_EQ_FROM_DN(0.00){},REPLYTO_DOM_NEQ_FROM_DOM(0.00){},RWL_MAILSPIKE_POSSIBLE(0.00){94.100.177.91:from;},R_DKIM_ALLOW(0.00){mail.ru:s=mail4;},R_SPF_ALLOW(0.00){+ip4:94.100.176.0/20;},TO_DN_NONE(0.00){},TO_MATCH_ENVRCPT_ALL(0.00){}]), len: 607861, time: 1220.008ms, dns req: 38, digest: <6ac1d3bc76ab584c0fc530192faabbdd>, rcpts: <iam@valdikss.org.ru>, mime_rcpts: <iam@valdikss.org.ru>, forced: soft reject "Greylisted, please try again later"; score=nan (set by greylist)
Feb 18 12:36:35 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 12:36:35 #41(normal) <8145c3>; task; rspamd_protocol_http_reply: regexp statistics: 0 pcre regexps scanned, 12 regexps matched, 3174 regexps total, 2975 regexps cached, 0B scanned using pcre, 11.31KiB scanned total
Feb 18 12:36:40 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 12:36:40 #41(normal) <9699cd>; task; rspamd_task_write_log: id: <E1nKzgg-0005ji-Ht.gushchinvp5ep-mail-ru@smtp31.i.mail.ru>, qid: <4597D347C3C>, ip: 94.100.177.128, from: <gushchinvp5ep@mail.ru>, (default: F (soft reject): [0.00/15.00] [ASN(0.00){asn:47764, ipnet:94.100.176.0/20, country:RU;},GREYLIST(0.00){greylisted;Fri, 18 Feb 2022 09:41:35 GMT;too early;}]), len: 609304, time: 200.630ms, dns req: 1, digest: <6ac1d3bc76ab584c0fc530192faabbdd>, rcpts: <iam@valdikss.org.ru>, mime_rcpts: <iam@valdikss.org.ru>, forced: soft reject "Greylisted, please try again later"; score=nan (set by greylist)
Feb 18 12:36:40 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 12:36:40 #41(normal) <9699cd>; task; rspamd_protocol_http_reply: regexp statistics: 0 pcre regexps scanned, 0 regexps matched, 3174 regexps total, 0 regexps cached, 0B scanned using pcre, 0B scanned total
Feb 18 12:42:20 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 12:42:20 #41(normal) <q37eyd>; map; rspamd_re_map_cache_cleanup_old: cannot unlink stale cache file for http://nullnull.org/bad-subject-regex.txt (/var/lib/rspamd//bc8f4a8a3942330cc20edcbfdf794476d81f49e84d9340a28b6bf08cf63dab7b.hsmc): No such file or directory
Feb 18 13:21:42 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 13:21:42 #40(controller) <c6onet>; map; http_map_error: error reading https://bazaar.abuse.ch/export/txt/md5/recent/(151.101.114.49:443): connection with http server terminated incorrectly: ssl connect error: syscall fail: Connection timed out
Feb 18 13:23:07 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 13:23:07 #41(normal) <9871e9>; lua; clamav.lua:119: clamav: failed to scan, maximum retransmits exceed
Feb 18 13:23:08 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 13:23:08 #41(normal) <9871e9>; task; rspamd_task_write_log: id: <E1nKzgg-0005ji-Ht.gushchinvp5ep-mail-ru@smtp31.i.mail.ru>, qid: <686DD347D5F>, ip: 94.100.177.128, from: <gushchinvp5ep@mail.ru>, (default: F (no action): [4.04/15.00] [RSPAMD_URIBL(4.50){rosstrah.ru:url;},NEURAL_HAM_LONG(-4.00){-1.000;},FREEMAIL_REPLYTO_NEQ_FROM_DOM(3.00){},BAD_REP_POLICIES(2.00){},NEURAL_HAM_SHORT(-2.00){-1.000;},SUBJECT_ENDS_SPACES(0.50){},IP_REPUTATION_HAM(-0.40){asn: 47764(-0.40), country: RU(-0.01), ip: 94.100.177.128(0.00);},R_MIXED_CHARSET(0.25){subject;},MIME_HTML_ONLY(0.20){},MIME_BASE64_TEXT(0.10){},MIME_GOOD(-0.10){multipart/mixed;},MX_GOOD(-0.01){},XM_UA_NO_VERSION(0.01){},ARC_NA(0.00){},ARC_SIGNED(0.00){valdikss.org.ru:s=msrv:i=1;},ASN(0.00){asn:47764, ipnet:94.100.176.0/20, country:RU;},BCC(0.00){},CLAM_VIRUS_FAIL(0.00){failed to scan and retransmits exceed;},DKIM_TRACE(0.00){mail.ru:+;},DMARC_POLICY_ALLOW(0.00){mail.ru;reject;},DWL_DNSWL_NONE(0.00){mail.ru:dkim;},FREEMAIL_ENVFROM(0.00){mail.ru;},FREEMAIL_FROM(0.00){mail.ru;},FREEMAIL_REPLYTO(0.00){bk.ru;},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},GREYLIST(0.00){pass;body;},HAS_ATTACHMENT(0.00){},HAS_REPLYTO(0.00){to98spb@bk.ru;},HAS_X_PRIO_ONE(0.00){1;},MID_RHS_MATCH_FROMTLD(0.00){},MIME_TRACE(0.00){0:+;1:~;2:~;},RCPT_COUNT_ONE(0.00){1;},RCPT_MAILCOW_DOMAIN(0.00){valdikss.org.ru;},RCVD_COUNT_THREE(0.00){3;},RCVD_IN_DNSWL_NONE(0.00){94.100.177.128:from;},RCVD_TLS_LAST(0.00){},RCVD_VIA_SMTP_AUTH(0.00){},REPLYTO_DN_EQ_FROM_DN(0.00){},REPLYTO_DOM_NEQ_FROM_DOM(0.00){},R_DKIM_ALLOW(0.00){mail.ru:s=mail3;mail.ru:s=mail4;},R_SPF_ALLOW(0.00){+ip4:94.100.176.0/20;},TO_DN_NONE(0.00){},TO_MATCH_ENVRCPT_ALL(0.00){}]), len: 609304, time: 1108.636ms, dns req: 42, digest: <6ac1d3bc76ab584c0fc530192faabbdd>, rcpts: <iam@valdikss.org.ru>, mime_rcpts: <iam@valdikss.org.ru>
Feb 18 13:23:08 mail.valdk.tel 89fbcc6175fb[140]: 2022-02-18 13:23:08 #41(normal) <9871e9>; task; rspamd_protocol_http_reply: regexp statistics: 0 pcre regexps scanned, 12 regexps matched, 3174 regexps total, 2975 regexps cached, 0B scanned using pcre, 11.69KiB scanned total

postfix log

Feb 18 12:36:28 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:28 d42a349df99a postfix/postscreen[7976]: CONNECT from [94.100.177.91]:38088 to [172.22.1.253]:25
Feb 18 12:36:28 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:28 d42a349df99a whitelist_forwardinghosts: Look up 94.100.177.91 on whitelist, result 200 DUNNO
Feb 18 12:36:28 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:28 d42a349df99a postfix/dnsblog[8006]: addr 94.100.177.91 listed by domain wl.mailspike.net as 127.0.0.17
Feb 18 12:36:28 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:28 d42a349df99a postfix/dnsblog[7986]: addr 94.100.177.91 listed by domain list.dnswl.org as 127.0.5.1
Feb 18 12:36:31 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:31 d42a349df99a postfix/postscreen[7976]: PASS NEW [94.100.177.91]:38088
Feb 18 12:36:32 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:32 d42a349df99a postfix/smtpd[7997]: connect from smtp31.i.mail.ru[94.100.177.91]
Feb 18 12:36:33 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:33 d42a349df99a postfix/smtpd[7997]: Anonymous TLS connection established from smtp31.i.mail.ru[94.100.177.91]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)
Feb 18 12:36:33 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:33 d42a349df99a postfix/smtpd[7997]: 591CF347C3A: client=smtp31.i.mail.ru[94.100.177.91]
Feb 18 12:36:33 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:33 d42a349df99a postfix/cleanup[8000]: 591CF347C3A: message-id=<E1nKzgg-0005ji-Ht.gushchinvp5ep-mail-ru@smtp31.i.mail.ru>
Feb 18 12:36:35 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:35 d42a349df99a postfix/cleanup[8000]: 591CF347C3A: milter-reject: END-OF-MESSAGE from smtp31.i.mail.ru[94.100.177.91]: 4.7.1 Greylisted, please try again later; from=<gushchinvp5ep@mail.ru> to=<iam@valdikss.org.ru> proto=ESMTP helo=<smtp31.i.mail.ru>
Feb 18 12:36:35 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:35 d42a349df99a postfix/smtpd[7997]: disconnect from smtp31.i.mail.ru[94.100.177.91] ehlo=2 starttls=1 mail=1 rcpt=1 data=0/1 quit=1 commands=6/7
Feb 18 12:36:35 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:35 d42a349df99a postfix/postscreen[7976]: CONNECT from [94.100.177.128]:47318 to [172.22.1.253]:25
Feb 18 12:36:35 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:35 d42a349df99a whitelist_forwardinghosts: Look up 94.100.177.128 on whitelist, result 200 DUNNO
Feb 18 12:36:36 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:36 d42a349df99a postfix/dnsblog[7989]: addr 94.100.177.128 listed by domain list.dnswl.org as 127.0.5.0
Feb 18 12:36:38 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:38 d42a349df99a postfix/postscreen[7976]: PASS NEW [94.100.177.128]:47318
Feb 18 12:36:38 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:38 d42a349df99a postfix/smtpd[7997]: connect from fallback16.mail.ru[94.100.177.128]
Feb 18 12:36:39 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:39 d42a349df99a postfix/smtpd[7997]: Anonymous TLS connection established from fallback16.mail.ru[94.100.177.128]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)
Feb 18 12:36:39 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:39 d42a349df99a postfix/smtpd[7997]: 4597D347C3C: client=fallback16.mail.ru[94.100.177.128]
Feb 18 12:36:39 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:39 d42a349df99a postfix/cleanup[8000]: 4597D347C3C: message-id=<E1nKzgg-0005ji-Ht.gushchinvp5ep-mail-ru@smtp31.i.mail.ru>
Feb 18 12:36:40 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:40 d42a349df99a postfix/cleanup[8000]: 4597D347C3C: milter-reject: END-OF-MESSAGE from fallback16.mail.ru[94.100.177.128]: 4.7.1 Greylisted, please try again later; from=<gushchinvp5ep@mail.ru> to=<iam@valdikss.org.ru> proto=ESMTP helo=<fallback16.mail.ru>
Feb 18 12:36:40 mail.valdk.tel d42a349df99a[140]: Feb 18 12:36:40 d42a349df99a postfix/smtpd[7997]: disconnect from fallback16.mail.ru[94.100.177.128] ehlo=2 starttls=1 mail=1 rcpt=1 data=0/1 quit=1 commands=6/7
Feb 18 13:23:04 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:04 d42a349df99a postfix/postscreen[8232]: CONNECT from [94.100.177.128]:51980 to [172.22.1.253]:25
Feb 18 13:23:04 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:04 d42a349df99a whitelist_forwardinghosts: Look up 94.100.177.128 on whitelist, result 200 DUNNO
Feb 18 13:23:04 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:04 d42a349df99a postfix/dnsblog[8246]: addr 94.100.177.128 listed by domain list.dnswl.org as 127.0.5.0
Feb 18 13:23:05 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:05 d42a349df99a postfix/postscreen[8232]: PASS OLD [94.100.177.128]:51980
Feb 18 13:23:05 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:05 d42a349df99a postfix/smtpd[8251]: connect from fallback16.mail.ru[94.100.177.128]
Feb 18 13:23:06 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:06 d42a349df99a postfix/smtpd[8251]: Anonymous TLS connection established from fallback16.mail.ru[94.100.177.128]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)
Feb 18 13:23:06 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:06 d42a349df99a postfix/smtpd[8251]: 686DD347D5F: client=fallback16.mail.ru[94.100.177.128]
Feb 18 13:23:06 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:06 d42a349df99a postfix/cleanup[8255]: 686DD347D5F: message-id=<E1nKzgg-0005ji-Ht.gushchinvp5ep-mail-ru@smtp31.i.mail.ru>
Feb 18 13:23:08 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:08 d42a349df99a postfix/qmgr[378]: 686DD347D5F: from=<gushchinvp5ep@mail.ru>, size=609687, nrcpt=1 (queue active)
Feb 18 13:23:08 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:08 d42a349df99a postfix/smtpd[8251]: disconnect from fallback16.mail.ru[94.100.177.128] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7
Feb 18 13:23:08 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:08 d42a349df99a postfix/lmtp[8256]: 686DD347D5F: to=<iam@valdikss.org.ru>, relay=dovecot[172.22.1.250]:24, delay=2.1, delays=2/0.02/0.03/0.08, dsn=2.0.0, status=sent (250 2.0.0 <iam@valdikss.org.ru> brbnFoxzD2IyzwAArKbnxw Saved)
Feb 18 13:23:08 mail.valdk.tel d42a349df99a[140]: Feb 18 13:23:08 d42a349df99a postfix/qmgr[378]: 686DD347D5F: removed

Дк вcе рeгиoны и kатeгoрии - Кoндpатьева Aнна Mирoнoвнa gushchinvp5ep@mail.ru 2022-02-18 1636.zip

Reproduction

All my spam emails have NEURAL_* negative ratings with -1.000 confidence. I haven't tried to reproduce sending such emails by myself.

System information

Question Answer
My operating system Ubuntu 20.04, mailcow in LXD container with Ubuntu 20.04 as well
Is Apparmor, SELinux or similar active? None
Virtualization technlogy (KVM, VMware, Xen, etc - LXC and OpenVZ are not supported LXD
Server/VM specifications (Memory, CPU Cores) 16 GB RAM
Docker Version (docker version) 20.10.12 community
Docker-Compose Version (docker-compose version) 1.29.2
Reverse proxy (custom solution) None

None relevant (only commented net.core.somaxconn=4096 and removed ipv6nat-mailcow due to native IPv6 support in docker, which happened automatically on mailcow update)

data/conf/postfix/main.cf
+myhostname = mail.valdk.tel
+mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128 [fe80::]/10 172.22.1.0/24 [fd4d:6169:6c63:6f77::]/64
+smtps_smtpd_tls_mandatory_protocols = !SSLv2, !SSLv3
Ry3nlNaToR commented 2 years ago

I've also noticed an increase spam recently after checking rspamd most of the extra spam getting delivered to inbox due to -6 from neural.

Am thinking for quick temporary fix I might just disable neural not tested it yet I think adding enabled = false to data/conf/rspamd/local.d/neural.conf should disable it, the thing is am not sure if that would cause any issues don't know if neural is used by any other rspamd checks.

I don't think disabling will cause me lose any good mail as most is not within 6 of being sent to spam.

ValdikSS commented 2 years ago

Am thinking for quick temporary fix I might just disable neural not tested it yet I think adding enabled = false to data/conf/rspamd/local.d/neural.conf should disable it

I did exactly this, worked fine.

Ry3nlNaToR commented 2 years ago

Thanks will just do that, do you need to delete all the lines in the file or just adding that to the top will be fine.

ValdikSS commented 2 years ago

Add it to the top, then restart the container.

kaechele commented 2 years ago

The neural plugin is not producing good results for me either. It seems that after some time the data trained gets it stuck in a state where it marks any and all incoming and outgoing (authenticated) email on my server as

NEURAL_HAM_LONG (-4) [-1.000]
NEURAL_SPAM_SHORT (2.5) [1.000]

I have disabled it as well while I investigate.

milkmaker commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stathis commented 2 years ago

I have the exact same issue. Hopefully will be fixed at some point, but will keep neural off for now.

kaechele commented 2 years ago

I think this is "solved" by 7e26a2ab988eb85193e5be28e278e5021361b06f

milkmaker commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.