Closed shadowwaking closed 4 years ago
@fichtner is there a newer version of the port in the tree?
It was updated to 2.1 from 1.9.4, there are small breaking .. I have to check tomorrow
Hm, I can't reproduce here. Is it only amazon/ebay? Do you know a better way to reproduce? Maybe you see more logs from postfix?
:( No, the error is not just on mails from amazon or ebay. I can see the IO error on many mails. Just amazon and ebay are the one that have a problem with the timeout and are sending the mails often.
The error comes very, very often but not on every mail. This could be the thing, with round robin behaviour. Sometimes it uses the IPv4 localhost first (that works) and sometimes the IPv6 (that fails).
The error set on failure is always the same:
Bad case:
2019-11-23 21:03:44 #21813(rspamd_proxy) <377a74>; proxy; proxy_accept_socket: accepted milter connection from 127.0.0.1 port 18820
2019-11-23 21:03:45 #21813(rspamd_proxy) <377a74>; milter; rspamd_milter_process_command: got connection from xx.xx.xx.xx:27271
2019-11-23 21:05:00 #21813(rspamd_proxy) <377a74>; proxy; proxy_backend_master_error_handler: abnormally closing connection from backend: ::1, error: IO write error: Broken pipe, retries left: 5
2019-11-23 21:05:00 #21813(rspamd_proxy) <377a74>; proxy; proxy_backend_master_error_handler: retry connection to: 127.0.0.1 retries left: 4
2019-11-23 21:05:00 #31636(normal)
Good case:
2019-11-23 16:05:03 #87961(rspamd_proxy)
with OPNsense 17.7.6
it always looks so:
2019-11-20 10:33:50 #70227(rspamd_proxy) <33a1e8>; proxy; proxy_milter_finish_handler: finished milter connection 2019-11-20 10:34:09 #70227(rspamd_proxy) <098576>; proxy; proxy_accept_socket: accepted milter connection from 127.0.0.1 port 25061 2019-11-20 10:34:09 #70227(rspamd_proxy) <098576>; milter; rspamd_milter_process_command: got connection from xx.xx.xx.xx:25614 2019-11-20 10:34:10 #7700(normal) <030806>; task; accept_socket: accepted connection from 127.0.0.1 port 9764, task ptr: 000001FEC4DBCF00
I already deleted the 17.7.7 snapshot, so i dont get the postfix logs, anymore. But i looked in and cant see anything interesting there.
Can you open an issue at rspamd repo with the Error linking here and the information that you updated from 1.9.4 to 2.1
Hi there,
it looks like that opnsense 19.7.7 sometimes resolves "localhost" to ipv4 and sometimes to ipv6. I have set the option "Prefer to use IPv4 even if IPv6 is available" but is it possible that this is broken or changed in 19.7.7 ?
Thank you.
The best way to check if this change was introduced by opnsense or rspamd is to upgrade to 19.7.7 and only downgrade rspamd to the state of 19.7.6. If the error persits, it's some change in opnsense, if it's gone, then something within rspamd. Vial CLI after update:
opnsense-revert -r 19.7.6 rspamd
Ah, perfect. Thank you for this information. I will try this.
Hi there,
I upgraded OPNsense to 19.7.7 (rspamd 2.1) and collected logs. In the first 1 hour after the upgrade everything runs fine. No IO write errors. But after 1 hour, i get the known errors. (abnormally closing connection from backend: ::1, error: IO write error: Broken pipe) and the delays. So i downgraded rspamd to 1.9.4. That was yesterday evening. And since there, no more errors. So it is very likely that rspamd is the problematic component. But the guys in the rpamd tree think, that is a ipv6 problem of opnsense. What indeed sounds also logical to me -> sometimes localhost resolves to ipv4 and sometimes to ipv6. But then, why cant rspamd handle both protocols correctly ?
Whatever, i think in this case, no one will find a solution, at the moment. I will stay at 19.7.7 with rspamd 1.9.4 and wait for the next updates. Eventually will it be fixed in a future update.
Thank you very much for your help.
Can you post the output of:
sockstat -6 | grep rspamd
please ...
sure...
19.7.7 with rspamd 1.9 .# sockstat -6 | grep rspamd rspamd rspamd-1.9 14942 58 tcp6 ::1:4397 ::1:6379 rspamd rspamd-1.9 14942 59 tcp6 ::1:4129 ::1:6379 rspamd rspamd-1.9 11803 56 tcp6 ::1:47447 ::1:6379 rspamd rspamd-1.9 11803 57 tcp6 ::1:42254 ::1:6379 rspamd rspamd-1.9 99182 41 tcp6 ::1:17602 ::1:6379 rspamd rspamd-1.9 99182 42 tcp6 ::1:13396 ::1:6379
19.7.7 with rspamd 2.1 .# sockstat -6 | grep rspamd rspamd rspamd-2.1 33182 50 tcp6 ::1:8882 ::1:6379 rspamd rspamd-2.1 36635 48 tcp6 ::1:61024 ::1:6379 rspamd rspamd-2.1 36635 49 tcp6 ::1:57916 ::1:6379 rspamd rspamd-2.1 36635 50 tcp6 ::1:23628 ::1:6379 rspamd rspamd-2.1 7899 44 tcp6 ::1:12203 ::1:6379
Strange ... essential services are not running on v6 compared to mine:
root@OPNsense:~ # sockstat -6 | grep rspamd
rspamd rspamd-2.1 27592 11 tcp6 ::1:11332 *:*
rspamd rspamd-2.1 27592 15 tcp6 ::1:11334 *:*
rspamd rspamd-2.1 27592 19 tcp6 ::1:11333 *:*
rspamd rspamd-2.1 99697 4 tcp6 ::1:10174 ::1:6379
rspamd rspamd-2.1 99697 11 tcp6 ::1:11332 *:*
rspamd rspamd-2.1 99697 15 tcp6 ::1:11334 *:*
rspamd rspamd-2.1 99697 19 tcp6 ::1:11333 *:*
rspamd rspamd-2.1 57219 4 tcp6 ::1:26987 ::1:6379
rspamd rspamd-2.1 57219 11 tcp6 ::1:11332 *:*
rspamd rspamd-2.1 57219 15 tcp6 ::1:11334 *:*
rspamd rspamd-2.1 57219 19 tcp6 ::1:11333 *:*
rspamd rspamd-2.1 54599 11 tcp6 ::1:11332 *:*
rspamd rspamd-2.1 54599 15 tcp6 ::1:11334 *:*
rspamd rspamd-2.1 54599 34 tcp6 ::1:10106 ::1:6379
rspamd rspamd-2.1 54599 35 tcp6 ::1:21795 ::1:6379
rspamd rspamd-2.1 54599 36 tcp6 ::1:38055 ::1:6379
rspamd rspamd-2.1 54599 37 tcp6 ::1:55401 ::1:6379
rspamd rspamd-2.1 11222 11 tcp6 ::1:11332 *:*
rspamd rspamd-2.1 4107 11 tcp6 ::1:11332 *:*
rspamd rspamd-2.1 4107 15 tcp6 ::1:11334 *:*
rspamd rspamd-2.1 4107 19 tcp6 ::1:11333 *:*
ok, that would explain something. But why is 1.9 working ?
For testing proposes i: deactivated System - Settings - General - " Prefer to use IPv4 even if IPv6 is available " and activated Firewall - Settings - Advanced - " Allow IPv6" restarted rspamd But nothing changes.
That looks also not so good to me:
~ # sockstat -6
? ? ? ? tcp6 ::1:15755 ::1:6379 ? ? ? ? tcp6 ::1:62847 ::1:6379 ? ? ? ? tcp6 ::1:50730 ::1:6379 ? ? ? ? tcp6 ::1:62984 ::1:6379 ? ? ? ? tcp6 ::1:21690 ::1:6379 ? ? ? ? tcp6 ::1:47535 ::1:6379 ? ? ? ? tcp6 ::1:33395 ::1:6379 ? ? ? ? tcp6 ::1:49346 ::1:6379 ? ? ? ? tcp6 ::1:18833 ::1:6379
6379 is just the redis db. I have to check why your instance doesnt bind to v6
Ok, Thank you. Please let me know, if i can do anything...
rspamd bind to localhost
and postfix connects to localhost
with a force of version 6. I'll add a dropdown to postfix antispam tab to choose the version (default to 6) so you can change it.
Perfect. Thank you very much.
I am still seeing this:
2020-08-13 09:16:34 #24765(rspamd_proxy) <3ef7fc>; proxy; proxy_backend_master_error_handler: retry connection to: ::1 retries left: 2
2020-08-13 09:17:49 #24765(rspamd_proxy) <3ef7fc>; proxy; proxy_backend_master_error_handler: abnormally closing connection from backend: [::1]:11333, error: IO write error: Broken pipe, retries left: 1
2020-08-13 09:17:49 #24765(rspamd_proxy) <3ef7fc>; proxy; proxy_backend_master_error_handler: retry connection to: 127.0.0.1 retries left: 1
This does happen only after some runtime. This only seems to happen to mime encoded mails. Plain seems to be unaffected.
I tried switching to milter protocol v4 but then rspamd complains about obsolete protocol 4....
Complains or does not work?
Mails have several minutes delay, sometimes do not go through.
Qualifies as does not work for me, even more so as these mails contain OTPs for people to log on interactively.
+1 here . We need rspamd for DKIM (because there is to opendkim in OPNsense).
You +1 and second part is misleading? Doesnt it work in general or so you wish dkim?
Oh, I meant we just use rspamd for DKIM signature outbound (we don't use it inbound). I +1 fhloston ... Maybe the bug itself is related to dkim crypto (i.e. rspamd works for those who don'T use dkim?)
Hi guys,
after days of troubleshooting i found out the the issue is related to DNS Resolution. (so DKIM is a good point) Ive fixed it by putting in my DNS Server in "/usr/local/etc/rspamd/options.inc". Sure, this is not upgrade persistent but i can live with that.
Bye
serial-bug
@serial-bug We will try that too, using an external DNS like 1.1.1.1 ?
If you use a local DNS Resolver like unbound or dnscrypt on your OPNsense, you can point directly to this. (eg. 127.0.0.1:53 or 127.0.0.1:5353). But a external DNS server schould work too.
Can you post your complete edit to the file? Then I can add it to GUI
Sure. In the "Default Settings" Section ive added my local DNS (nameserver = ["xxxxx:xx"];).
# Default settings
dns_max_requests = 64;
max_lua_urls = 1024;
max_urls = 10240;
dns {
nameserver = ["127.0.0.1:5353"];
timeout = 1s;
sockets = 16;
retransmits = 5;
}
I added the local unbound to the rspamd config: nameserver = ["127.0.0.1:53"];
I then validated in rspamadm configdump
that it is active.
Sadly, this makes no difference, still 1:15 delay for the OTP mail:
2020-08-15 12:20:02 #9772(rspamd_proxy) <c28c18>; proxy; proxy_accept_socket: accepted milter connection from 127.0.0.1 port 60086
2020-08-15 12:20:02 #9772(rspamd_proxy) <c28c18>; milter; rspamd_milter_process_command: got connection from x.x.x.x:50383
2020-08-15 12:21:17 #9772(rspamd_proxy) <c28c18>; proxy; proxy_backend_master_error_handler: abnormally closing connection from backend: [::1]:11333, error: IO write error: Broken pipe, retries left: 4
2020-08-15 12:21:17 #9772(rspamd_proxy) <c28c18>; proxy; proxy_backend_master_error_handler: retry connection to: 127.0.0.1 retries left: 4
`
Ehm, there would be a difference between pointing to 127.0.0.1:x and "use default" (which would probably end up in /etc/resolv) ?
@fhloston Is :53 a process listening oder a NAT rule pointing to the real dns?
Also using 1.1.1.1 makes no difference.
Since this mail relay is only used to relay these OTP messages I have whitelisted the IP of the sending machine in all other checks. Rspamd is only used to add the DKIM signature to the header.
2020-08-15 12:30:20 #9772(rspamd_proxy) <4a5eb1>; proxy; proxy_accept_socket: accepted milter connection from 127.0.0.1 port 30306
2020-08-15 12:30:20 #9772(rspamd_proxy) <4a5eb1>; milter; rspamd_milter_process_command: got connection from x.x.x.x:50387
2020-08-15 12:31:35 #9772(rspamd_proxy) <4a5eb1>; proxy; proxy_backend_master_error_handler: abnormally closing connection from backend: [::1]:11333, error: IO write error: Broken pipe, retries left: 4
2020-08-15 12:31:35 #9772(rspamd_proxy) <4a5eb1>; proxy; proxy_backend_master_error_handler: retry connection to: 127.0.0.1 retries left: 4
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; rspamd_worker_body_handler: accepted connection from 127.0.0.1 port 29175, task ptr: 000005D79E430CA0
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; rspamd_message_parse: loaded message; id: <undef>; queue-id: <2E4E959E91>; size: 331; checksum: <34beba9b8e38095cff23f14e783d044a>
2020-08-15 12:31:35 #40530(normal) <c6db7e>; lua; greylist.lua:203: skip greylisting for local networks and/or authorized users
2020-08-15 12:31:35 #40530(normal) <c6db7e>; lua; spf.lua:185: skip SPF checks for local networks and authorized users
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; dkim_symbol_callback: skip DKIM checks for local networks and authorized users
2020-08-15 12:31:35 #40530(normal) <c6db7e>; lua; dmarc.lua:596: skip DMARC checks as either SPF or DKIM were not checked
2020-08-15 12:31:35 #40530(normal) <c6db7e>; lua; once_received.lua:98: Skipping once_received for authenticated user or local network
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; rspamd_redis_connected: skip obtaining bayes tokens for BAYES_SPAM of classifier bayes: not enough learns 0; 200 required
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; rspamd_redis_connected: skip obtaining bayes tokens for BAYES_HAM of classifier bayes: not enough learns 0; 200 required
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; rspamd_stat_classifiers_process: skip statistics as SPAM class is missing
2020-08-15 12:31:35 #40530(normal) <c6db7e>; lua; greylist.lua:317: Score too low - skip greylisting
2020-08-15 12:31:35 #40530(normal) <c6db7e>; lua; neural.lua:478: cannot learn ham ANN default:default; redis_key: rn_default_default_1htmxf7z_0: sampled out with probability 1 (1 vectors stored)
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; rspamd_task_write_log: id: <undef>, qid: <2E4E959E91>, ip: x.x.x.x, user: smtp_auth@domain, from: <noreply@domain>, (default: F (no action): [-47.60/15.00] [WHITELIST_SENDER_DOMAIN(-50.00){domain;},MISSING_MID(2.50){},MIME_GOOD(-0.10){text/plain;},ARC_NA(0.00){},ASN(0.00){asn:12345, ipnet:1.2.3.4, country:DE;},DKIM_SIGNED(0.00){},FROM_EQ_ENVFROM(0.00){},FROM_HAS_DN(0.00){},MIME_TRACE(0.00){0:+;},RCPT_COUNT_ONE(0.00){1;},RCVD_COUNT_ZERO(0.00){0;},TO_DN_NONE(0.00){},TO_MATCH_ENVRCPT_ALL(0.00){}]), len: 331, time: 10.650ms, dns req: 1, digest: <34beba9b8e38095cff23f14e783d044a>, rcpts: <user@otherdomain>, mime_rcpts: <user@otherdomain>
2020-08-15 12:31:35 #40530(normal) <c6db7e>; task; rspamd_protocol_http_reply: regexp statistics: 0 pcre regexps scanned, 0 regexps matched, 184 regexps total, 35 regexps cached, 0B scanned using pcre, 459B scanned total
@fhloston Is :53 a process listening oder a NAT rule pointing to the real dns?
It is the local unbound. But this happens also with 1.1.1.1 as DNS.
root@fw01:/usr/local/etc/rspamd # sockstat -l | grep unbound
root syslog-ng 82907 21 dgram /var/unbound/var/run/log
unbound unbound 13456 3 udp6 *:53 *:*
unbound unbound 13456 4 tcp6 *:53 *:*
unbound unbound 13456 5 udp4 *:53 *:*
unbound unbound 13456 6 tcp4 *:53 *:*
unbound unbound 13456 7 tcp4 127.0.0.1:953 *:*
unbound unbound 13456 8 dgram (not connected)
And lastly, switching the milter protocol to IPv4 results in the mail passing through within 1s but containing no DKIM sig.
2020-08-15 12:41:55 #9772(rspamd_proxy) <8242d6>; proxy; proxy_accept_socket: accepted milter connection from 127.0.0.1 port 16992
2020-08-15 12:41:55 #9772(rspamd_proxy) <8242d6>; milter; rspamd_milter_process_command: MTA specifies too old protocol: 4, aborting connection
2020-08-15 12:41:55 #9772(rspamd_proxy) <8242d6>; proxy; proxy_milter_error_handler: abnormally closing milter connection from: 127.0.0.1:16992, error: invalid protocol version: 4
Hang on - what does the output of rspamadm configdump
represent?
a) Parsed configfiles from disk without reflection of rspamd process state
b) actual config of running rspamd process
I was assuming b) but thinking about it makes me feel it is a)...
Yes, it is a).
So rerunning tests with complete stop/start of rspamd, as previous tests all had the same rspamd_proxy pid 9772.
Good news, so far the issue has not resurfaced.
I have added nameserver = ["1.1.1.1"];
under dns {} in /usr/local/etc/rspamd/options.inc
@mimugmail do i understand correctly, that the options.inc file is currently not rendered by the template system?
Interestingly, as soon as i change it to nameserver = ["127.0.0.1"]
the issue reappears after roughly 1 hour runtime!
So some kind of timeout error in combination with the local DNS resolver?
Can you post your complete edit to the file? Then I can add it to GUI
are you still considering this @mimugmail ?
I'm now working on a PR ..
Hi mimugmail,
it seems you already rolled out the change. Sadly, you missed to allow entering a port after the ip. People who dont use the standard port 53 for DNS (example if you use Unbound in front of DNSCrypt) have no chance
to enter a custom port.
It would be great if you could fix this.
Thank you.
For people still struggling on this issue, like me, I found a workaround for it.
Create a file in:
/usr/local/etc/rspamd/rspamd.conf.local.override
with this edited line:
...
worker "normal" {
bind_socket = "*:11333";
.include "$CONFDIR/worker-normal.inc"
.include(try=true; priority=1,duplicate=merge) "$LOCAL_CONFDIR/local.d/worker-normal.inc"
.include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/worker-normal.inc"
}
...
This will make the Rspamd worker listen on *:11333 for ipv4 and ipv6, therefore the "IPv6 IO write error" will never happen.
I don't know what exactly the problem is, but I guess its a combination of Unbound is serving localhost with ::1 and how Rspamd binding the worker on localhost.
Regards
There is an open PR fixing this. Will come in one of the next releases
After the upgrade from OPNsense 19.7.6 to 19.7.7, Rspamd, on my side, stoped working correctly. Mails from amazon and ebay are delivered twice or more often.
I think the problem is a timeout in postfix, because of a long delay in rspamd (you can see a 1 minute delay between 19:17:10 and 19:18:25 in rspamd.log)
I have removed and reinstalled rspamd -> same result. I also setup a clean OPNsense 19.7.7. -> same result. Go back to OPNsense 19.7.6. -> no IO errors in rspamd log, no errors in rspamd WebUI and no duplicated mails
In OPNsense changelog, no rspamd changes are listed. So i dont know where exactly the problem is. It looks like a IPV6 communication error. But other mails are proceeded just fine.
Rspamd.log:
2019-11-23 19:17:10 #21813(rspamd_proxy) <684cce>; proxy; proxy_accept_socket: accepted milter connection from 127.0.0.1 port 31382 2019-11-23 19:17:10 #21813(rspamd_proxy) <684cce>; milter; rspamd_milter_process_command: got connection from xx.xx.xx.xx:44620 2019-11-23 19:18:25 #21813(rspamd_proxy) <684cce>; proxy; proxy_backend_master_error_handler: abnormally closing connection from backend: ::1, error: IO write error: Broken pipe, retries left: 5 2019-11-23 19:18:25 #21813(rspamd_proxy) <684cce>; proxy; proxy_backend_master_error_handler: retry connection to: 127.0.0.1 retries left: 4 2019-11-23 19:18:25 #31636(normal) <406b83>; task; accept_socket: accepted connection from 127.0.0.1 port 55623, task ptr: 00000504B837F028
In Rspamd WebUI there are the following errors after restart:
hs_helper - error syncing log file: Bad file descriptor normal - error syncing log file: Bad file descriptor controller - error syncing log file: Bad file descriptor rspamd_proxy - error syncing log file: Bad file descriptor
I hope, anyone else can confirm the issue.
Thank you.