Closed kkr78 closed 6 months ago
Here are some additional details in case if it helps to find the root cause
1) Haraka primarily configured to forward the emails to office 365. But for a small number of emails, I set connection.relay=true based on the sender which delivers mail using outbound MX based delivery.
#smtp_forward.ini
enable_outbound=false
[xx.net]
host=xx.mail.protection.outlook.com
port=25
enable_tls=true
timeout=300
connect_timeout=320
2) Find the errors below logged before one hour of the crash. These errors occurred other times as well so may not be related.
2018-05-29T10:36:41.935Z [ERROR] [502FF71D-0ECB-489C-8BB6-5F9F6B60F8C3.1.1] [outbound] Ongoing connection failed to 165.251.51.28:25 : Error: socket hang up
2018-05-29T10:36:42.412Z [ERROR] [502FF71D-0ECB-489C-8BB6-5F9F6B60F8C3.1.1] [outbound] Ongoing connection failed to 165.251.51.29:25 : Error: socket hang up
2018-05-29T10:37:12.414Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.202.133.35:25
2018-05-29T10:37:42.417Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.202.133.42:25
2018-05-29T10:38:12.419Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.202.133.36:25
2018-05-29T10:39:15.145Z [ERROR] [377D3C55-3839-4D83-9FF7-6041ADFE0246.8] [queue/smtp_forward] dead sender
2018-05-29T10:40:44.387Z [ERROR] [91A45D89-7FEB-40E4-8FCE-46214C4C62D7.1.1] [outbound] Ongoing connection failed to 194.214.201.8:25 : Error: 139744362600256:error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/s23_clnt.c:802:
2018-05-29T10:40:44.912Z [ERROR] [91A45D89-7FEB-40E4-8FCE-46214C4C62D7.1.1] [outbound] Ongoing connection failed to 194.214.200.9:25 : Error: 139744362600256:error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/s23_clnt.c:802:
2018-05-29T10:40:45.435Z [ERROR] [91A45D89-7FEB-40E4-8FCE-46214C4C62D7.1.1] [outbound] Ongoing connection failed to 194.214.201.9:25 : Error: 139744362600256:error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/s23_clnt.c:802:
2018-05-29T10:40:45.960Z [ERROR] [91A45D89-7FEB-40E4-8FCE-46214C4C62D7.1.1] [outbound] Ongoing connection failed to 194.214.200.8:25 : Error: 139744362600256:error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure:../deps/openssl/openssl/ssl/s23_clnt.c:802:
2018-05-29T10:50:39.880Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 151.193.65.92:25
2018-05-29T10:55:17.379Z [ERROR] [502FF71D-0ECB-489C-8BB6-5F9F6B60F8C3.1.1] [outbound] Ongoing connection failed to 165.251.51.28:25 : Error: socket hang up
2018-05-29T10:55:17.857Z [ERROR] [502FF71D-0ECB-489C-8BB6-5F9F6B60F8C3.1.1] [outbound] Ongoing connection failed to 165.251.51.29:25 : Error: socket hang up
2018-05-29T10:55:47.859Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.202.133.35:25
2018-05-29T10:56:17.894Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.202.133.36:25
2018-05-29T10:56:47.926Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.202.133.42:25
2018-05-29T11:10:59.275Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 37.18.145.120:25
2018-05-29T11:15:24.530Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.214.200.8:25
2018-05-29T11:15:54.619Z [ERROR] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 194.214.200.9:25
Hi. Thanks for the report. Are you running from master
branch? Could #2375 be related?
I installed using npm ( npm install -g Haraka@2.8.18)
Could you check out from master
branch and see if it solves the issue?
Also, it would be wonderful if you run with DEBUG log level as that error backtrace is not helpful at all.
What plugins are you running? Do you have any custom stuff?
I will definitely try. Can I simply download and override the Haraka installed on the box or do I need to follow any process to build and install it?
Enabling it on Debug level is challenging as we need to run it for at least week to reproduce it, it will fill up the logs as we deal with huge traffic. If the issue still persists after the update then I will try to enable at DEBUG and provide the corresponding logs.
Yes, I have a custom rcpt plugin decides whether to use the outbound (MX Based delivery) or smtp_forward (forward to Office 365 SMTP endpoint). Only 3 to 5% uses outbound.
exports.hook_rcpt = function (next, connection, params) {
var rcpt_to=connection.transaction.rcpt_to
var isRelay = isRelayRequired(rcpt_to)
if(!isRelay){
connection.relaying=false
next(OK);
}else{
connection.relaying=true;
next(OK);
}
}
Hello,
I am also getting this very very randomly. My instance only accepts mail on port 587 submission via TLS, then verifies users using haraka-plugin-ldap also using rspamd and relaying to the outside world. Apparently some kind of malicious botnet was hammering this instance at port 587 and it was able to somehow break the connection in an unexpected way like this. I am sorry this is only log level info. To me this looks as if nothing serious has happened during the SMTP conversations and perhaps this exception should get caught and logged rather causing haraka to exit. Or is there something else that might go wrong?
[NOTICE] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] connect ip=103.89.89.236
port=10491 local_ip=:: local_port=587
[INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [tls] secured: cipher=AES256-SHA v
ersion=TLSv1/SSLv3 verified=false
[INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] hook=unrecognized_command
plugin=tls function=upgrade_connection params=STARTTLS retval=OK msg=""
[INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] hook=unrecognized_command
plugin=ldap function=handle_authn params=AUTH retval=OK msg=""
[INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] hook=unrecognized_command
plugin=ldap function=handle_authn params="c2FtcGxlQHB2YWduZXIudGs=" retval=OK ms
g=""
[INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] client [103.89.89.236] half
closed connection
[NOTICE] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] disconnect ip=103.89.89.2
36 rdns=NXDOMAIN helo=win-67fnmqrvgi3.domain relay=N early=N esmtp=Y tls=Y pipe=
N errors=0 txns=0 rcpts=0/0/0 msgs=0/0/0 bytes=0 lr="" time=20.179
[NOTICE] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [core] connect ip=103.89.89.236
port=11876 local_ip=:: local_port=587
[INFO] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [tls] secured: cipher=AES256-SHA v
ersion=TLSv1/SSLv3 verified=false
[INFO] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [core] hook=unrecognized_command
plugin=tls function=upgrade_connection params=STARTTLS retval=OK msg=""
[INFO] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [core] hook=unrecognized_command
plugin=ldap function=handle_authn params=AUTH retval=OK msg=""
[CRIT] [-] [core] Error: read ECONNRESET
[CRIT] [-] [core] at TCP.onread (net.js:660:25)
[NOTICE] [-] [core] Shutting down
It's definitely a bug. The problem is the error handler isn't created soon enough. tls_socket.js:createServer needs an initial .on(error) added to the cryptoSocket returned immediately:
const server = net.createServer(cryptoSocket => {
cryptoSocket.on('error', (err) => ... )
On Thu, Sep 6, 2018 at 4:02 AM pvagner notifications@github.com wrote:
Hello,
I am also getting this very very randomly. My instance only accepts mail on port 587 submission via TLS, then verifies users using haraka-plugin-ldap also using rspamd and relaying to the outside world. Apparently some kind of malicious botnet was hammering this instance at port 587 and it was able to somehow break the connection in an unexpected way like this. I am sorry this is only log level info. To me this looks as if nothing serious has happened during the SMTP conversations and perhaps this exception should get caught and logged rather causing haraka to exit. Or is there something else that might go wrong?
[NOTICE] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] connect ip=103.89.89.236 port=10491 local_ip=:: local_port=587 [INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [tls] secured: cipher=AES256-SHA v ersion=TLSv1/SSLv3 verified=false [INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] hook=unrecognized_command plugin=tls function=upgrade_connection params=STARTTLS retval=OK msg="" [INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] hook=unrecognized_command plugin=ldap function=handle_authn params=AUTH retval=OK msg="" [INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] hook=unrecognized_command plugin=ldap function=handle_authn params="c2FtcGxlQHB2YWduZXIudGs=" retval=OK ms g="" [INFO] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] client [103.89.89.236] half closed connection [NOTICE] [0A8C8D6E-61E5-4470-85E3-F44C0A1D9FF2] [core] disconnect ip=103.89.89.2 36 rdns=NXDOMAIN helo=win-67fnmqrvgi3.domain relay=N early=N esmtp=Y tls=Y pipe= N errors=0 txns=0 rcpts=0/0/0 msgs=0/0/0 bytes=0 lr="" time=20.179 [NOTICE] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [core] connect ip=103.89.89.236 port=11876 local_ip=:: local_port=587 [INFO] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [tls] secured: cipher=AES256-SHA v ersion=TLSv1/SSLv3 verified=false [INFO] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [core] hook=unrecognized_command plugin=tls function=upgrade_connection params=STARTTLS retval=OK msg="" [INFO] [F7AE40A8-3EBC-400B-9110-8E763FDF2B5D] [core] hook=unrecognized_command plugin=ldap function=handle_authn params=AUTH retval=OK msg="" [CRIT] [-] [core] Error: read ECONNRESET [CRIT] [-] [core] at TCP.onread (net.js:660:25) [NOTICE] [-] [core] Shutting down
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haraka/Haraka/issues/2438#issuecomment-419002616, or mute the thread https://github.com/notifications/unsubscribe-auth/AAobY_O4w9TSIG97UN8fOvX50dXX-yqdks5uYNZ9gaJpZM4URaO7 .
I still see this issue with Haraka 2.8.21. This is the only issue that I have with Haraka. Otherwise its running smoothly in production. How can we catch these type of errors with out crashing? Please find the log with Debug below
2018-09-07T15:53:16.198Z [INFO] [-] [core] [outbound] [outbound::25:185.15.50.11:undefined:50] dispense() clients=1 available=0
2018-09-07T15:53:16.198Z [DEBUG] [-] [core] [outbound] [outbound::25:185.15.50.11:undefined:50] createResource() - creating obj - count=1 min=0 max=10
2018-09-07T15:53:16.199Z [DEBUG] [23A29ACF-535D-439B-93AE-F1411A78F77C] [core] [outbound] created
2018-09-07T15:53:20.753Z [INFO] [DA135BB2-AF3B-409A-B933-C5A44976D0CC.1] [core] client [20.4.1.179] half closed connection
2018-09-07T15:53:20.753Z [DEBUG] [DA135BB2-AF3B-409A-B933-C5A44976D0CC.1] [core] client has disconnected
2018-09-07T15:53:20.753Z [DEBUG] [DA135BB2-AF3B-409A-B933-C5A44976D0CC.1] [core] running disconnect hooks
2018-09-07T15:53:20.753Z [DEBUG] [DA135BB2-AF3B-409A-B933-C5A44976D0CC.1] [core] client has disconnected
2018-09-07T15:53:20.753Z [DEBUG] [DA135BB2-AF3B-409A-B933-C5A44976D0CC.1] [core] client has disconnected
2018-09-07T15:53:20.753Z [NOTICE] [DA135BB2-AF3B-409A-B933-C5A44976D0CC.1] [core] disconnect ip=20.4.1.179 rdns=NXDOMAIN helo=[127.0.0.1] relay=N early=N esmtp=Y tls=N pipe=N errors=0 txns=1 rcpts=1/0/0 msgs=1/0/0 bytes=7864 lr="" time=60.028
2018-09-07T15:53:44.155Z [INFO] [-] [core] [outbound] [outbound::25:194.202.133.35:undefined:50] dispense() clients=0 available=0
2018-09-07T15:53:46.199Z [NOTICE] [-] [core] [outbound] Failed to get pool entry: Outbound connection timed out to 185.15.50.11:25
2018-09-07T15:53:46.199Z [INFO] [4ECA8AF9-B4B4-4565-A7D3-8E83CA5F01DA.1.1] [outbound] Looking up A records for: b.mx.p50.neolane.net
2018-09-07T15:53:46.199Z [INFO] [-] [core] [outbound] [outbound::25:185.15.50.11:undefined:50] dispense() clients=0 available=0
2018-09-07T15:53:46.220Z [INFO] [4ECA8AF9-B4B4-4565-A7D3-8E83CA5F01DA.1.1] [outbound] Attempting to deliver to: 185.15.50.12:25 (0) (57)
2018-09-07T15:53:46.220Z [INFO] [-] [core] [outbound] [outbound::25:185.15.50.12:undefined:50] dispense() clients=1 available=0
2018-09-07T15:53:46.220Z [DEBUG] [-] [core] [outbound] [outbound::25:185.15.50.12:undefined:50] createResource() - creating obj - count=1 min=0 max=10
2018-09-07T15:53:46.220Z [DEBUG] [D89E07D4-BFC8-4E24-B3C5-CB38C7540AB6] [core] [outbound] created
2018-09-07T15:53:46.303Z [INFO] [-] [core] [outbound] acquired socket D89E07D4-BFC8-4E24-B3C5-CB38C7540AB6 for outbound::25:185.15.50.12:undefined:50
2018-09-07T15:53:46.957Z [NOTICE] [4ECA8AF9-B4B4-4565-A7D3-8E83CA5F01DA.1.1] [outbound] delivered file=1536335596066_1536335596066_0_21752_UWDuv9_9720_ip-20-4-11-95 domain=reservation.accor-mail.com host=b.mx.p50.neolane.net ip=185.15.50.12 port=25 mode=SMTP tls=N auth=N response="Ok: queued as 94F101AE2B23" delay=30.891 fails=0 rcpts=1/0/0
2018-09-07T15:53:46.957Z [DEBUG] [4ECA8AF9-B4B4-4565-A7D3-8E83CA5F01DA.1.1] [outbound] running delivered hooks
2018-09-07T15:53:47.040Z [DEBUG] [-] [core] [outbound] release_client: D89E07D4-BFC8-4E24-B3C5-CB38C7540AB6 185.15.50.12:25 to undefined
2018-09-07T15:53:47.040Z [INFO] [-] [core] [outbound] [outbound::25:185.15.50.12:undefined:50] dispense() clients=0 available=1
2018-09-07T15:53:47.822Z [INFO] [-] [core] [outbound] [outbound::25:185.15.50.11:undefined:50] dispense() clients=0 available=1
2018-09-07T15:54:06.510Z [CRIT] [-] [core] Error: read ECONNRESET
2018-09-07T15:54:06.510Z [CRIT] [-] [core] at _errnoException (util.js:1031:13)
2018-09-07T15:54:06.510Z [CRIT] [-] [core] at TCP.onread (net.js:619:25)
2018-09-07T15:54:06.511Z [NOTICE] [-] [core] Shutting down
You could try the patch in #2501
No activity and no reports for 5 years. A fair number of changes (esp the discarding of the pooling code) have happened since and I'm thinking this may be a solved problem. Please reopen if this error can be replicated on a modern version of Haraka.
system info
Observed behavior
This issue occurred only two or three times after upgraded to 2.8.18 - one last week and other today. I was not able to capture more traces. is there any way to identify based on the above error stack? BTW, Office 365 closes the connection every 20 mins so it's pretty common to see "DENYSOFT" and followed by "disconnect" in the logs very often. Not sure if that's related.