haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License
5.02k stars 662 forks source link

Critical error and Haraka shutdown after server socket closed #1853

Closed dennisdupont closed 7 years ago

dennisdupont commented 7 years ago

system info

Haraka Haraka.js - Version: 2.8.13
Node v6.9.4
OS Linux xxx.xxx.net 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
openssl OpenSSL 1.0.1e-fips 11 Feb 2013

Expected behavior

Expected Haraka to continue running despite a socket being ended from the far end.

Observed behavior

2017-03-10T16:21:07.766Z [INFO] [-] [core] [outbound] Processing domain: xxx.com 2017-03-10T16:21:07.767Z [PROTOCOL] [439792B7-CFEF-4221-B3AC-1B6B42A4A8C9.35.1] [outbound] S: 250 2.1.0 Sender OK\r\n 2017-03-10T16:21:07.767Z [PROTOCOL] [439792B7-CFEF-4221-B3AC-1B6B42A4A8C9.35.1] [outbound] C: DATA 2017-03-10T16:21:07.767Z [PROTOCOL] [439792B7-CFEF-4221-B3AC-1B6B42A4A8C9.35.1] [outbound] S: 250 2.0.0 Resetting\r\n 2017-03-10T16:21:07.767Z [PROTOCOL] [439792B7-CFEF-4221-B3AC-1B6B42A4A8C9.35.1] [outbound] S: 421 4.7.0 Too many protocol errors (6) on this connection, closing transmission channel.\r\n 2017-03-10T16:21:07.768Z [PROTOCOL] [439792B7-CFEF-4221-B3AC-1B6B42A4A8C9.35.1] [outbound] C: RSET 2017-03-10T16:21:07.768Z [DEBUG] [-] [core] [outbound] release_client: 216.32.180.74:25 to undefined 2017-03-10T16:21:07.768Z [INFO] [-] [core] [outbound] [outbound::25:216.32.180.74:undefined:60] dispense() clients=0 available=7 2017-03-10T16:21:07.768Z [DEBUG] [-] [core] Temp fail for: Upstream error: 421 4.7.0 Resetting Too many protocol errors (6) on this connection, closing transmission channel. 2017-03-10T16:21:07.768Z [DEBUG] [439792B7-CFEF-4221-B3AC-1B6B42A4A8C9.35.1] [outbound] running deferred hooks 2017-03-10T16:21:07.768Z [INFO] [439792B7-CFEF-4221-B3AC-1B6B42A4A8C9.35.1] [outbound] Temp failing 1489162857819_0_1709_1060.xxx.net for 64 seconds: Upstream error: 421 4.7.0 Resetting Too many protocol errors (6) on this connection, closing transmission channel. 2017-03-10T16:21:07.769Z [WARN] [-] [core] [outbound] Socket [outbound::25:216.32.180.74:undefined:60] in pool got FIN 2017-03-10T16:21:07.769Z [DEBUG] [-] [core] [outbound] destroying pool entry for 216.32.180.74:25 2017-03-10T16:21:07.772Z [CRIT] [-] [core] Error: This socket has been ended by the other party 2017-03-10T16:21:07.772Z [CRIT] [-] [core] at TLSSocket.writeAfterFIN [as write] (net.js:291:12) 2017-03-10T16:21:07.772Z [CRIT] [-] [core] at pluggableStream.write (/usr/lib/node_modules/Haraka/tls_socket.js:109:34) 2017-03-10T16:21:07.772Z [CRIT] [-] [core] at ReadStream.ondata (_stream_readable.js:555:20) 2017-03-10T16:21:07.772Z [CRIT] [-] [core] at emitOne (events.js:101:20) 2017-03-10T16:21:07.772Z [CRIT] [-] [core] at ReadStream.emit (events.js:188:7) 2017-03-10T16:21:07.773Z [CRIT] [-] [core] at readableAddChunk (_stream_readable.js:176:18) 2017-03-10T16:21:07.773Z [CRIT] [-] [core] at ReadStream.Readable.push (_stream_readable.js:134:10) 2017-03-10T16:21:07.773Z [CRIT] [-] [core] at onread (fs.js:2004:12) 2017-03-10T16:21:07.773Z [CRIT] [-] [core] at FSReqWrap.wrapper [as oncomplete] (fs.js:681:17) 2017-03-10T16:21:07.774Z [NOTICE] [-] [core] Shutting down

Steps to reproduce

robertomatute commented 7 years ago

Hi,

I am having a similar issue. When our haraka server is under load (100k msgs at hour), suddenly shutdown with this trace:

2017-03-21T17:35:17.506Z [CRIT] [-] [core]     at pluggableStream.write (/usr/local/lib/node_modules/Haraka/tls_socket.js:109:34)
2017-03-21T17:35:17.506Z [CRIT] [-] [core]     at ReadStream.ondata (_stream_readable.js:555:20)
2017-03-21T17:35:17.506Z [CRIT] [-] [core]     at emitOne (events.js:101:20)
2017-03-21T17:35:17.506Z [CRIT] [-] [core]     at ReadStream.emit (events.js:188:7)
Error UncaughtException:
Error: This socket is closed
    at Socket._writeGeneric (net.js:680:19)
    at Socket._write (net.js:731:8)
    at doWrite (_stream_writable.js:334:12)
    at writeOrBuffer (_stream_writable.js:320:5)
    at Socket.Writable.write (_stream_writable.js:247:11)
    at Socket.write (net.js:658:40)
    at pluggableStream.write (/usr/local/lib/node_modules/Haraka/tls_socket.js:109:34)
    at ReadStream.ondata (_stream_readable.js:555:20)
    at emitOne (events.js:101:20)
    at ReadStream.emit (events.js:188:7)2017-03-21T17:35:17.508Z [NOTICE] [-] [core] Shutting down

Haraka Version

# haraka --version
Haraka.js — Version: 2.8.13

We are using haraka as a router to our pool of smarthosts running postfix.

We are using get_mx hook to choose what smarthost to use. Following documentation, we configure pool_timeout=0 but was slow processing mails.

So we change to pool_timeout=100.

Right now our server is in production, and shutting down multiple times a day.

Why haraka is shutting down, because of the other end closing connection. Is there a way to catch this error and not shut down the server?

Best Regards,

Roberto

robertomatute commented 7 years ago

Shutting down under pressure

Yesterday we were processing 500k emails, and the server shut down 5 times.

2017-03-21T20:52:06.887Z [CRIT] [-] [core]     at ReadStream.emit (events.js:188:7)
2017-03-21T20:52:06.887Z [CRIT] [-] [core]     at readableAddChunk (_stream_readable.js:176:18)
2017-03-21T20:52:06.887Z [CRIT] [-] [core]     at ReadStream.Readable.push (_stream_readable.js:134:10)
2017-03-21T20:52:06.887Z [CRIT] [-] [core]     at onread (fs.js:2004:12)
2017-03-21T20:52:06.887Z [CRIT] [-] [core]     at FSReqWrap.wrapper [as oncomplete] (fs.js:681:17)
Error UncaughtException:
Error: This socket has been ended by the other party
    at Socket.writeAfterFIN [as write] (net.js:291:12)
    at pluggableStream.write (/usr/local/lib/node_modules/Haraka/tls_socket.js:109:34)
    at ReadStream.ondata (_stream_readable.js:555:20)
    at emitOne (events.js:101:20)
    at ReadStream.emit (events.js:188:7)
    at readableAddChunk (_stream_readable.js:176:18)
    at ReadStream.Readable.push (_stream_readable.js:134:10)
    at onread (fs.js:2004:12)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:681:17)2017-03-21T20:52:06.888Z [NOTICE] [-] [core] Shutting down

We have a really bad situation here. Please help!

baudehlo commented 7 years ago

This sounds pretty bad. There are clearly bugs in the pooling code of outbound. I'm working right now on splitting up the outbound code so it's a lot easier to work on, but fixing this won't be done soon.

The only recommendation I have for now is perhaps send all outbound mail through another local MTA. Either a much earlier version of Haraka (before outbound pool support arrived) or something like Postfix or Exim.

On Thu, Mar 23, 2017 at 8:43 AM, Roberto Matute notifications@github.com wrote:

Shutting down under pressure

Yesterday we were processing 500k emails, and the server shut down 5 times.

2017-03-21T20:52:06.887Z [CRIT] [-] [core] at ReadStream.emit (events.js:188:7) 2017-03-21T20:52:06.887Z [CRIT] [-] [core] at readableAddChunk (_stream_readable.js:176:18) 2017-03-21T20:52:06.887Z [CRIT] [-] [core] at ReadStream.Readable.push (_stream_readable.js:134:10) 2017-03-21T20:52:06.887Z [CRIT] [-] [core] at onread (fs.js:2004:12) 2017-03-21T20:52:06.887Z [CRIT] [-] [core] at FSReqWrap.wrapper [as oncomplete] (fs.js:681:17) Error UncaughtException: Error: This socket has been ended by the other party at Socket.writeAfterFIN [as write] (net.js:291:12) at pluggableStream.write (/usr/local/lib/node_modules/Haraka/tls_socket.js:109:34) at ReadStream.ondata (_stream_readable.js:555:20) at emitOne (events.js:101:20) at ReadStream.emit (events.js:188:7) at readableAddChunk (_stream_readable.js:176:18) at ReadStream.Readable.push (_stream_readable.js:134:10) at onread (fs.js:2004:12) at FSReqWrap.wrapper [as oncomplete] (fs.js:681:17)2017-03-21T20:52:06.888Z [NOTICE] [-] [core] Shutting down

We have a really bad situation here. Please help!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haraka/Haraka/issues/1853#issuecomment-288706854, or mute the thread https://github.com/notifications/unsubscribe-auth/AAobY4cXTpNVeBu3Bh4Odz_9dfX0mdrgks5romj7gaJpZM4MaIK1 .

robertomatute commented 7 years ago

@baudehlo thanks for your response.

Actually we use haraka to connect to a postfix smarthost. I mean the arquitecture is something like:

app -> haraka (plugin with get_mx hook) -> postfix -> Internet

We are using haraka to balance the load between postfix servers, based on some rules (queue size, disk space, from address) for each postfix server.

We will change our version to a pre-pooling version.

andris9 commented 7 years ago

Sorry to promote another product here but I'd suggest looking at ZoneMTA https://github.com/zone-eu/zone-mta We use it to send out 500K messages every day with peaks around 80 msg/sec. Not as mature as Haraka or Postfix though.

robertomatute commented 7 years ago

We already change our haraka to 2.8.4. It handled 600k emails in 3 hours. I believe the problem was indeed in the pool code.