Closed krootee closed 10 years ago
Changing to "return this.discard()" helps, but now we need to use different version of "discard", since current version tries to call next_cb which is not required and if done causes internal plugin callback loop.
More crashes were observed - after investigating code i think problem is inside "discard" function itself, since looking at it's code usage in outbound.js it should not call "this.next_cb()" inside of it. outbound.js: HMailItem.prototype.discard = function () { this.refcount--; if (this.refcount === 0) { // Remove the file. fs.unlink(this.path, function () {}); this.next_cb(); } }
Hmm, I'll have to have a further look at this. Do you have a bit more of logs I can look at?
Not much logs now (made workaround and pushed to PROD), but it was reproducing easily with every bounce back - below is simple plugin that helps to reproduce the problem.
exports.hook_bounce = function (next, hmail) { return next(STOP); };
OK so with the above simple plugin, adding the "return" in fixes the problem.
I'm not sure what the other problem you're seeing is - the next_cb needs to be called or the queue will never properly flush.
I think you are right after checking logs. But leave next_cb() call in discard generates a lot of crashes with message like this:
[2013-10-01 10:26:10] [CRIT] [-] [core] Error: Callback was already called.
[2013-10-01 10:26:10] [CRIT] [-] [core] at /usr/lib/node_modules/Haraka/node_modules/async/lib/async.js:22:31
[2013-10-01 10:26:10] [CRIT] [-] [core] at HMailItem.hmail.next_cb (/usr/lib/node_modules/Haraka/outbound.js:51:9)
[2013-10-01 10:26:10] [CRIT] [-] [core] at HMailItem.discard (/usr/lib/node_modules/Haraka/outbound.js:1230:14)
[2013-10-01 10:26:10] [CRIT] [-] [core] at pluggableStream.
I need to see the lines above that
Specifically this looks like an error coming from populate_bounce_message ?
And your line numbers don't line up with Haraka 2.2.1 - have you modified your own outbound.js?
Probably it was already with our internal fixes - I will run clean version with "return" fix only and provide logs.
I ran server with "return" fix only for 12h under the load - no "next_cb()" crashes were registered. So probably it was one of our other changes that triggered crashes.
Checked again and found few crashes - it seems like edge case that is not handled inside Haraka. In outbound.js in "HMailItem.prototype.try_deliver_host" we set timeout to 15sec vs 300sec "socket.setTimeout(15 * 1000);" and that triggers problems sometimes (under the load having 300sec is not very good default setting) - you get timeout from the client connection, but after it you receive 550 from the server. If you set timeout to very small value then bug should be easy to reproduce.
Below is log with sensetive data removed from it:
[2013-10-07 14:16:32] [DEBUG] [F0276E6F-2832-474B-A12B-518E5B46ECA2.15] [core] running mail hooks
[2013-10-07 14:16:32] [NOTICE] [F0276E6F-2832-474B-A12B-518E5B46ECA2.15] [core] sender
I've committed a fix. Please test and re-open this bug if it doesn't work.
Thanks - will do.
Had a similar issue. Haraka keeps crashing due to a bounced email
Wrote a bounce plugin, always returning a next(OK).
[DEBUG] [0C3F9456-00BB-45A6-9A31-2865E0CB9418.25] [outbound] running bounce hook in email_service.hook_bounce plugin [DEBUG] [-] [email_service.hook_bounce] hook_bounce [DEBUG] [-] [email_service.hook_bounce] "Unrecognised response from upstream server: rblsmtpd: 11.11.11.11 pid 11483: 451 http://www.barracudanetworks.com/reputation/?pr=1&ip=11.11.11.11\n" [DEBUG] [-] [email_service.hook_bounce] {"domain":null,"_events":{},"_maxListeners":10,"path":"/opt/haraka/queue/1385342052540_1_9671_3918.smtp.-----.com","filename":"1385342052540_1_9671_3918.smtp.-----.com","next_process":"1385341986158","num_failures":1,"notes":{},"file_size":105499,"data_start":301,"todo":{"domain":"-----.com.sg","rcpt_to":[{"original":"admin@-----.com.sg","user":"admin","host":"-----.com.sg"}],"mail_from":{"original":"mail@-----.com","user":"mail","host":"-----.com"},"notes":{},"uuid":"0C3F9456-00BB-45A6-9A31-2865E0CB9418.25"},"hooks_to_run":[],"mxlist":[],"hostlist":[],"bounce_error":"Unrecognised response from upstream server: rblsmtpd: 11.11.11.11 pid 11483: 451 http://www.0-------.com/reputation/?pr=1&ip=11.11.11.11\n","current_hook":[{"name":"email_service.hook_bounce","base":{},"timeout":30,"full_paths":["/opt/haraka/plugins/email_service.hook_bounce.js","/usr/local/lib/node_modules/Haraka/plugins/email_service.hook_bounce.js"],"config":{},"hooks":{"queue_ok":["hook_queue_ok"],"delivered":["hook_delivered"],"bounce":["hook_bounce"]}},"hook_bounce"]} [DEBUG] [-] [email_service.hook_bounce] {"pool":{"2555:localhost:300":{}}}} [DEBUG] [-] [email_service.hook_bounce] UUID:0C3F9456-00BB-45A6-9A31-2865E0CB9418 [DEBUG] [-] [email_service.hook_bounce] filename DOES NOT EXISTS:1385342052540_1_9671_3918.smtp.-----.com [INFO] [0C3F9456-00BB-45A6-9A31-2865E0CB9418.25] [outbound] hook=bounce plugin=email_service.hook_bounce function=hook_bounce params="Unrecognised response from upstream server: rblsmtpd: 11.11.11.11 pid 11483: 451 http://www.barracudanetworks.com/reputation/?pr=1&ip=11.11.11.11 " retval=OK msg="" [INFO] [0C3F9456-00BB-45A6-9A31-2865E0CB9418.25] [outbound] plugin responded with: 906. Not sending bounce. [PROTOCOL] [0C3F9456-00BB-45A6-9A31-2865E0CB9418.25] [outbound] S: 220 rblsmtpd.local\r\n [PROTOCOL] [0C3F9456-00BB-45A6-9A31-2865E0CB9418.25] [outbound] C: EHLO smtp.-----.com [ERROR] [0C3F9456-00BB-45A6-9A31-2865E0CB9418.25] [outbound] Ongoing connection failed: Error: write after end [INFO] [0C3F9456-00BB-45A6-9A31-2865E0CB9418.25] [outbound] Temp failing 1385342052540_1_9671_3918.smtp.-----.com for 128 seconds: Tried all MXs [PROTOCOL] [301D0BF6-555C-41B0-BE53-7518828014A7.52] [outbound] S: 220 web.de (mxweb106) Nemesis ESMTP Service ready\r\n [DEBUG] [-] [email_service.hook_bounce] "OK id=1Vkkkb-000Y2k-UM" [DEBUG] [98E6D740-00FB-4ACF-B8A8-22AE9DDCF509.47] [outbound] hook=delivered plugin=email_service.hook_bounce function=hook_delivered params="OK id=1Vkkkb-000Y2k-UM" retval=CONT msg="" [CRIT] [-] [core] Error: ENOENT, open '/opt/haraka/queue/1385342052540_1_9671_3918.smtp.-----.com' [INFO] [-] [core] Shutting down
opened another issue #388
It seems that bounce processing is broken in Haraka since version 2.1.0 (or so) when using custom bounce processing plugin - we just did upgrade to latest version from 2.1.0 (since its impossible to install it anymore - broken iconv dependency). Haraka fails with following error: [2013-10-01 09:24:27] [ERROR] [4AFE87CB-BA59-41D2-AD73-751C8303A036.12] [outbound] Double bounce: Error populating bounce message: Error: ENOENT, open '/var/haraka/outbound/queue/1380618904311_0_13792_3358.mailprocessing-server2' [2013-10-01 09:24:27] [CRIT] [-] [core] Error: Callback was already called. [2013-10-01 09:24:27] [CRIT] [-] [core] at /usr/lib/node_modules/Haraka/node_modules/async/lib/async.js:22:31 [2013-10-01 09:24:27] [CRIT] [-] [core] at HMailItem.hmail.next_cb (/usr/lib/node_modules/Haraka/outbound.js:51:9) [2013-10-01 09:24:27] [CRIT] [-] [core] at HMailItem.double_bounce (/usr/lib/node_modules/Haraka/outbound.js:1207:10) [2013-10-01 09:24:27] [CRIT] [-] [core] at /usr/lib/node_modules/Haraka/outbound.js:1191:25 [2013-10-01 09:24:27] [CRIT] [-] [core] at ReadStream. (/usr/lib/node_modules/Haraka/outbound.js:1145:9)
[2013-10-01 09:24:27] [CRIT] [-] [core] at ReadStream.EventEmitter.emit (events.js:95:17)
[2013-10-01 09:24:27] [CRIT] [-] [core] at fs.js:1500:12
[2013-10-01 09:24:27] [CRIT] [-] [core] at Object.oncomplete (fs.js:107:15)
[2013-10-01 09:24:27] [NOTICE] [-] [core] Shutting down
We have custom bounce processing plugin that registers such failures in central system and then returns from plugin STOP. But code in outbound.js seems to have missing return when calling "this.discard()": .... HMailItem.prototype.bounce_respond = function (retval, msg) { if (retval != constants.cont) { this.loginfo("plugin responded with: " + retval + ". Not sending bounce."); if (retval === constants.stop) { this.discard(); // calls next_cb } else { return this.next_cb(); } } ...