haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License

Problems with Haraka running on CentOS 6 #293

Closed ghost closed 11 years ago

ghost commented 11 years ago

Hello Haraka Team

We use the Haraka mail server on our Linux servers (CentOS 6.3), but the process keeps running out of memory.

We run Node 0.6.9 with Haraka 2.0.4. Today I tested the current version of Node.js (0.10.1) with Haraka 2.0.5, but now the process hangs after 5 minutes with no error. What is the recommended version of Node.js to use with Haraka?

Our workaround is to restart the server every 4 hours.

smfreegard commented 11 years ago

Recommended version is definitely node 0.8 at the moment.

baudehlo commented 11 years ago

Can you show us your config/plugins file?

ghost commented 11 years ago

Our config is very simple:

# block mail from some known bad HELOs - see config/helo.checks.ini for configuration
#helo.checks
# Only accept mail for your personal list of hosts. Edit config/host_list
# NOTE: THIS IS REQUIRED for inbound email.
#lookup_rdns.strict
rcpt_to.in_host_list
# Queue
custom_queue

We use a queue plugin based on the quarantine queue plugin:

// custom_queue
// base code from quarantine plugin

var path = require('path');
var fs = require('fs');

var existsSync = require('./utils').existsSync;

exports.register = function () {
    this.register_hook('queue','custom_queue');
    this.register_hook('queue_outbound','custom_queue');
}

// http://unknownerror.net/2011-05/16260-nodejs-mkdirs-recursion-create-directory.html
var mkdirs = exports.mkdirs = function(dirpath, mode, callback) {
    if (existsSync(dirpath)) {
        return callback(dirpath);
    }
    mkdirs(path.dirname(dirpath), mode, function() {
        fs.mkdir(dirpath, mode, callback);
    });
}

var zeroPad = exports.zeroPad = function (n, digits) {
    n = n.toString();
    while (n.length < digits) {
        n = '0' + n;
    }
    return n;
}

exports.hook_init_master = function (next) {
    // At start-up; delete any files in the temporary directory
    // NOTE: This is deliberately synchronous to ensure that this
    //       is completed prior to any messages being received.
    var base_dir = '/data/mail';
    var tmp_dir = [ base_dir, 'tmp' ].join('/');
    if (existsSync(tmp_dir)) {
        var dirent = fs.readdirSync(tmp_dir);
        this.loginfo('Removing temporary files from: ' + tmp_dir);
        for (var i=0; i<dirent.length; i++) {
            fs.unlinkSync([ tmp_dir, dirent[i] ].join('/'));
        }
    }
    return next();
}

exports.custom_queue = function (next, connection) {

        var transaction = connection.transaction;

        // Calculate date in YYYYMMDDHHMM format
        var d = new Date();
        var timestamp = d.getFullYear() + zeroPad(d.getMonth()+1, 2) + this.zeroPad(d.getDate(), 2) + this.zeroPad(d.getHours(), 2) + this.zeroPad(d.getMinutes(), 2);

        var base_dir = '/data/mail';
        var dir;

        // Check remoteip and set folder
        var ourMailServer = false;
        var remoteip_list = this.config.get('regex_remoteip', 'list');
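        for (var remoteip in remoteip_list) {
                // Fixed: build the regex from the current list entry (recipient_regex
                // is not assigned until the recipient loop below)
                var regexp = new RegExp(remoteip_list[remoteip]);
                // Fixed: remote_ip is a property of the connection, not the transaction
                if (regexp.test(connection.remote_ip)) {
                        ourMailServer = true;
                        break;
                }
        }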

        // Check recipient and set folder
        var recipient_list = this.config.get('regex_recipient', 'list');
        for (var i=0,l=recipient_list.length; i < l; i++) {
                var temp = recipient_list[i].split('=');
                if (temp.length != 2)
                {
                        connection.logerror(this, 'recipient_list error: ' + recipient_list[i]);
                        break;
                }
                var recipient_regex = temp[0], folder = temp[1];
                var regexp = new RegExp(recipient_regex);
                if (regexp.test(connection.transaction.rcpt_to)) {
                        dir = folder;
                        break;
                }
        }

        // Set folder gul or spam
        if (!dir)
        {
                if (ourMailServer)
                {
                        dir = 'gul';
                } else {
                        dir = 'spam';
                }
        }

        dir = [ dir, timestamp ].join('/');

        var plugin = this;
        // Create all the directories recursively if they do not exist first.
        // Then write the file to a temporary directory first, once this is
        // successful we hardlink the file to the final destination and then
        // remove the temporary file to guarantee a complete file in the
        // final destination.
        mkdirs([ base_dir, 'tmp' ].join('/'), parseInt('0770', 8), function () {
                mkdirs([ base_dir, dir ].join('/'), parseInt('0770', 8), function () {
                        var ws = fs.createWriteStream([ base_dir, 'tmp', transaction.uuid ].join('/'));
                        ws.on('error', function (err) {
                                connection.logerror(plugin, 'Error writing file: ' + err.message);
                                return next();
                        });
                        ws.on('close', function () {
                                fs.link([ base_dir, 'tmp', transaction.uuid ].join('/'),
                                                [ base_dir, dir, transaction.uuid ].join('/'),
                                                function (err) {
                                                        if (err) {
                                                                connection.logerror(plugin, 'Error writing file: ' + err);
                                                        }
                                                        else {
                                                                connection.loginfo(plugin, 'Save message in: ' +
                                                                                           [ base_dir, dir, transaction.uuid ].join('/'));
                                                                // Now delete the temporary file
                                                                fs.unlink([ base_dir, 'tmp', transaction.uuid ].join('/'));
                                                        }
                                                        //return next();
                                                        return next(OK, "Queued!");
                                                }
                                );
                        });
                        transaction.message_stream.pipe(ws, { line_endings: '\n' });
                });
        });
}

baudehlo commented 11 years ago

Odd, nothing there looks like it should leak. What kind of traffic levels are you seeing?

ghost commented 11 years ago

We have 30 Mbit of traffic at peak time.
[Image: Zabbix custom graph, 2013-03-25 17:12:38]

This is the memory usage of the machine; every 6 hours the service restarts and the memory is freed.
[Image: Zabbix history graph, 2013-03-25 17:17:46]

For comparison, the traffic of one day:
[Image: Zabbix custom graph, 2013-03-25 17:24:15]

baudehlo commented 11 years ago

Is that all hitting Haraka??

ghost commented 11 years ago

Yes, all the traffic hits Haraka.

baudehlo commented 11 years ago

From discussing this on IRC: "The only thing I can suggest is to install node-webkit-agent and take some heap snapshots to see if that shows anything obvious"

baudehlo commented 11 years ago

Also this was suggested: https://hacks.mozilla.org/2012/11/tracking-down-memory-leaks-in-node-js-a-node-js-holiday-season/
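
Before taking heap snapshots, a lighter first step could be to log the process's own memory figures at a fixed interval, to see whether growth is in the V8 heap or elsewhere. The following is a minimal sketch, not part of Haraka: `formatMemoryUsage` and `startMemoryLogger` are hypothetical helper names, and the only Node.js API assumed is `process.memoryUsage()`.

```javascript
// Hypothetical helpers (not part of Haraka) for watching memory growth.

// Format the object returned by process.memoryUsage() as one log line.
function formatMemoryUsage(usage) {
    var mb = function (bytes) {
        return (bytes / 1048576).toFixed(1) + 'MB';
    };
    return 'rss=' + mb(usage.rss) +
           ' heapTotal=' + mb(usage.heapTotal) +
           ' heapUsed=' + mb(usage.heapUsed);
}

// Call log(line) every intervalMs milliseconds; returns the timer so the
// caller can stop it with clearInterval().
function startMemoryLogger(log, intervalMs) {
    var timer = setInterval(function () {
        log(formatMemoryUsage(process.memoryUsage()));
    }, intervalMs);
    // Where available, unref() keeps this timer from holding the process open.
    if (timer.unref) {
        timer.unref();
    }
    return timer;
}
```

In a plugin this might be started once at start-up with a logger such as `this.loginfo`. If `rss` climbs while `heapUsed` stays flat, the growth is probably outside the JavaScript heap (buffers, sockets, file handles), which would point away from a classic object leak.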

ghost commented 11 years ago

As a first step I will try updating to Node 0.8 and then check again whether the leak still exists.

ghost commented 11 years ago

I have updated to Node version 0.8.9. When we now start the Haraka service, it crashes with this error:

[CRIT] [-] [core] TypeError: Object #<Object> has no method 'daemonize'
[CRIT] [-] [core] at Object.Server.daemonize (/usr/local/lib/node_modules/Haraka/server.js:49:16)
[CRIT] [-] [core] at Object.Server.createServer (/usr/local/lib/node_modules/Haraka/server.js:131:14)
[CRIT] [-] [core] at Object.<anonymous> (/usr/local/lib/node_modules/Haraka/haraka.js:57:8)
[CRIT] [-] [core] at Module._compile (module.js:449:26)
[CRIT] [-] [core] at Object.Module._extensions..js (module.js:467:10)
[CRIT] [-] [core] at Module.load (module.js:356:32)
[CRIT] [-] [core] at Function.Module._load (module.js:312:12)
[CRIT] [-] [core] at Module.require (module.js:362:17)
[CRIT] [-] [core] at require (module.js:378:17)
[CRIT] [-] [core] at Object.<anonymous> (/usr/local/lib/node_modules/Haraka/bin/haraka:342:5)
[INFO] [-] [core] Shutting down

smfreegard commented 11 years ago

Try this:

cd /usr/local/lib/node_modules/Haraka
npm install daemon@0.5

ghost commented 11 years ago

Thanks smfreegard, it now runs as a service. After 5 minutes it stops with the following error (I have now enabled a higher log level):

[ERROR] [90F34488-4360-4472-80FB-DA7AB0FAF9F3.1125] [custom_queue] Error writing file: EMFILE, open '/data/mail/tmp/90F34488-4360-4472-80FB-DA7AB0FAF9F3.1125'

ghost commented 11 years ago

Haraka has now been running for 40 minutes and is using 1800 MB of memory. We now have the recommended Node version and the same memory leak. We are not experts in Node.js or in finding memory leaks. Would it be possible for us to set up a test server for you?

baudehlo commented 11 years ago

EMFILE means you've run out of file descriptors. I think what's happening is that the delivery concurrency is just so high that you have too many connections open.

Increase the number of available file descriptors: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/

You may need to implement the rate_limit plugin to limit inbound connections.

Are you doing outbound too? If so see the config/outbound.concurrency_max file for controlling that (though the default is 100 which shouldn't cause too much hassle).

Also make sure you run with idle notifications off: node --use_idle_notification=false /usr/local/bin/haraka

My thought is this isn't a leak - just a lot of connections.
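
Raising the descriptor limit and rate-limiting connections are the measures suggested here. As a complementary illustration only (it was not proposed in this thread), a plugin that opens one file per message could also cap how many files it has open at once. A minimal sketch, assuming a hypothetical `createLimiter` helper that is not part of Haraka or Node.js:

```javascript
// Hypothetical helper: run at most `max` tasks at once and queue the rest.
// Each task receives a release() function and must call it when it is done
// (for example after its write stream has closed or failed).
function createLimiter(max) {
    var active = 0;
    var waiting = [];

    function runNext() {
        if (active >= max || waiting.length === 0) {
            return;
        }
        active++;
        var task = waiting.shift();
        var released = false;
        task(function release() {
            if (released) {
                return; // ignore a second release() from the same task
            }
            released = true;
            active--;
            runNext();
        });
    }

    return {
        run: function (task) {
            waiting.push(task);
            runNext();
        },
        stats: function () {
            return { active: active, waiting: waiting.length };
        }
    };
}
```

A queue hook could wrap its file-writing code in `limiter.run(function (release) { ... })` and call `release()` from both the success and the error path. Queued messages still hold their connections open while they wait, so this only bounds descriptors used for files, not for sockets.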

ghost commented 11 years ago

I think it is a leak: when I do not restart the service, it uses all the memory and the full swap space. The number of incoming connections is high, but the only thing the service has to do is save the message in a folder, and then it can close the connection. Before this we used qpsmtpd, and there we had no problems.

baudehlo commented 11 years ago

I think we're going to have to get on the server to figure out what's going on here, unless you can debug using the information we've given above. Lots of Haraka users don't see leaks like this, so we're really not sure what's going on.

Also make sure you try with Haraka v2.1.1 and Node v0.10.3.

krootee commented 11 years ago

We face similar problems with our Haraka server on Ubuntu: under heavy load (3-5Mbit/sec) Haraka leaks memory and eventually crashes (and is restarted automatically by upstart). This was with Haraka 2.0.4 and Node 0.8.22. I'm going to upgrade this week to Haraka 2.1.1 and Node 0.10.3.

baudehlo commented 11 years ago

Good to know it's not an isolated problem. Do you have the option of attaching the debugger as detailed in the links above so we have some way of tracking this down?

ghost commented 11 years ago

Could you give us an example showing which file we should add the debugger code to? If time permits, I will update to the current version this week.

baudehlo commented 11 years ago

One issue that we might have seen is that your on('error') handler doesn't clean up (delete, close etc) the write stream. Might want to add that in and see if it helps.

We're wondering if this problem is caused by systems that don't finish cleanly. Really hard to tell at this stage, but we're taking this seriously.
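
The clean-up described above could look roughly like the sketch below. It is an illustration under assumptions, not a tested fix for the plugin: `saveViaTempFile` is a hypothetical helper name, `source` stands in for `transaction.message_stream`, and it relies on `destroy()` being available on the write stream, as it is in current Node.js versions.

```javascript
var fs = require('fs');

// Hypothetical helper: stream `source` into tmpPath, hardlink the finished
// file to finalPath, and always remove the temporary file afterwards.
// done(err) is called exactly once, with null on success.
function saveViaTempFile(source, tmpPath, finalPath, done) {
    var finished = false;
    var ws = fs.createWriteStream(tmpPath);

    function finish(err) {
        if (finished) {
            return;
        }
        finished = true;
        if (err) {
            ws.destroy(); // release the file descriptor on failure
        }
        // Remove the temporary file; an unlink error (for example because
        // the file was never created) is deliberately ignored.
        fs.unlink(tmpPath, function () {
            done(err || null);
        });
    }

    ws.on('error', finish);
    source.on('error', finish);
    ws.on('close', function () {
        if (finished) {
            return; // the stream closed because of an earlier error
        }
        fs.link(tmpPath, finalPath, finish);
    });
    source.pipe(ws);
}
```

In the queue hook, `done` would call `next(OK, "Queued!")` on success and `next()` on error, so the descriptor and the temporary file are released on every path instead of only after a successful link.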

krootee commented 11 years ago

Deploying 2.1.1 today, so I will check whether the problem exists in the new version. I found a major problem with 2.1.1 under load testing and will report it soon.

krootee commented 11 years ago

Deployed 2.1.1 yesterday on top of Node 0.10.4, and no problems with memory leaks so far.

baudehlo commented 11 years ago

Please try 2.1.2 along with Node v0.10.5.

ghost commented 11 years ago

For the high-traffic project we have switched to an in-house development. At the moment I cannot test again. Sorry.

baudehlo commented 11 years ago

OK, closing this issue then.