haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License
5.02k stars 662 forks source link

Rate_Limit plugin loses Redis TTL due to race condition #1757

Closed bmatson closed 7 years ago

bmatson commented 7 years ago

system info

Please report your OS, Node version, and Haraka version by running this shell script on your Haraka server and replacing this section with the output.

Haraka Haraka.js â Version: 2.8.12
Node v6.9.2
OS Linux ip-10-5-3-9 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
openssl OpenSSL 1.0.2g 1 Mar 2016

Expected behavior

Keys in Redis database should time out after TTL.

Observed behavior

Had one server get stuck with a value in redis, but a -1 for TTL (should have been 120 seconds).

Due to race condition in client.get / client.incr the TTL fails to get set if Redis decides the TTL is expired between the GET command and the INCR. the INCR command will then create a new record with a value of 1, but no TTL. Similar to problems documented here: https://github.com/antirez/redis/issues/3397 and in referenced blog (which documents that even in a MULTI command/transaction you can't count on Redis not deleting the value out from under you for extra fun)....

Implementing a "belt-and-suspenders" workaround/fix based on the REDIS documentation here https://redis.io/commands/incr under "Rate Limiter 2" should fix it. I'm not super familiar with Redis/node_redis, but I believe changing the rate_limit.js plugin starting at line 164 to the following should do it, we end up calling expire twice when a new key is created via setex, but we don't fail to set the key when incr creates it for us, not 100% sure if we need to handle errors in the expire call.....:

client.get(key, function(err, val) {
        if (err) return cb(err);

        connection.logdebug(self, 'key=' + key + ' current value=' + (val || 'NEW' ));

        var check_limits = function(err2, result){
            if (err2) return cb(err2);

            if (result===1) {
              client.expire(key, ttl)
            }

            if (parseInt(val) + 1 > parseInt(limit)) {
                // Limit breached
                connection.lognotice(self, key + ' rate ' + val + ' exceeds ' + limit + '/' + ttl + 's');
                return cb(null, true);
            }
            else {
                // OK
                return cb(null, false);
            }

        };

        if (val == null) { // new key
            client.setex(key, ttl, 1, check_limits);
        }
        else { // old key
            client.incr(key, check_limits);
        }
    });

Steps to reproduce

Not easy to reproduce,

bmatson commented 7 years ago

Actually the following is probably a better fix:

client.get(key, function(err, val) {
        if (err) return cb(err);

        connection.logdebug(self, 'key=' + key + ' current value=' + (val || 'NEW' ));

        var check_limits = function(err2, result){
            if (err2) return cb(err2);

            if (parseInt(val) + 1 > parseInt(limit)) {
                // Limit breached
                connection.lognotice(self, key + ' rate ' + val + ' exceeds ' + limit + '/' + ttl + 's');
                return cb(null, true);
            }
            else {
                // OK
                return cb(null, false);
            }

        };

        if (val == null) { // new key
            client.setex(key, ttl, 1, check_limits);
        }
        else { // old key
            client.incr(key, function (err3, result) {
                if (result===1) {
                    client.expire(key, ttl);
                }
                check_limits(err3,result);
            });
        }
    });