sewenew / redis-plus-plus

Redis client written in C++
Apache License 2.0
1.58k stars 347 forks source link

[BUG?]memory accumulation causes oom #556

Closed stxrlong closed 2 months ago

stxrlong commented 5 months ago

my program always has an OOM problem during a specific period of time. After a simple research, I found that the IO of write was very large during this period, which concurrency could reach 80k-100k QPS, and the memory usage increased, of course, the problem will restore after the async redis object was released. I used the async feature of the redis++, which is a nice lib and very easy to use. Because of the above issue, I spent some time studying the source code of redis++ and hiredis. I didn't find the root cause of memory accumulation, but I found two points in redis++, among which one point has greatly alleviated the memory stacking; I used some memleak detection tools, such as valgrind/asan, but they all prevented the high concorrency

  1. we can release the formatted command after calling the hiredis async func where has no error occurred, because the hiredis has copied the value. this point does alleviate the memory stacking

    void _handle(redisAsyncContext &ctx, HiredisAsyncCallback callback) {
        if (redisAsyncFormattedCommand(&ctx,
                    callback, this, _cmd.data(), _cmd.size()) != REDIS_OK) {
            throw_error(ctx.c, "failed to send command");
        }
    
        // release the cmd after calling the above func to reduce the peak usage of memory
        FormattedCommand cmd = std::move(_cmd);
    }
  2. release the context when the hiredis do calling the disconnectcallback func

    void AsyncConnection::_fail_events(std::exception_ptr err) {
    // we'd better release the context
    if(_ctx) {
        redisAsyncFree(_ctx);
        _ctx = nullptr;
    }
    
    _err = err;
    
    _state = State::BROKEN;
    
    // Must call _clean_up after `_err` has been set.
    _clean_up();
    }

    this is no longer a unique ptr, this point comes from _ctx = ctx.release();, therefore, the deletation struct is useless However, when we call func _fail_events, hiredis may have already released the context by calling func redisAsyncDisconnect;

  3. I finally think the memory accumulation may caused by reconnect logic when redis++ actively discovers redis connection issues when sending commands

sewenew commented 5 months ago

Thanks for your reporting!

I finally think the memory accumulation may caused by reconnect logic when redis++ actively discovers redis connection issues when sending commands

Did you have a network issue, or some special configuration, so that Redis connection was broken from time to time? If the network is healthy, do you still have the OOM problem?

Sorry for the late reply. Too busy these days...

Regards

stxrlong commented 4 months ago

u r so kind, yeah, that's right, i wrote a test case that runs above one thread using the command [zadd]. when the concurrency comes to 20k, the memory reaches 800M instantly. if there is no exception thrown, the memory will return to normal(about 14M). and if we receive exception, the memory will stay at 300M after we get all the responses, the approximate test case is as follows:

#include "sw/redis++/async_redis.h"
#include <thread>

int main(int argc, char**argv){
    int qps = 20000;
    std::string key = "sortedkey";

    sw::redis::AsyncRedis redis;
    ... // connnection
    std::thread t([qps, key, &redis]{
        std::atomic_int32_t res = 0, excpt = 0;
        std::string v;
        for(auto i=0; i<qps; ++i){
            v += std::to_string(i);
            redis.zadd(key, v, i).then(boost::lanuch::sync, [](auto&&fut){
                ++res;
                try{
                    fut.get()
                }catch(...){
                    ++excpt;
                }
            });
        }

        while(true){
            if(res== qps)
                break; 
            std::this_thread::sleep_for(std::chrono::seconds(1);
        }
    });

    ... // thread join
    ... // while true
}

i'm sorry about not find the root cause of this problem, i will study it again during the may day holiday

stxrlong commented 2 months ago

I'm sorry to tell you that this problem is not caused by your excellent lib,but by the memory allocation mechanism of Linux. There is a lot of memory malloc/free operations in my program,when the peak concurrent flow comes,the OS will hold the memory even through they were freed for the next reuse. I used tcmalloc to force them to be released to OS, the memory accumulation disappeared. Thanks for your patience and I'm sorry to bother you