sewenew / redis-plus-plus

Redis client written in C++
Apache License 2.0
1.64k stars 351 forks source link

[BUG] Improper error when hiredis libuv expiration happens #444

Closed leonliao closed 1 year ago

leonliao commented 1 year ago

Describe the bug I use redis++ 1.3.7 with hiredis 1.1.0 to connect to a Redis Cluster. Hiredis 1.1.0 added libuv expiration support by fixing this issue https://github.com/redis/hiredis/issues/1015. When libuv expiration happens, "connection has been closed" error was thrown in ClusterEvent

#0  sw::redis::ClusterEvent<bool, sw::redis::fmt::SetResultParser>::_cluster_reply_callback (r=0x0, privdata=0x6e3400)
    at redis-plus-plus/include/sw/redis++/async_connection.h:530
#1  0x00000000004854e0 in __redisRunCallback (ac=0x7fffe8002010, cb=0x7ffff6cbb990, reply=0x0) at async.c:311
#2  0x000000000048675b in redisAsyncHandleTimeout (ac=0x7fffe8002010) at async.c:805
#3  0x000000000042cbdc in redisLibuvTimeout (timer=0x7fffe80013e8) at hiredis/include/hiredis/adapters/libuv.h:111
#4  0x000000000045cc7e in uv.run_timers ()
#5  0x0000000000462234 in uv_run ()
#6  0x000000000042ce9a in sw::redis::EventLoop::<lambda()>::operator()(void) const (__closure=0x6e2558)
    at redis-plus-plus/src/sw/redis++/event_loop.cpp:32
#7  0x000000000042e6d7 in std::__invoke_impl<void, sw::redis::EventLoop::EventLoop()::<lambda()> >(std::__invoke_other, sw::redis::EventLoop::<lambda()> &&) (__f=...)
    at /usr/include/c++/8/bits/invoke.h:60
#8  0x000000000042e548 in std::__invoke<sw::redis::EventLoop::EventLoop()::<lambda()> >(sw::redis::EventLoop::<lambda()> &&) (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
#9  0x000000000042e8f8 in std::thread::_Invoker<std::tuple<sw::redis::EventLoop::EventLoop()::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x6e2558)
    at /usr/include/c++/8/thread:244
#10 0x000000000042e8ce in std::thread::_Invoker<std::tuple<sw::redis::EventLoop::EventLoop()::<lambda()> > >::operator()(void) (this=0x6e2558) at /usr/include/c++/8/thread:253
#11 0x000000000042e8b2 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<sw::redis::EventLoop::EventLoop()::<lambda()> > > >::_M_run(void) (this=0x6e2550)
    at /usr/include/c++/8/thread:196
#12 0x00007ffff78fdb23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#13 0x00007ffff70892de in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff6dba133 in clone () from /lib64/libc.so.6

From the stack trace, if a command event in libuv queue expires before sending to network, the callback redisAsyncHandleTimeout will be invoked, and null reply is passed. So, a nullptr in reply in _cluster_reply_callback doesn't necessary mean connection has been closed.

Treating this type of expiration as error, the Redis connection will be cleaned up as well, causing too many reconnects. And that has been observed by wireshark network captures.

To Reproduce

#include <sw/redis++/async_redis++.h>

#include <iostream>
#include <future>
#include <unistd.h>

using namespace sw::redis;
using namespace std;

int main(int argc, char **argv)
{
    if (argc < 4)
    {
        std::cout << argv[0] << " host port request_count" << std::endl;
        return -1;
    }
    ConnectionOptions opts;
    opts.host = argv[1];
    opts.port = atoi(argv[2]);
    uint64_t request_count = atol(argv[3]);
    opts.connect_timeout = std::chrono::milliseconds(200); 
    opts.socket_timeout = std::chrono::milliseconds(5);//makes the request timeout soon

    ConnectionPoolOptions pool_opts;
    pool_opts.size = 2;

    auto async_cluster = AsyncRedisCluster(opts, pool_opts);
    try
    {
        while(true) {
            Future<bool> set_res = async_cluster.set("key4", "val2");
            cout << set_res.get() << endl;
        }
    }
    catch (...)
    {
        cout << "set err" << endl;
    }
    cout << "set ok...." << endl;
    return 0;
}

Expected behavior

Environment:

Additional context NA

sewenew commented 1 year ago

Treating this type of expiration as error, the Redis connection will be cleaned up as well, causing too many reconnects

Even if we can distinguish these two kinds of errors, we have to close the connection when the connection is timeout.

Because when a connection is timed-out, we've no idea whether the request has been sent to Redis. If we don't close the connection, but reuse it, when we send another command to Redis, we might get reply from a former command:

  1. Client sends the first command to Redis.
  2. Redis receives the first request. However, before sending the response back to client, the connection is timed-out. Client throws a TimeoutError, and does not read from the connection to get the reply for the first request.
  3. Client reuses the connection, and reuse it to send a second request.
  4. Client tries to read from the connection to get the reply of the second request. Oops, it gets the response of the first request.

In a word, we have to close the connection, when it's timed-out. In order to avoid closing the connection frequently, in your case, you need to set a larger socket_timeout.

Regards

sewenew commented 1 year ago

Since there's no update, I'll close this issue.

Regards

leonliao commented 1 year ago

It is possible commands expire in the libuv queue, before sending to network. That could be caused by

  1. either the libuv thread is not scheduled somehow
  2. or Redis connection is slow.

For reason 1, it is NOT possible to read responses of other commands which have been sent to Redis? I suggest that for this type of expiration, it is OK to not close the connection. Of course, the reason of expiration must be distinguishable.

sewenew commented 1 year ago

First of all, I’m not sure if libuv support distinguishing these two kinds of timeout. Secondly, even if it has such support, currently, hiredis doesn’t have an interface to do that. Since redis-plus-plus is based on hiredis, we can not support the feature you proposed.

So sorry for that, and I will keep an eye on this problem. If you have any new ideas on it, feel free to let me know.

Regards


发件人: leonliao @.> 发送时间: Tuesday, January 31, 2023 10:12:54 AM 收件人: sewenew/redis-plus-plus @.> 抄送: sewenew @.>; State change @.> 主题: Re: [sewenew/redis-plus-plus] [BUG] Improper error when hiredis libuv expiration happens (Issue #444)

It is possible commands expire in the libuv queue, before sending to network. That could be caused by

  1. either the libuv thread is not scheduled somehow
  2. or Redis connection is slow.

For reason 1, it is NOT possible to read responses of other commands which have been sent to Redis? I suggest that for this type of expiration, it is OK to not close the connection. Of course, the reason of expiration must be distinguishable.

― Reply to this email directly, view it on GitHubhttps://github.com/sewenew/redis-plus-plus/issues/444#issuecomment-1409641708, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACWWTAPDEFZULMNXF7OZZH3WVBYKNANCNFSM6AAAAAATYXZKFM. You are receiving this because you modified the open/close state.Message ID: @.***>

leonliao commented 1 year ago

OK. This is not a critical issue. Let's keep it as it is now.