sewenew / redis-plus-plus

Redis client written in C++
Apache License 2.0

[BUG] redis-plus-plus core dump / crash at AsyncRedisCluster reset #578

Open jzkiss opened 2 months ago

jzkiss commented 2 months ago

Describe the bug Resetting AsyncRedisCluster causes a core dump if one of the Redis masters was killed beforehand.

To Reproduce [1.] The async client is defined and used in the following way:

::std::shared_ptr<::sw::redis::AsyncRedisCluster> m_redis_cluster;
m_redis_cluster.reset(new ::sw::redis::AsyncRedisCluster(opts, pool_opts, ::sw::redis::Role::MASTER));

[2.] Continuous traffic is generated

[3.] One Redis master exits (kill -9 redis-server-pid, or a Kubernetes rolling upgrade of the Redis pods)

[4.] The user code calling redis-plus-plus detects that, for 4 seconds, no response has arrived for the requests directed to the unreachable Redis master (based on hash slot)

[5.] The user code calling redis-plus-plus resets AsyncRedisCluster with the IP address and port of a reachable Redis master:

m_redis_cluster.reset(new ::sw::redis::AsyncRedisCluster(opts, pool_opts, ::sw::redis::Role::MASTER));

[6.] After ~0.7 sec (restart: 14:59:39.710346104Z, core dump: 14:59:40.398587602Z) a core dump is detected:

[New LWP 1407]
[New LWP 1486]
[New LWP 1484]
[New LWP 1483]
[New LWP 1485]
[New LWP 1405]
[New LWP 1404]
[New LWP 1400]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000000009625acf in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x122c1700 (LWP 1407))]
...
(gdb) bt full

#0 0x0000000009625acf in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00000000095f8ea5 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x0000000007e7d96a in uv_async_send.cold () from /lib64/libuv.so.1
No symbol table info available.
#3 0x0000000007c31856 in sw::redis::AsyncConnection::send (this=0xe82ee20, event=std::unique_ptr = {...})
    at /usr/include/c++/8/bits/shared_ptr_base.h:251
No locals.
#4 0x0000000007c42d96 in sw::redis::AsyncShardsPool::_redeliver_events (this=0xfdbbf90,
    events=std::queue wrapping: std::deque with 6 elements = {...}) at /usr/include/c++/8/bits/move.h:74
    async_event = <optimized out>
    pool = std::shared_ptr<sw::redis::AsyncConnectionPool> (use count 3, weak count 1) = {get() = 0xfe40990}
    connection = {_pool = std::shared_ptr<sw::redis::AsyncConnectionPool> (use count 3, weak count 1) = {get() = 0xfe40990},
      _connection = std::shared_ptr<sw::redis::AsyncConnection> (use count 3, weak count 1) = {get() = 0xe82ee20}}
    event = <optimized out>
    should_stop_worker = false
#5 0x0000000007c44530 in sw::redis::AsyncShardsPool::_run (this=0xfdbbf90)
    at /.../redis++/rpm/BUILD/src/sw/redis++/async_shards_pool.cpp:191
    events = std::queue wrapping: std::deque with 6 elements = {{key = "USER_KEY_297",
        event = std::unique_ptr<sw::redis::AsyncEvent> = {get() = 0x0}}, {key = "USER_KEY_302",
        event = std::unique_ptr<sw::redis::AsyncEvent> = {get() = 0xe8b2d30}}, {
        key = "USER_KEY_303", event = std::unique_ptr<sw::redis::AsyncEvent> = {
          get() = 0xe8a38a0}}, {key = "USER_KEY_522",
        event = std::unique_ptr<sw::redis::AsyncEvent> = {get() = 0xe8d7a60}}, {
        key = "USER_KEY_306", event = std::unique_ptr<sw::redis::AsyncEvent> = {
          get() = 0xe6d7e20}}, {key = "", event = std::unique_ptr<sw::redis::AsyncEvent> = {get() = 0xe6c8dc0}}}
#6 0x0000000008d6ab23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#7 0x00000000083591ca in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8 0x0000000009610e73 in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) Quit

USER_KEY_297, ..., USER_KEY_306 are anonymized keys, but all of them belong to the slot range of the killed Redis master.
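To check which slot a given key maps to, the Redis Cluster rule (CRC16 of the key, modulo 16384, with hash-tag handling) can be sketched as below. This is a standalone illustration, not code from redis-plus-plus; the function names `crc16` and `key_slot` are chosen here for the example. The result can be compared against `CLUSTER KEYSLOT <key>` on a live node.

```cpp
#include <cstdint>
#include <string>

// CRC16-CCITT (XMODEM variant: polynomial 0x1021, initial value 0),
// the checksum the Redis Cluster specification uses for key hashing.
std::uint16_t crc16(const std::string& data) {
    std::uint16_t crc = 0;
    for (unsigned char byte : data) {
        crc = static_cast<std::uint16_t>(crc ^ (static_cast<std::uint16_t>(byte) << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Hash slot of a key: if the key contains a non-empty "{...}" hash tag,
// only the text between the first '{' and the first following '}' is hashed.
std::uint16_t key_slot(const std::string& key) {
    const std::size_t open = key.find('{');
    if (open != std::string::npos) {
        const std::size_t close = key.find('}', open + 1);
        if (close != std::string::npos && close > open + 1) {
            return crc16(key.substr(open + 1, close - open - 1)) % 16384;
        }
    }
    return crc16(key) % 16384;
}
```

Comparing `key_slot("USER_KEY_...")` for the real (non-anonymized) keys with the slot ranges reported by `CLUSTER SLOTS` or `CLUSTER SHARDS` would confirm which master owns each queued event.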

Expected behavior No crash; traffic should stabilize.

Environment:
OS: Rocky Linux 8.2-20.el8.0.1
Compiler: gcc version 8.5.0
hiredis version: hiredis 1.2.0
redis-plus-plus version: 1.3.12

Additional context The Redis cluster is deployed with 3 masters and 3 slaves.
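For readers who want to reproduce step [4.], the "no response for 4 seconds" check could look roughly like the sketch below. The actual user code is not shown in this report, so the class name `ResponseWatchdog` and its methods are hypothetical, and the class is not thread-safe as written. It only decides when a reset is due; it does not call into redis-plus-plus.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: reports that a reset is due when requests are
// outstanding and nothing has been answered for at least `timeout`.
class ResponseWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    ResponseWatchdog(std::chrono::milliseconds timeout, Clock::time_point start)
        : _timeout(timeout), _last_progress(start) {}

    // Call when a request is sent. If nothing was pending, the waiting
    // period starts now.
    void on_request(Clock::time_point now) {
        if (_pending == 0) {
            _last_progress = now;
        }
        ++_pending;
    }

    // Call when any reply (or error reply) arrives.
    void on_response(Clock::time_point now) {
        if (_pending > 0) {
            --_pending;
        }
        _last_progress = now;
    }

    // True when requests are pending and none was answered within the timeout.
    bool should_reset(Clock::time_point now) const {
        return _pending > 0 && (now - _last_progress) >= _timeout;
    }

private:
    std::chrono::milliseconds _timeout;
    Clock::time_point _last_progress;
    std::size_t _pending = 0;
};
```

In the scenario above, the watchdog would be constructed with `std::chrono::seconds(4)`, fed from the request path and the reply callbacks, and the `m_redis_cluster.reset(...)` call from step [5.] would run once `should_reset(Clock::now())` returns true.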