Describe the bug
Resetting AsyncRedisCluster causes a core dump if one of the Redis masters was killed beforehand.
To Reproduce
[1.] The async client is defined / used in the following way:
::std::shared_ptr<::sw::redis::AsyncRedisCluster> m_redis_cluster;
m_redis_cluster.reset(new ::sw::redis::AsyncRedisCluster(opts, pool_opts, ::sw::redis::Role::MASTER));
[2.] Continuous traffic is generated
[3.] One Redis master exits (kill -9 redis-server-pid, or a Kubernetes rolling upgrade of the Redis pods).
[4.] The user code built on redis-plus-plus detects that requests directed to the unreachable Redis master (based on hash slot) have received no response for 4 seconds.
[5.] The user code resets AsyncRedisCluster with the IP address / port of a reachable Redis master:
m_redis_cluster.reset(new ::sw::redis::AsyncRedisCluster(opts, pool_opts, ::sw::redis::Role::MASTER));
[6.] After ~0.6 sec (restart: 14:59:39.710346104Z, core dump: 14:59:40.398587602Z) a core dump is detected:
[New LWP 1407]
[New LWP 1486]
[New LWP 1484]
[New LWP 1483]
[New LWP 1485]
[New LWP 1405]
[New LWP 1404]
[New LWP 1400]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000000009625acf in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x122c1700 (LWP 1407))]
...
(gdb) bt full
#0 0x0000000009625acf in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00000000095f8ea5 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x0000000007e7d96a in uv_async_send.cold () from /lib64/libuv.so.1
No symbol table info available.
#3 0x0000000007c31856 in sw::redis::AsyncConnection::send (this=0xe82ee20, event=std::unique_ptr = {...})
at /usr/include/c++/8/bits/shared_ptr_base.h:251
No locals.
#4 0x0000000007c42d96 in sw::redis::AsyncShardsPool::_redeliver_events (this=0xfdbbf90,
#5 0x0000000007c44530 in sw::redis::AsyncShardsPool::_run (this=0xfdbbf90)
#6 0x0000000008d6ab23 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#7 0x00000000083591ca in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8 0x0000000009610e73 in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb) Quit
USER_KEY_297, ..., USER_KEY_306 are anonymized keys; all of them belong to the slot range of the killed Redis master.
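For readers who want to reproduce steps [4.] and [5.], the detection logic can be sketched roughly as below. This is only an illustration of the idea, not the actual application code: the class UnresponsiveWatchdog and its methods are hypothetical names, it is not part of redis-plus-plus, and the time is passed in explicitly so the logic can be exercised without a real Redis cluster.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper for illustration only; not part of redis-plus-plus.
// Tracks requests sent to one master and fires a callback when requests
// are outstanding and no reply has been seen for `limit`.
class UnresponsiveWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    UnresponsiveWatchdog(Clock::duration limit, std::function<void()> on_unresponsive)
        : limit_(limit), on_unresponsive_(std::move(on_unresponsive)) {}

    // Call when a request is sent to the watched master.
    void on_request(Clock::time_point now) {
        if (pending_ == 0) silent_since_ = now;  // silence starts with the first pending request
        ++pending_;
    }

    // Call when any reply from the watched master arrives.
    void on_reply(Clock::time_point now) {
        if (pending_ > 0) --pending_;
        silent_since_ = now;  // a reply restarts the silence interval
    }

    // Call periodically. Returns true if the callback was fired.
    bool poll(Clock::time_point now) {
        if (pending_ == 0 || now - silent_since_ < limit_) return false;
        pending_ = 0;        // assumption: pending requests are abandoned on reset
        on_unresponsive_();  // in the scenario above, this is where the reset would happen
        return true;
    }

private:
    Clock::duration limit_;
    std::function<void()> on_unresponsive_;
    std::size_t pending_ = 0;
    Clock::time_point silent_since_{};
};
```

In the scenario described here, the limit would be std::chrono::seconds(4) and the callback would perform the m_redis_cluster.reset(...) call from step [5.].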
Expected behavior
No crash; traffic should stabilize.
Environment:
OS: Rocky Linux 8.2-20.el8.0.1
Compiler: gcc version 8.5.0
hiredis version: hiredis 1.2.0
redis-plus-plus version: 1.3.12
Additional context
The Redis cluster is used with 3 masters and 3 slaves.
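As background for the "slot range" wording above: Redis Cluster assigns each key to one of 16384 hash slots using CRC16 (XMODEM variant) of the key, or of the hash tag between the first '{' and the next '}' if that tag is non-empty, and the slots are divided among the masters. The sketch below shows that mapping in standalone form. The function names are made up for this example, and the slot ranges actually owned by each of the 3 masters in this setup are not shown because they are not known from this report.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// CRC16, XMODEM variant (polynomial 0x1021, initial value 0), computed bit by bit.
std::uint16_t crc16_xmodem(const std::string& data) {
    std::uint16_t crc = 0;
    for (unsigned char byte : data) {
        crc = static_cast<std::uint16_t>(crc ^ (byte << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Maps a key to its hash slot (0..16383), honoring hash tags:
// if the key contains "{...}" with at least one character inside,
// only that inner part is hashed.
std::uint16_t key_hash_slot(const std::string& key) {
    const std::size_t open = key.find('{');
    if (open != std::string::npos) {
        const std::size_t close = key.find('}', open + 1);
        if (close != std::string::npos && close > open + 1) {
            const std::string tag = key.substr(open + 1, close - open - 1);
            return static_cast<std::uint16_t>(crc16_xmodem(tag) % 16384);
        }
    }
    return static_cast<std::uint16_t>(crc16_xmodem(key) % 16384);
}
```

A key whose slot falls in the range owned by the killed master is routed to that master, which is why only a subset of the traffic stops getting responses in step [4.].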