sewenew / redis-plus-plus

Redis client written in C++
Apache License 2.0

[QUESTION] How to trigger reconnect to new master when only subscribing? #441

Closed oledahle closed 1 year ago

oledahle commented 1 year ago

I have a Redis master and replica, monitored by three sentinels, all running as Docker instances. When I stop the master, the sentinels quickly elect a new master. But my clients, which use the redis-plus-plus Sentinel to get a Redis object and then subscribe to a pub/sub topic, do not reconnect to the new master. The consume method catches "Connection is broken" exceptions from redis-plus-plus, but nothing more happens.

Client connection code very close to Readme example:

    sw::redis::SentinelOptions sentinel_opts;
    std::vector<std::pair<std::string, int>> nodes;
    sentinel_opts.nodes = { { "172.18.0.4", 5000 }, {"172.18.0.5", 5001}, {"172.18.0.6", 5002} }; // Required. List of Redis Sentinel nodes.

    // Optional. Timeout before we successfully connect to Redis Sentinel.
    // By default, the timeout is 100ms.
    sentinel_opts.connect_timeout = std::chrono::milliseconds(500);

    // Optional. Timeout before we successfully send request to or receive response from Redis Sentinel.
    // By default, the timeout is 100ms.
    sentinel_opts.socket_timeout = std::chrono::milliseconds(250);
    m_sentinel = std::make_shared<sw::redis::Sentinel>(sentinel_opts);

    sw::redis::ConnectionOptions connection_opts;
    connection_opts.user = "default";
    connection_opts.password = "";
    connection_opts.connect_timeout = std::chrono::milliseconds(200); // Required.
    connection_opts.socket_timeout = std::chrono::milliseconds(100);  // Required.
    connection_opts.db = 0; // Always select database 0

    sw::redis::ConnectionPoolOptions pool_opts;
    pool_opts.size = 3; // Optional. The default size is 1.

    // Get the master named "novamaster" from the Sentinel
    m_redis_master = std::make_unique<sw::redis::Redis>(
        m_sentinel, "novamaster", sw::redis::Role::MASTER, connection_opts, pool_opts);
    m_subscriber = std::make_unique<sw::redis::Subscriber>(m_redis_master->subscriber());
    m_subscriber->on_message(
        std::bind(&RedisClient::onRedisMessage, this, std::placeholders::_1, std::placeholders::_2));
    m_subscriber->subscribe("testtopic");

The consume method is something like this:

    void RedisClient::consume(sw::redis::Subscriber* subscriber)
    {
        while (!m_shuttingDown)
        {
            try
            {
                subscriber->consume();
            }
            catch (const sw::redis::TimeoutError& e)
            {
                // This can happen quite often, just try again.
                continue;
            }
            catch (const sw::redis::Error& e)
            {
                // TODO: Handle other exceptions here!
                std::cerr << "consume caught Redis exception: " << e.what() << std::endl;
                continue;
            }
        }
    }

Expected behavior: The Redis object automatically connects to the new master after detecting failover.

Question: Do I need to trigger a PING command from consume() to make redis-plus-plus detect the failover, and then recreate the subscription to make this work?

Environment: OS: Fedora 36; Compiler: GCC 12.2.1; Redis: 6.2.7; hiredis version: 1.0.2-2; redis-plus-plus version: 1.3.6

sewenew commented 1 year ago

> The consume method catches "Connection is broken" exceptions from redis-plus-plus, but nothing more happens.

Yes. Currently, once the connection is broken, you can no longer use that Subscriber. Instead, you should create a new Subscriber. Check the doc for details.

> Do I need to trigger a PING command from consume() to make redis-plus-plus detect the failover, and then recreate the subscription to make this work?

You do not need to trigger PING. Instead, if consume throws an exception other than TimeoutError or ReplyError, create a new Subscriber.

Regards
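
The advice above could be sketched roughly as follows. This is a minimal, library-independent illustration, not redis-plus-plus code: the exception types in namespace `sketch` are stand-ins that only mimic the relationship between sw::redis::Error, TimeoutError and ReplyError, and `consume_with_resubscribe`, `make_subscriber` and `retry_delay` are hypothetical names introduced for this example.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <stdexcept>
#include <thread>

namespace sketch {

// Stand-ins for the redis-plus-plus exception types (assumption: only the
// "TimeoutError and ReplyError derive from Error" relationship matters here).
struct Error : std::runtime_error { using std::runtime_error::runtime_error; };
struct TimeoutError : Error { using Error::Error; };
struct ReplyError : Error { using Error::Error; };

// Consume until `stop` is set. `make_subscriber` is a hypothetical factory that
// returns a std::unique_ptr to an object with a consume() method and that has
// already registered callbacks and subscribed to the wanted channels.
// Returns how many subscribers were created, which is handy for logging.
template <typename Factory>
int consume_with_resubscribe(Factory make_subscriber,
                             const std::atomic<bool>& stop,
                             std::chrono::milliseconds retry_delay) {
    int created = 0;
    decltype(make_subscriber()) sub;  // empty until the first successful creation
    while (!stop.load()) {
        try {
            if (!sub) {
                sub = make_subscriber();  // may throw while no master is reachable
                ++created;
            }
            sub->consume();
        } catch (const TimeoutError&) {
            // No message arrived within the socket timeout; just try again.
        } catch (const ReplyError&) {
            // The server sent an error reply; the connection is still usable.
        } catch (const Error&) {
            // Any other error: treat the subscriber as dead and build a new one
            // on the next iteration, after a short pause.
            sub.reset();
            std::this_thread::sleep_for(retry_delay);
        }
    }
    return created;
}

}  // namespace sketch
```

With redis-plus-plus, the factory would presumably wrap the calls from the original post: create the Subscriber from `m_redis_master->subscriber()`, register `on_message`, call `subscribe("testtopic")`, and return it; the catch clauses would then name the sw::redis exception types instead of the stand-ins.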

oledahle commented 1 year ago

Thanks for the tip, I'll try it out.

wb2712 commented 1 year ago

    connOpts.keep_alive = true;

If you are on Windows, use a newer hiredis:

    /* Enable connection KeepAlive. */
    int redisEnableKeepAlive(redisContext *c) {
        if (redisKeepAlive(c, REDIS_KEEPALIVE_INTERVAL) != REDIS_OK)
            return REDIS_ERR;
        return REDIS_OK;
    }

But by default it only checks every 30 seconds.

wb2712 commented 1 year ago

In redis++ this is set with keep_alive = true; I suggest adding a time (interval) parameter.

sewenew commented 1 year ago

@wb2712 Sorry, but hiredis does not expose such an API, i.e. redisKeepAlive, as public. So you'd better create an issue or PR with hiredis.

Regards
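
For context, here is a rough sketch of what a keep-alive interval parameter would control at the socket level on Linux. It is not hiredis or redis-plus-plus code: `enable_keepalive` is a hypothetical helper, and the choice of three probes spaced at a third of the interval is an assumption made for illustration. Only standard socket options are used.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: enable TCP keep-alive on `fd` and start probing after
// `interval_seconds` of idle time. Returns true if every option was applied.
// Linux-specific: TCP_KEEPIDLE is not available on all platforms.
bool enable_keepalive(int fd, int interval_seconds) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return false;

    // Idle time before the first probe is sent.
    int idle = interval_seconds;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) != 0)
        return false;

    // Gap between unanswered probes (assumed: a third of the interval, at least 1s).
    int probe_gap = interval_seconds / 3;
    if (probe_gap < 1) probe_gap = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &probe_gap, sizeof(probe_gap)) != 0)
        return false;

    // Number of unanswered probes before the connection is reported as dead.
    int probes = 3;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0)
        return false;

    return true;
}
```

With these settings, a peer that vanishes without closing the connection would be noticed after roughly the idle time plus three probe gaps, which is why a shorter interval shortens the detection delay described in this thread.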

oledahle commented 1 year ago

For the record: Calling m_redis_master->subscriber() made the "parent" sentinel object switch over to the new master as soon as the election was complete and the new master had assumed its role.

wb2712 commented 1 year ago

    sentinel_opts.keep_alive = true;

    while (1) {
        try {
            sub->consume();
        } catch (const TimeoutError& te) {
            continue;
        } catch (const Error& err) {
            // If keep-alive is set, the error is caught here, but the detection cycle is long.
            break;
        }
    }
    // Reconnect or other handling
    toReboot();

It also depends on how the server exits: whether it shuts down normally or crashes, for example a power failure, a manual power-off, kill -9 redis-server, or kill -15 redis-server. If it exits via kill -9, the service becomes unavailable, which is not something the client can solve.

How was the master node taken offline? By unplugging the server's network cable, or by disabling the network card?

oledahle commented 1 year ago

For this testing, I stopped the Redis master by hitting CTRL+C in an interactive docker session, so the redis server process shut down in a controlled manner, and then the IP address disappeared as well. This produced the generic sw::redis::Error exception with the text "Connection is broken". I'm sure other shutdown / offline scenarios will give slightly different errors.