ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.03k stars 5.59k forks source link

[Core] Ray GCS FT should support redis sentinel #46983

Open kanwang opened 1 month ago

kanwang commented 1 month ago

Description

redis sentinel is a lightweight solution for running highly available redis. It manages automatic failover across primary and secondary replicas of redis. Consider how important redis is to GCS FT, redis sentinel seems to be a good fit for supporting a highly available Ray cluster.

Right now, it looks like the redis client isn't support sentinel urls.

Use case

Using redis sentinel directly to support GCS FT.

jjyao commented 1 month ago
// Find the true leader
  std::vector<const char *> argv;
  std::vector<size_t> argc;
  std::vector<std::string> cmds = {"DEL", "DUMMY"};
  for (const auto &arg : cmds) {
    argv.push_back(arg.data());
    argc.push_back(arg.size());
  }

  auto redis_reply = reinterpret_cast<redisReply *>(
      ::redisCommandArgv(context_.get(), cmds.size(), argv.data(), argc.data()));

  if (redis_reply->type == REDIS_REPLY_ERROR) {
    // This should be a MOVED error
    // MOVED 14946 10.xx.xx.xx:7001
    std::string error_msg(redis_reply->str, redis_reply->len);
    freeReplyObject(redis_reply);
    auto maybe_ip_port = ParseIffMovedError(error_msg);
    RAY_CHECK(maybe_ip_port.has_value())
        << "Setup Redis cluster failed in the dummy deletion: " << error_msg;
    Disconnect();
    const auto &[ip, port] = maybe_ip_port.value();
    // Connect to the true leader.
    RAY_LOG(INFO) << "Redis cluster leader is " << ip << ":" << port
                  << ". Reconnect to it.";
    return Connect(ip, port, password, enable_ssl);
  } else {
    RAY_LOG(INFO) << "Redis cluster leader is " << ip_addresses[0] << ":" << port;
    freeReplyObject(redis_reply);
  }

We use the above code to find the primary replica to talk to. Is this not enough?

kanwang commented 1 month ago

yeah we tested it. it doesn't work. redis-ha/redis-sentinel protocol is different from redis cluster. when a readonly-replica get write request, it will not return a MOVED error. instead it will simply say READONLY You can't write against a read only replica. Below is the logs we got during test:

[2024-07-29 12:50:31,333 C 8 8] (ray_init) redis_context.cc:522:  Check failed: parts[0] == "MOVED" && parts.size() == 3 Setup Redis cluster failed in the dummy deletion: READONLY You can't write against a read only replica.

Sentinel typically comes with a slightly different protocol. it provides some redis command to specifically request for current master. For example, this is how it's handled in a python client: https://redis-py.readthedocs.io/en/v4.1.2/connections.html#sentinel-client

looks like hiredis doesn't support something like this, so we probably need some similar-ish logic like above to find current master.

kanwang commented 1 month ago

I guess a semi-related question: do you foresee the interface for GCS fault tolerance get changed in short term? if not, I can probably try implement the sentinel protocol in redis_context. but if there could be major change soon, we'd be interested to be able to plugin in more durable storage (e.g. kv or a relation db).