Closed: despiegk closed this 8 months ago
What can be done for this on operations level (require no code changes):
Redundancy that requires code changes (work in progress):
complete
separate instance (with its own redis backend). If r0 is gone completely, all peers can still communicate over r1. Federation is still possible from r0 and r1, so messages that are intended for peers on r0 should still be auto-routed.
Regarding Redis, we already have a 3-node Redis test cluster running for testing Lee's Cetus DNS server (that serves ava.tf), which works well and is documented here. For this, would we require Redis to be exposed publicly over TLS, so that any RMB relay can connect to it?
@coesensbert can't they use wireguard instead of exposing things publicly?
Yes, for sure. That test setup is currently configured with a Wireguard mesh, so we could do the same for these RMB relays. But this will add quite some overhead when scaling, and it won't fit, for example, a validator running the whole backend stack (as I read it, that will have its own redis/rmb client). It was just to inform that if we want to expand RMB relays now, we could. But it seems best to make it fit into the broader picture of decentralizing the grid backend, so that we don't do extra work (setting up redis clusters + wireguard mesh) that we won't use later on.
So for ops, we could quickly set up a redis cluster with wireguard if that were needed. If this won't be used in the future, then I suggest we work on a solution that better fits the future plans to decentralize the grid backend (as we started here).
You're 100% correct
there can be no clusters behind, otherwise we break decentralization
deadline: mid sept
The redis solution was not intended to provide redundancy across locations; it only applies when we are running multiple relay servers (in the same location) against a single redis cluster.
RMB now supports redundancy across multiple locations by having multiple independent relays running anywhere (each with its own redis backend, which can be a single instance or a cluster).
https://github.com/threefoldtech/tf_operations/issues/1934
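To illustrate the multi-relay redundancy described above, here is a minimal sketch of client-side failover across independent relays. It is not the actual RMB client code: `send_with_failover`, the `send` callback, and the relay URLs are hypothetical names used for illustration only, and the sketch assumes that a relay being down surfaces as a `ConnectionError`.

```python
from typing import Callable, Iterable, Optional


def send_with_failover(
    relays: Iterable[str],
    send: Callable[[str, bytes], None],
    payload: bytes,
) -> Optional[str]:
    """Try each relay in order and return the URL of the first one that works.

    `send` is a hypothetical transport callback (an assumption, not an RMB
    API). It is expected to raise ConnectionError when a relay is unreachable.
    Returns None if every relay failed.
    """
    for url in relays:
        try:
            send(url, payload)
            return url
        except ConnectionError:
            # This relay (e.g. r0) is gone; fall through to the next one.
            continue
    return None
```

With a relay list such as `["wss://r0.example", "wss://r1.example"]` (placeholder URLs), a peer would keep communicating over r1 when r0 is gone, without any shared redis cluster behind the relays.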
We already have a separate redis instance for devnet, which we can already use for testing.
@ramezsaeed please link us to how this is being tracked/verified