need RMB to be redundant

threefoldtech / home

need RMB to be redundant #1458

Closed by despiegk 8 months ago

despiegk commented 11 months ago

Minimal redundancy needs to be achieved at the RMB level (implemented in the simplest way possible).
The RMB server needs to be in trusted locations.

xmonader commented 11 months ago

https://github.com/threefoldtech/tf_operations/issues/1797

muhamadazmy commented 11 months ago

What can be done for this at the operations level (requires no code changes):

Make sure Redis is a redundant cluster, not a single instance.
The RMB service is itself stateless; it relies on the backend Redis for message buffers. Hence we can run multiple instances of the RMB relay against the single Redis cluster, which increases capacity and gives high availability.
For example, relay.grid.tf can be load balanced across multiple RMB processes that all use the same Redis cluster.
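The operations-level setup above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SharedStore` is an in-memory stand-in for the Redis cluster, and `Relay`, `relay-a`, `relay-b`, and `peer-2` are hypothetical names, not the actual RMB relay code or API. It shows why stateless relays can sit behind a load balancer: a message accepted by one relay process can be handed out by another, because the buffer lives in the shared store.

```python
from collections import defaultdict, deque
from itertools import cycle


class SharedStore:
    """In-memory stand-in for the shared Redis cluster (assumption)."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def push(self, peer_id, message):
        self.queues[peer_id].append(message)

    def pop(self, peer_id):
        queue = self.queues[peer_id]
        return queue.popleft() if queue else None


class Relay:
    """Hypothetical relay process that keeps no message state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def send(self, dest_peer, message):
        # Buffer the message in the shared store, not in this process.
        self.store.push(dest_peer, message)

    def receive(self, peer_id):
        # Any relay instance can serve the buffered message.
        return self.store.pop(peer_id)


store = SharedStore()
relays = [Relay("relay-a", store), Relay("relay-b", store)]
balancer = cycle(relays)  # round-robin stand-in for a load balancer

next(balancer).send("peer-2", "hello")       # handled by relay-a
received = next(balancer).receive("peer-2")  # handled by relay-b
print(received)
```

Losing one relay process here costs nothing but capacity. Losing the shared store takes every relay down with it, which is why the store itself has to be redundant in this setup.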

Redundancy that requires code changes (work in progress):

Allow peers to maintain multiple connections to different relays (say r0.grid.tf and r1.grid.tf), where each relay is a completely separate instance (with its own Redis backend). If r0 is gone completely, all peers can still communicate over r1. Federation is still possible between r0 and r1, so messages intended for peers on r0 should still be auto-routed.
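The multi-relay idea can be sketched as follows. This is a simplified model under stated assumptions, not the RMB implementation: `Relay`, `Peer`, `RelayDown`, and the peer names are hypothetical, each relay keeps its own in-memory inboxes in place of its own Redis backend, and federation is reduced to a single forwarding hop.

```python
class RelayDown(Exception):
    """Raised when a relay cannot be reached."""


class Relay:
    """Hypothetical independent relay with its own message store."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.inboxes = {}  # peer_id -> messages for peers connected here
        self.others = []   # federated relays

    def connect(self, peer_id):
        self.inboxes.setdefault(peer_id, [])

    def deliver(self, dest, message, federated=False):
        if not self.up:
            raise RelayDown(self.name)
        if dest in self.inboxes:
            self.inboxes[dest].append(message)
            return self.name
        if not federated:
            # Destination is not connected here: forward one hop.
            for other in self.others:
                try:
                    return other.deliver(dest, message, federated=True)
                except (RelayDown, LookupError):
                    continue
        raise LookupError(f"{dest} not reachable from {self.name}")


class Peer:
    """Hypothetical peer that keeps a connection to every relay it is given."""

    def __init__(self, peer_id, relays):
        self.peer_id = peer_id
        self.relays = relays
        for relay in relays:
            relay.connect(peer_id)

    def send(self, dest, message):
        # Try each relay in order and fall through to the next on failure.
        for relay in self.relays:
            try:
                return relay.deliver(dest, message)
            except (RelayDown, LookupError):
                continue
        raise RuntimeError("no relay could deliver the message")


r0, r1 = Relay("r0.grid.tf"), Relay("r1.grid.tf")
r0.others, r1.others = [r1], [r0]

alice = Peer("alice", [r0, r1])
bob = Peer("bob", [r0, r1])
carol = Peer("carol", [r0])  # connected to r0 only
dave = Peer("dave", [r1])    # connected to r1 only

federated = dave.send("carol", "hello")         # r1 forwards to r0
r0.up = False                                   # r0 is gone completely
failover = alice.send("bob", "still reachable")  # falls back to r1
print(federated, failover)
```

In this model a peer connected to both relays survives the loss of either one, while a peer connected to a single relay (like `carol`) becomes unreachable when that relay goes down.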

coesensbert commented 10 months ago

Regarding Redis, we already have a 3-node Redis test cluster running for testing Lee's Cetus DNS server (which serves ava.tf). It works well and is documented here. Would this require Redis to be exposed publicly over TLS, so that any RMB relay can connect to it?

xmonader commented 10 months ago

@coesensbert can't they use WireGuard instead of exposing things publicly?

coesensbert commented 10 months ago

Yes, for sure. That test setup is currently configured with a WireGuard mesh, so we could do the same for these RMB relays. But this adds quite some overhead when scaling, and it won't fit, for example, a validator running the whole backend stack (it will have its own Redis/RMB client, I read). This was just to inform that if we want to expand the RMB relays now, we could. But it seems best to make it fit into the broader picture of decentralizing the grid backend, so that we don't do extra work (setting up Redis clusters plus a WireGuard mesh) that we won't use later on.

So for ops, we could set up a Redis cluster with WireGuard quickly if needed. If this won't be used in the future, then I suggest we work on a solution that better fits the future plans to decentralize the grid backend (as we started here).

xmonader commented 10 months ago

You're 100% correct

despiegk commented 10 months ago

there can be no clusters behind, otherwise we break decentralization

deadline: mid sept

muhamadazmy commented 9 months ago

The Redis solution was never intended to provide redundancy across locations. It only applies when we run multiple relay servers (in the same location) against a single Redis cluster.

RMB now supports redundancy across multiple locations by having multiple independent relays running anywhere (each with its own Redis backend, which can be a single instance or a cluster).

https://github.com/threefoldtech/tf_operations/issues/1934

We already have a separate Redis instance for devnet, which we can already use for testing.

xmonader commented 8 months ago

@ramezsaeed please link us to how this is being tracked/verified.