vacp2p / research


Architecture for data synchronization: chat2 and store protocol #73

Open oskarth opened 3 years ago

oskarth commented 3 years ago

Discussed in https://github.com/vacp2p/rfc/discussions/384

Originally posted by **staheri14** May 27, 2021

# Problem Definition

The need for data synchronization exists in different domains of `nim-waku`:

**Store protocol**: Currently, store nodes work independently, without synchronizing the state of their persisted messages. As a result, they may hold different views of the message history. This inconsistency also means that light nodes cannot rely on the completeness of the history provided by any single store node. We need a mechanism that lets store nodes exchange their views and converge to a consistent, complete state. This adds reliability to the overall store protocol service, and any single full store node becomes a reliable source of message history.

**Chat2**: A similar synchronization requirement exists in the chat2 application, where clients' message views may diverge due to network delay. Another use case is the synchronization of two (offline) devices belonging to the same user.

The focus of this post is to present an architecture for data synchronization relying on MVDS. Using this architecture, a group of independent nodes, each with a dynamic set of messages, can keep their message views consistent. The following modular architecture breaks the synchronization problem into three parts: _MVDS_, a _Synchronization Orchestration Layer_, and a _Peer Management Layer_.

# MVDS

At a very high level, consider MVDS as a protocol by which two nodes, Alice and Bob, with two sets of messages `A` and `B` respectively, can communicate, synchronize, and obtain `A UNION B`.
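At the set level, the effect of a sync round can be sketched as follows (illustrative Python only, not the MVDS wire protocol; `mvds_sync` is a hypothetical name modeling the outcome, not an API from the MVDS repo):

```python
def mvds_sync(a: set, b: set) -> tuple:
    """Model MVDS at the set level: Alice delivers A - B to Bob,
    Bob delivers B - A to Alice, and both end up with A UNION B."""
    alice_sends = a - b          # messages Bob is missing
    bob_sends = b - a            # messages Alice is missing
    return a | bob_sends, b | alice_sends

# Both views converge to the union of the two message sets.
alice, bob = mvds_sync({"m1", "m2"}, {"m2", "m3"})
assert alice == bob == {"m1", "m2", "m3"}
```

The real protocol achieves this with acknowledged message exchange rather than a single atomic step, but the end state is the same union.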
At a very abstract level, MVDS:

- receives a list of messages, i.e. `A`, as input and synchronizes it with the other end of the protocol (makes sure `A - B` is received by Bob), and
- outputs the messages fetched from the other end of the protocol, i.e. `B - A`.

# Synchronization Orchestration

The orchestration layer is responsible for keeping the node's _message state_ in sync with a (dynamic) set of peers `{P_1, ..., P_N}`. It does so by periodically synchronizing with those peers via the MVDS protocol. The node's message state is a queue of messages denoted by `Q`. The orchestration layer supports:

- Add peer.
- Remove peer.
- Pause/resume a peer (optional).
- Set a synchronization interval per peer (interval to be defined).
- Keep all peers synchronized with the latest state of `Q`.
- Update `Q` with messages fetched from each connected peer.
- De-duplicate messages in `Q`.

# Peer/Group Management

The set of peers to synchronize with must be dictated to the orchestration layer; this is the responsibility of the peer management layer. Note that the peer management unit is an independent component that can exist regardless of the other two layers. This layer must identify the group of peers that the node needs to synchronize with. It is expected to keep track of updates to group membership and communicate those updates to the Synchronization Orchestration layer, which adds/pauses/resumes synchronization with peers accordingly. The implementation of this layer can be as simple as maintaining a static list of peers, or it may involve a complex group membership protocol.

# Example use cases

The following explains how to deploy this architecture in chat2 and the store protocol. Note that the modularity of this architecture gives us design flexibility: we can devise various synchronization topologies with different communication complexities.
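The orchestration layer described above can be sketched roughly as follows (illustrative Python; the class, method names, and the `mvds_exchange` callback are assumptions for the sketch, not the nim-waku API):

```python
class SyncOrchestrator:
    """Sketch of the Synchronization Orchestration layer: keeps a
    de-duplicated message queue Q in sync with a dynamic peer set."""

    def __init__(self):
        self.queue = []     # Q: the node's ordered message state
        self._seen = set()  # supports de-duplication of Q
        self.peers = {}     # peer_id -> {"interval": seconds, "paused": bool}

    def add_peer(self, peer_id, interval=30):
        self.peers[peer_id] = {"interval": interval, "paused": False}

    def remove_peer(self, peer_id):
        self.peers.pop(peer_id, None)

    def pause_peer(self, peer_id):
        self.peers[peer_id]["paused"] = True

    def resume_peer(self, peer_id):
        self.peers[peer_id]["paused"] = False

    def ingest(self, messages):
        """Update Q with messages fetched from a peer, dropping duplicates."""
        for m in messages:
            if m not in self._seen:
                self._seen.add(m)
                self.queue.append(m)

    def sync_round(self, mvds_exchange):
        """One periodic round: exchange state with every active peer.
        `mvds_exchange(peer_id, messages)` stands in for an MVDS session
        and returns the messages that peer has which we lack."""
        for peer_id, cfg in self.peers.items():
            if cfg["paused"]:
                continue
            self.ingest(mvds_exchange(peer_id, list(self.queue)))
```

For example, a round against a single peer that holds one extra message:

```python
orch = SyncOrchestrator()
orch.add_peer("p1")
orch.ingest(["m1", "m2", "m1"])               # duplicate "m1" is dropped
orch.sync_round(lambda peer, msgs: ["m3"])    # stand-in MVDS session
assert orch.queue == ["m1", "m2", "m3"]
```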
## Chat2 application

In a very basic solution, we can imagine the peer management unit being a static, known list of members. This list is passed to the Synchronization Orchestration layer, which in turn begins synchronizing with them.

## FT-Store: Store (full) nodes synchronization

See the problem definition in https://github.com/status-im/nim-waku/issues/561

As the solution, a full store node can pass a list of other full store nodes to the Synchronization Orchestration layer to get in sync with. One big unknown here is how to find other store nodes and how to make sure all store nodes converge to a consistent message view. One immediate solution is to assume a static list of full nodes and make each of them sync with every other node. This imposes O(N^2) communication complexity, which is not ideal (especially considering that there might be a large number of store nodes). See below for a sketch of a more efficient solution, whose communication complexity is O(N).

### Solution Sketch

We need a connected graph of full store nodes, where each node has a constant number of connections and periodically synchronizes with its connections using the Synchronization Orchestration layer. Such a connected graph allows eventual synchronization across all full store nodes. How do we construct this graph? By leveraging the libp2p GossipSub protocol.

#### Peer management

We can create a GossipSub domain for store node synchronization by defining a pubsub topic, e.g., `waku/2/store-sync/proto`. This would ultimately lead all the store nodes to find each other and form a mesh. Each node then extracts the IDs of its direct connections within that mesh and passes them to the Synchronization Orchestration layer to synchronize with.
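The hand-off from the mesh to the orchestration layer might look like the following sketch (illustrative Python; `refresh_sync_peers` and the orchestrator interface are assumptions, and real code would read the mesh from libp2p's GossipSub implementation):

```python
STORE_SYNC_TOPIC = "waku/2/store-sync/proto"  # pubsub topic from the text

def refresh_sync_peers(orchestrator, mesh_peers, known=None):
    """Feed the node's current GossipSub mesh neighbours on the
    store-sync topic into the orchestration layer.

    `mesh_peers` is the set of direct connections in the topic mesh;
    GossipSub keeps it constant-size, which is what bounds the overall
    sync traffic to O(N) across N store nodes.
    `known` is the peer set handed over in the previous refresh."""
    known = set() if known is None else known
    current = set(mesh_peers)
    for peer_id in current - known:    # newly grafted mesh neighbours
        orchestrator.add_peer(peer_id)
    for peer_id in known - current:    # pruned from the mesh
        orchestrator.remove_peer(peer_id)
    return current
```

Calling this whenever the mesh changes keeps the orchestration layer's peer set aligned with the node's constant-degree neighbourhood, so every node syncs with only a few peers while the connected mesh still propagates state to all store nodes eventually.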

The following are the next steps:

1. Implement MVDS as a separate protocol (reusing the old MVDS Nim repo, which has to be cleaned up and revised/debugged/tested), together with a raw spec (the prior spec may need some updates). File further tasks as issues.
2. Develop a basic orchestration layer with static group management (peers to be set through the command line).
3. Integrate with wakunode2.


Further down the road, I will focus on the GossipSub-based peer management.