waku-org / research

Waku Protocol Research
MIT License

Waku Archive and Synchronization protocol #76

Open ABresting opened 6 months ago

ABresting commented 6 months ago

Waku Archive and Synchronization protocol

The Synchronization (Sync) protocol keeps data and messages consistent across the nodes of the Waku network. It runs between two nodes: a node sends a Sync request to a peer node, which processes the request and starts the synchronization. A key step in this process is the peer node sending a collection of messageHashes to the requesting node. The requesting node reviews these hashes and identifies the ones it does not have. It then sends back a list of these missing messageHashes, prompting the peer node to transmit the corresponding messages. In this way, data stays consistent across the network.
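The three-step exchange described above (peer offers hashes, requester replies with the ones it lacks, peer sends those messages) can be sketched as follows. This is a minimal in-process illustration, not the Waku implementation: the `WakuMessage` fields are simplified, and `message_hash`, `Node`, and `sync` are hypothetical names. The hash is an assumed SHA-256 over a few message fields and does not reproduce Waku's exact deterministic hashing rules.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WakuMessage:
    # Simplified message; real Waku messages carry more fields.
    pubsub_topic: str
    content_topic: str
    payload: bytes
    timestamp: int


def message_hash(msg: WakuMessage) -> bytes:
    # Assumption: deterministic SHA-256 over selected fields. The exact
    # field order and encoding used by Waku are NOT reproduced here.
    h = hashlib.sha256()
    h.update(msg.pubsub_topic.encode())
    h.update(msg.payload)
    h.update(msg.content_topic.encode())
    h.update(msg.timestamp.to_bytes(8, "big"))
    return h.digest()


class Node:
    """Hypothetical node holding messages keyed by their hash."""

    def __init__(self, messages=()):
        self.store = {message_hash(m): m for m in messages}

    def offer_hashes(self) -> list[bytes]:
        # Peer side, step 1: advertise every hash it holds.
        return list(self.store)

    def missing_from(self, offered: list[bytes]) -> list[bytes]:
        # Requester side, step 2: keep only hashes it does not hold.
        return [h for h in offered if h not in self.store]

    def messages_for(self, hashes: list[bytes]) -> list[WakuMessage]:
        # Peer side, step 3: return full messages for requested hashes.
        return [self.store[h] for h in hashes if h in self.store]

    def ingest(self, messages) -> None:
        for m in messages:
            self.store[message_hash(m)] = m


def sync(requester: Node, peer: Node) -> int:
    """Run one Sync exchange; return how many messages were transferred."""
    offered = peer.offer_hashes()
    wanted = requester.missing_from(offered)
    requester.ingest(peer.messages_for(wanted))
    return len(wanted)
```

For example, if the requester holds one message and the peer holds that message plus a second one, `sync(requester, peer)` transfers exactly the second message; a repeated call transfers nothing. Note that step 1 ships the peer's full hash list, which is the round-trip the later comments in this thread question.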

To execute the Sync process effectively, the Waku Store protocol needs to incorporate certain capabilities:

The existing Waku archive protocol offers a getMessages function that takes parameters such as contentTopic, pubsubTopic, startTime, endTime, etc. The Archive protocol needs to be extended so that it can review a list of messageHashes and identify which ones the node does not already have.
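As a rough sketch of the proposed extension, here is a hypothetical in-memory archive that keeps a filtered `get_messages` query alongside a new `find_missing_hashes` lookup. The class, method, and parameter names are illustrative only and do not correspond to the actual nwaku Archive API. Inclusive time bounds are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMessage:
    message_hash: bytes
    pubsub_topic: str
    content_topic: str
    timestamp: int
    payload: bytes


class InMemoryArchive:
    """Hypothetical stand-in for the Archive backend: storage only, no peers."""

    def __init__(self):
        self._by_hash: dict[bytes, StoredMessage] = {}

    def put(self, msg: StoredMessage) -> None:
        self._by_hash[msg.message_hash] = msg

    def get_messages(self, content_topics=None, pubsub_topic=None,
                     start_time=None, end_time=None) -> list[StoredMessage]:
        # Existing-style filtered query; time bounds assumed inclusive.
        out = []
        for m in self._by_hash.values():
            if content_topics is not None and m.content_topic not in content_topics:
                continue
            if pubsub_topic is not None and m.pubsub_topic != pubsub_topic:
                continue
            if start_time is not None and m.timestamp < start_time:
                continue
            if end_time is not None and m.timestamp > end_time:
                continue
            out.append(m)
        return sorted(out, key=lambda m: m.timestamp)

    def find_missing_hashes(self, hashes: list[bytes]) -> list[bytes]:
        # Proposed addition: the subset of `hashes` not stored locally,
        # in input order, with duplicates dropped.
        seen = set()
        missing = []
        for h in hashes:
            if h not in self._by_hash and h not in seen:
                seen.add(h)
                missing.append(h)
        return missing
```

Keeping the lookup as a plain set-membership check over stored hashes means it can be served by an index on the hash column in a database-backed archive, independently of the topic and time filters.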

The following are the functionalities the Sync protocol would need from Waku Store:

@Ivansete-status @jm-clius would like to have your input on this.

Ivansete-status commented 6 months ago

Hey @ABresting! Thanks for that! :100: Let me share my point of view:

  1. The Sync protocol should only operate among nodes with Store protocol mounted.
  2. It would be useful to have a graphical description of the protocol in different scenarios (e.g., when a node starts and when a node has already been running for some time).
  3. We will need to define a sync heartbeat that performs the Sync "IHave / IWant" exchange periodically.

jm-clius commented 6 months ago
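Points 1 and 3 can be combined into a small heartbeat loop: on each tick, run a sync round only with peers that mount Store. The sketch below uses `asyncio`; `Peer`, `STORE_CAPABILITY`, `sync_heartbeat`, and the `sync_once` callback are hypothetical names, and the bounded `rounds` parameter exists only so the sketch terminates (a real heartbeat would run until shutdown). The first round runs immediately, which covers the node start-up scenario from point 2.

```python
import asyncio
from dataclasses import dataclass, field

# Placeholder capability label, not a real libp2p protocol ID.
STORE_CAPABILITY = "store"


@dataclass
class Peer:
    peer_id: str
    protocols: set[str] = field(default_factory=set)


async def sync_heartbeat(peers, sync_once, interval_s: float, rounds: int) -> None:
    """Run `rounds` sync rounds, `interval_s` seconds apart.

    The first round runs immediately (node start-up); later rounds
    keep an already-running node up to date.
    """
    for _ in range(rounds):
        for peer in peers:
            # Point 1: only sync with peers that have Store mounted.
            if STORE_CAPABILITY in peer.protocols:
                await sync_once(peer)
        await asyncio.sleep(interval_s)
```

A caller would pass its own `sync_once(peer)` coroutine (for example, one that performs the IHave / IWant exchange) and an interval chosen to balance freshness against bandwidth.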

Thanks! I think the main question here should rather be how the Store wire protocol (the protobuf) itself should change to support the new key-value approach to Store. Do we need any new types of historical message queries, new types of filter criteria, etc.? It seems to me that this "query language" needs to be expanded to include use cases such as:

  1. query Store for message hashes matching filter criteria
  2. query Store for full contents matching a list of message hashes
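The two use cases can be modelled as two query shapes. The sketch below uses Python dataclasses as stand-ins for protobuf messages; `HashQuery`, `ContentQuery`, `matches`, and `handle` are hypothetical names, not part of any existing Store specification, and the filter semantics (inclusive time bounds, results ordered by timestamp) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Msg:
    hash: bytes
    pubsub_topic: str
    content_topic: str
    timestamp: int
    payload: bytes


@dataclass
class HashQuery:
    # Use case 1: return only the hashes of messages matching the filter.
    pubsub_topic: Optional[str] = None
    content_topics: tuple[str, ...] = ()
    start_time: Optional[int] = None
    end_time: Optional[int] = None


@dataclass
class ContentQuery:
    # Use case 2: return full messages for the listed hashes.
    hashes: tuple[bytes, ...] = ()


def matches(m: Msg, q: HashQuery) -> bool:
    return ((q.pubsub_topic is None or m.pubsub_topic == q.pubsub_topic)
            and (not q.content_topics or m.content_topic in q.content_topics)
            and (q.start_time is None or m.timestamp >= q.start_time)
            and (q.end_time is None or m.timestamp <= q.end_time))


def handle(store: dict[bytes, Msg], q: Union[HashQuery, ContentQuery]):
    if isinstance(q, HashQuery):
        hits = sorted((m for m in store.values() if matches(m, q)),
                      key=lambda m: (m.timestamp, m.hash))
        return [m.hash for m in hits]
    if isinstance(q, ContentQuery):
        # Unknown hashes are skipped rather than treated as errors.
        return [store[h] for h in q.hashes if h in store]
    raise TypeError(f"unsupported query type: {type(q).__name__}")
```

A sync client would chain the two: issue a `HashQuery`, diff the returned hashes against its local set, then issue a `ContentQuery` for only the missing ones.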

This would, of course, imply some changes in what the Archive supports, but those changes should flow naturally from the concepts introduced by the new Store protocol, and they are difficult to reason about without first thinking through these protocol flows. For example, sendMessageUsingHashes is not something the Archive should do; it belongs to the Store protocol. The Archive is simply the interface and backend for the physical storage of historical messages and is unaware of peers, request-response, etc.
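The layering described above (Store handles peers and request-response, Archive only stores) can be sketched with a `typing.Protocol` interface. All names here (`Archive`, `StoreService`, `StoreRequest`, `DictArchive`) are hypothetical and chosen for illustration; the point is that the peer ID and request ID never reach the storage layer.

```python
from dataclasses import dataclass
from typing import Protocol


class Archive(Protocol):
    # Storage-only interface: no peers, no request/response.
    def hashes_matching(self, content_topic: str) -> list[bytes]: ...
    def messages_for(self, hashes: list[bytes]) -> list[bytes]: ...


@dataclass
class StoreRequest:
    request_id: str
    peer_id: str
    content_topic: str


@dataclass
class StoreResponse:
    request_id: str
    hashes: list[bytes]


class StoreService:
    """Protocol layer: knows about peers and request IDs, delegates storage."""

    def __init__(self, archive: Archive):
        self.archive = archive

    def handle_hash_request(self, req: StoreRequest) -> StoreResponse:
        # Who asked and which request it was stay in this layer;
        # the archive only ever sees the filter criteria.
        return StoreResponse(req.request_id,
                             self.archive.hashes_matching(req.content_topic))

    def handle_content_request(self, hashes: list[bytes]) -> list[bytes]:
        # Sending messages for a list of hashes is a Store concern;
        # the archive just returns what it has stored.
        return self.archive.messages_for(hashes)


class DictArchive:
    """Toy archive: hash -> (content_topic, payload)."""

    def __init__(self, data: dict[bytes, tuple[str, bytes]]):
        self.data = data

    def hashes_matching(self, content_topic: str) -> list[bytes]:
        return sorted(h for h, (ct, _) in self.data.items() if ct == content_topic)

    def messages_for(self, hashes: list[bytes]) -> list[bytes]:
        return [self.data[h][1] for h in hashes if h in self.data]
```

Because `StoreService` depends only on the `Archive` interface, the storage backend (in-memory, SQLite, Postgres) can be swapped without touching the protocol logic.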

SionoiS commented 6 months ago

A key step in this process involves the peer node sending a collection of messageHashes to the requesting node. The requesting node then reviews these hashes and identifies any it doesn't possess.

The Sync process result IS the list of hashes. There is no extra round-trip: the list contains only the missing hashes, so no filtering is needed. Maybe rephrase?
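One way to obtain the missing hashes directly, with no separate filtering step, is range-based set reconciliation: both sides compare fingerprints of hash ranges, skip ranges that match, and split mismatching ranges until they are small enough to compare item by item. The sketch below is an assumption about the kind of technique meant, not the actual Sync design. It runs both sides in one process, uses a plain SHA-256 over the concatenated sorted hashes as the range fingerprint (practical schemes use incrementally computable fingerprints), and the names `fingerprint`, `in_range`, and `reconcile` are hypothetical.

```python
import hashlib
from bisect import bisect_left


def fingerprint(items: list[bytes]) -> bytes:
    # Simplified range fingerprint over fixed-length sorted hashes.
    return hashlib.sha256(b"".join(items)).digest()


def in_range(sorted_items: list[bytes], lo: bytes, hi) -> list[bytes]:
    # Items x with lo <= x < hi; hi=None means unbounded above.
    start = bisect_left(sorted_items, lo)
    end = len(sorted_items) if hi is None else bisect_left(sorted_items, hi)
    return sorted_items[start:end]


def reconcile(local: list[bytes], remote: list[bytes],
              lo: bytes = b"", hi=None, threshold: int = 4):
    """Return (hashes local lacks, hashes remote lacks) within [lo, hi).

    `local` and `remote` must be sorted and duplicate-free.
    """
    a = in_range(local, lo, hi)
    b = in_range(remote, lo, hi)
    if fingerprint(a) == fingerprint(b):
        return [], []  # range already in sync
    if len(a) <= threshold or len(b) <= threshold:
        # Small range: compare item by item.
        sa, sb = set(a), set(b)
        return sorted(sb - sa), sorted(sa - sb)
    # Split at the local median; since len(a) >= 2, pivot > lo, so both
    # halves hold strictly fewer items and the recursion terminates.
    pivot = a[len(a) // 2]
    l_need, l_give = reconcile(local, remote, lo, pivot, threshold)
    r_need, r_give = reconcile(local, remote, pivot, hi, threshold)
    return l_need + r_need, l_give + r_give
```

The output is exactly the two difference sets, so the requester can go straight to fetching the hashes in the first list. In a networked version each recursion level would correspond to one exchange of fingerprints between the peers.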

Also, Archive != Store, Archive is not a protocol (yet). Maybe we should not change Store and create the archive protocol instead OR make it part of the Sync protocol?

ABresting commented 6 months ago

The Sync process result IS the list of hashes. There is no extra round-trip: the list contains only the missing hashes, so no filtering is needed. Maybe rephrase?

Then this belongs among the features provided by the Store protocol.

Also, Archive != Store, Archive is not a protocol (yet). Maybe we should not change Store and create the archive protocol instead OR make it part of the Sync protocol?

I understand what you mean; it is sometimes tricky, since the Archive, as @jm-clius rightly put it, is an interface to physical storage. For example, here the Store is used as an interface to the Archive, which acts like a sub-layer inside the Store. IMO, in its present state the Archive should not be called a protocol.