ABresting opened this issue 7 months ago
@jm-clius @waku-org/waku
Also, I would like to say that we should aim for a solution geared towards specific apps.
I believe that apps using TWN will naturally form sync groups among themselves, meaning an app would have a couple of TWN nodes but only sync the messages it cares about.
Supporting that should be our first priority IMO.
Only then should we think about a general store provider that stores all messages, since that is the more general use case.
> Also, I would like to say that we should aim for a solution geared towards specific apps. I believe that apps using TWN will naturally form sync groups among themselves, meaning an app would have a couple of TWN nodes but only sync the messages it cares about.
Oh yes, 100%! That's also what I have gathered from the way Status functions, the XMTP implementation, the Tribes requirements, and a nice brainstorming session with @chaitanyaprem!
> Supporting that should be our first priority IMO. Only then should we think about a general store provider that stores all messages, since that is the more general use case.
I am wondering if we should let clients provide a configuration parameter that builds a Prolly tree (or some other sync mechanism) per content topic, since most client nodes will only be interested in the content topics that serve their apps.
> I am wondering if we should let clients provide a configuration parameter that builds a Prolly tree (or some other sync mechanism) per content topic, since most client nodes will only be interested in the content topics that serve their apps.
If the sync mechanism is Prolly tree based, a sync request becomes a set diff. The diff of the two local trees becomes the list of message hashes to send to the other node; it's beautifully symmetric!
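A minimal sketch of that symmetry, with the trees stood in for by plain Python sets of message hashes (the function name and types are illustrative, not part of any Waku API):

```python
# Sketch: once both nodes can compare their tree contents, deciding what to
# exchange is a symmetric set difference over message hashes.

def sync_diff(local_hashes: set, remote_hashes: set) -> tuple:
    """Return (hashes to send to the peer, hashes to request from the peer)."""
    to_send = local_hashes - remote_hashes      # the peer is missing these
    to_request = remote_hashes - local_hashes   # we are missing these
    return to_send, to_request

# Symmetry: node B computes the mirror image of node A's result.
a = {"h1", "h2", "h3"}
b = {"h2", "h4"}
send_a, req_a = sync_diff(a, b)
send_b, req_b = sync_diff(b, a)
assert send_a == req_b and req_a == send_b
```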
Thanks for opening up this issue, @ABresting!
A couple of comments:
> Sync request can be triggered
At some point we may want to periodically sync while the node is online too, ensuring less fragmented histories due to unnoticed down periods or other short lapses in good connectivity.
> Sync request is passive
This seems fine for now as a simple evolution of Store requests and responses. If we build a sync mechanism that syncs periodically, though, we may want to take inspiration from GossipSub's IHAVE and IWANT mechanisms, where nodes periodically advertise which messages they HAVE and others request what they WANT (fewer round trips).
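A toy round of that exchange might look as follows; the `Node` class and its method names are hypothetical, purely to illustrate the HAVE/WANT flow, not the actual GossipSub or Waku API:

```python
# Hypothetical sketch of a GossipSub-style IHAVE/IWANT round between two
# store nodes. One advertisement plus one request replaces repeated
# query/response round trips.

class Node:
    def __init__(self, store: dict):
        self.store = store  # messageHash -> full message bytes

    def ihave(self) -> set:
        # Periodically advertise the hashes we hold.
        return set(self.store)

    def iwant(self, advertised: set) -> set:
        # Request only the hashes we are missing.
        return advertised - set(self.store)

    def serve(self, wanted: set) -> dict:
        # Return the full contents for the requested hashes.
        return {h: self.store[h] for h in wanted if h in self.store}

a = Node({"h1": b"m1", "h2": b"m2"})
b = Node({"h2": b"m2", "h3": b"m3"})

# One round trip: A advertises, B requests what it lacks, A serves, B ingests.
wanted = b.iwant(a.ihave())
b.store.update(a.serve(wanted))
assert set(b.store) == {"h1", "h2", "h3"}
```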
> outdated client...when receives a Sync request
In the simplest version of this protocol, I envision it could simply be a better Store protocol, with HistoryQuery asking either for a list of message hashes or for the full contents belonging to those hashes. In this case, if the other node doesn't support this version of the Store protocol, libp2p would fail to establish a protocol stream (dial failure). This happens before the service side can respond with an error code within the protocol.
One thing that is important for the baseline understanding is to consider the layered architecture here and where the synchronisation mechanism lives:
Option 1: naive synchronisation within the Store protocol. The Store protocol itself can evolve to exchange information about keys (message hashes) and full message contents. However, the store node would still need to be able to determine which hashes it is missing and request the full contents for these from other store nodes. In the simplest, but most inefficient, version of such an architecture, the Store node would have to query its own archive backend (the key-value store, which is likely a DB such as Postgres) for a full list of keys and compare this with the full lists of keys it receives from other nodes (which are doing the same inefficient DB queries).
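The inefficiency is easy to see in a small sketch, using an in-memory SQLite table as a stand-in for the real archive backend (the table and column names here are assumed): every sync round forces a full scan of the key space on both sides.

```python
# Sketch of the naive approach: each sync, both nodes dump the full key set
# from their archive backend and compare. Cost grows linearly with history
# size on every single sync round.
import sqlite3  # stand-in for the real archive backend (e.g. Postgres)

def all_hashes(db: sqlite3.Connection) -> set:
    # O(n) full scan of the key space, repeated on every sync.
    return {row[0] for row in db.execute("SELECT messageHash FROM messages")}

def missing_from(db: sqlite3.Connection, remote_hashes: set) -> set:
    # Hashes the peer holds that our archive lacks.
    return remote_hashes - all_hashes(db)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (messageHash TEXT PRIMARY KEY)")
db.executemany("INSERT INTO messages VALUES (?)", [("h1",), ("h2",)])
assert missing_from(db, {"h1", "h2", "h3"}) == {"h3"}
```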
Option 2: the Store protocol with an efficient middle layer. If we introduce an efficient "middle layer" between the DB/archive backend and the Store protocol, we could vastly improve the efficiency of computing a "diff" between the indexes/message hashes known to both nodes. The Store protocol would still be responsible for communicating which message hashes it knows about, comparing them to those known by other nodes and finding what's missing, but now with an efficient way to compare its own history with those of other nodes. One such method is building efficient search trees, such as the Prolly trees described here: https://docs.canvas.xyz/blog/2023-05-04-merklizing-the-key-value-store.html The archive would remain the persistence layer underlying all of this - any DB/storage/persistence technology that is compatible with key-value storage.
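A toy illustration of the core trick, assuming content-defined chunk boundaries as in Prolly trees (this is a single-level sketch, not the real multi-level tree): because a boundary depends only on the key itself, the same keys always chunk the same way on both nodes, and identical chunks hash identically, so whole key ranges are skipped during the diff.

```python
# Single-level sketch of Prolly-tree-style chunking and diffing.
import hashlib

def is_boundary(key: str, p: int = 4) -> bool:
    # Content-defined boundary: on average one key in p ends a chunk,
    # independent of its neighbours, so chunking is order-insensitive.
    return hashlib.sha256(key.encode()).digest()[0] < 256 // p

def chunk(keys: list) -> list:
    chunks, cur = [], []
    for k in sorted(keys):
        cur.append(k)
        if is_boundary(k):
            chunks.append(tuple(cur))
            cur = []
    if cur:
        chunks.append(tuple(cur))
    return chunks

def chunk_hash(c: tuple) -> bytes:
    return hashlib.sha256("".join(c).encode()).digest()

def diff_keys(a_keys: list, b_keys: list) -> set:
    """Keys in a but not in b; identical chunks are skipped by hash."""
    b_hashes = {chunk_hash(c) for c in chunk(b_keys)}
    b_all = set(b_keys)
    missing = set()
    for c in chunk(a_keys):
        if chunk_hash(c) in b_hashes:
            continue  # whole range identical: no per-key comparison needed
        missing |= set(c) - b_all
    return missing
```

In a full Prolly tree the same boundary rule is applied recursively to build upper levels, so two large, mostly identical histories diff in time proportional to the size of the difference rather than the size of the history.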
Option 3: a separate synchronisation protocol between Store nodes. With this option, we would not change the Store protocol - it would remain a way for clients to query the history stored in Store service nodes according to a set of filter criteria. However, the Store nodes themselves would build on a synchronisation mechanism with its own protocol for synchronising between nodes (e.g. GossipLog, based on Prolly trees). The archive would remain the persistence layer where the synchronised messages are inserted and from which they are retrieved when queried.
Option 4: a synchronised persistence layer. In this option the Store protocol would not have to be modified, and we would not need to introduce any "middleware" to effect synchronisation, messageHash exchange, etc. Instead, the Store protocol would assume that it builds on top of a persistence layer that handles synchronisation between instances. For example, all Store nodes could persist and query messages from a Codex-backed distributed storage for global history with reliability and redundancy guarantees. A simpler analogy would be if all Store nodes somehow had access to the same PostgreSQL instance and simply wrote to and queried from there.
> If the sync mechanism is Prolly tree based, a sync request becomes a set diff. The diff of the two local trees becomes the list of message hashes to send to the other node; it's beautifully symmetric!
I like this!
Weekly Update
achieved: clarity on the Store sync protocol; nearly finalized the research document (creating visual images/diagrams) explaining the architecture and the issues with potential approaches.
next: prepare the workshop for Store sync and publish the research document on Prolly trees with the Waku use case.
Weekly Update
achieved: baseline clarity on how the Sync protocol will work and what it will do; a supplementary document on how the Waku node's parts interact.
next: a workshop with the team to reach agreement on what the Sync store will look like.
Weekly Update
achieved: PoC of the Prolly tree (fixed a bug); insertion and deletion of data into it.
next: a technical write-up about the Prolly tree PoC in the issue, further testing, and generating operational data such as memory consumption using RLN specs.
Weekly Update
achieved: Prolly tree PoC feature complete; Postgres retention policy PR; groundwork started on the diff protocol.
next: the pending technical write-up about the Prolly tree PoC in the issue, the diff protocol, and generating operational data such as memory consumption using RLN specs.
Weekly Update
achieved: one day of work this week due to time off; Nim implementation of Prolly trees.
next: diff protocol discussion, discussion of the Sync mechanism's on-wire query protocol, and generating operational data such as memory consumption using RLN specs.
Sync store is a vital feature of the Waku protocol, whereby a node synchronises with peer nodes in the hope of retrieving messages it missed while offline or otherwise inactive. Every message in the Waku protocol can be uniquely identified by its messageHash, which is a DB attribute. Using the messageHash, it becomes easy for a node to determine whether its store already holds a given message. The following are the potential features of the Waku store sync. There are also some open questions, such as:
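As a rough sketch of how such a deterministic messageHash can be derived (the exact field set and byte encodings below are assumptions for illustration; the normative definition lives in the Waku message spec):

```python
# Sketch: hash the identifying attributes of a message so that every node
# derives the same key for the same message. Field choice and the 8-byte
# big-endian timestamp encoding are assumptions, not the normative spec.
import hashlib

def message_hash(pubsub_topic: str, payload: bytes,
                 content_topic: str, timestamp_ns: int) -> str:
    h = hashlib.sha256()
    h.update(pubsub_topic.encode())
    h.update(payload)
    h.update(content_topic.encode())
    h.update(timestamp_ns.to_bytes(8, "big"))  # assumed encoding
    return h.hexdigest()
```

Because the hash is a pure function of the message's attributes, two store nodes that both hold a message agree on its key without any coordination, which is what makes hash-based set reconciliation possible.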
Eventually, once the operating details of the Prolly tree-based synchronisation mechanism are established, integrating the synchronisation layer into the Waku protocol will require careful consideration: a deep understanding of its operational nuances and a thoughtful approach to its implementation. #73
Topics such as incentives to serve sync requests are kept out of this document's scope.