Open staltz opened 1 year ago
Then I repeat the question, because I don't understand how your current implementation copes with the general case.
On the various global parameters for the anchor algorithm: in various places above, the following points at which an anchor is generated are mentioned:
I presume these are joined by an "or" operator. Based on later discussion it seems like anchors are an accepted concept for SSB2. Questions related to these parameters:
Personally, I don't think the number of messages is useful as a threshold.
For the period, I think something like 3 months is largely enough. The main driving threshold should be the amount of data to replicate.
The algorithm described in https://github.com/ssbc/ssb2-discussion-forum/issues/16#issuecomment-1491519470 could be extended so that, in addition to a « wish » date in the past, peers share a wished maximum amount of data per identity. The connected peers can then determine the starting anchors to take using both pieces of information.
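As a rough illustration of that idea, here is a sketch of how a peer might pick a starting anchor from both constraints. The anchor representation (timestamp plus bytes-to-tip) and the selection rule are assumptions made here for illustration, not part of any agreed SSB2 spec:

```python
# Hypothetical sketch: choose a starting anchor given a wished date in the
# past and a wished maximum amount of data per identity. The data model
# (timestamp, cumulative bytes from anchor to feed tip) is an assumption.

def pick_start_anchor(anchors, wish_date, wish_max_bytes):
    """anchors: list of (timestamp, bytes_to_tip) tuples, oldest first,
    where bytes_to_tip is the amount of data from that anchor to the tip.
    Return the oldest anchor satisfying BOTH wishes, or fall back to the
    newest anchor (least data) if none does."""
    for timestamp, bytes_to_tip in anchors:
        if timestamp >= wish_date and bytes_to_tip <= wish_max_bytes:
            return (timestamp, bytes_to_tip)
    return anchors[-1] if anchors else None
```

For example, with anchors `[(10, 500), (20, 300), (30, 100)]`, a wish date of 15 and a budget of 350 bytes would select the anchor at timestamp 20.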
I think those params can be defined per context type in a spec registry. We most often think in terms of the microblogging feeds, but if we use several CTX as proposed in that thread, we can be smarter. For instance, for a `social-contact` context, if we compact a snapshot of changes since the previous anchor together with the anchors, the best approach is to always take the last known anchor as the reference during replication, and it will probably be the period threshold that guides the emission of anchors.
For case 1: in my head, and if I understood well, this is what you explained. When a user posts a message in their app, the app first checks whether the period since the last anchor has expired. If expired, it emits an anchor before the new message. Otherwise, it checks the amount of data accumulated since the last anchor, counting this new message: if over the threshold, it emits an anchor and then the message; else it just emits the message.
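The decision procedure above can be sketched in a few lines. The constant names and the concrete threshold values (a ~3-month period, ~1 MB of data) are taken from the numbers floated later in this thread, but they are illustrative assumptions, not spec values:

```python
# Hypothetical sketch of the anchor-emission check described above.
# ANCHOR_PERIOD_SECONDS and ANCHOR_MAX_BYTES are assumed values for
# illustration; a spec registry could define them per context type.

ANCHOR_PERIOD_SECONDS = 90 * 24 * 3600  # ~3 months
ANCHOR_MAX_BYTES = 1_000_000            # ~1 MB

def should_emit_anchor(now, last_anchor_time, bytes_since_anchor, new_msg_bytes):
    """Return True if an anchor must be emitted before appending the message."""
    if now - last_anchor_time >= ANCHOR_PERIOD_SECONDS:
        return True  # period since the last anchor has expired
    if bytes_since_anchor + new_msg_bytes > ANCHOR_MAX_BYTES:
        return True  # adding this message would exceed the data threshold
    return False     # just emit the message, no anchor needed
```

The app would call this on every post, emitting an anchor first whenever it returns `True` and resetting the accumulated-bytes counter afterwards.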
For point 2, I partially gave my opinion with the « registry of context types » as part of the spec containing parameters. From my personal simulations, on real and on generated feeds, I think the thresholds can be much larger than weeks and a few KB. Something like 3 months and 1 MB is a small enough granularity for the `social-post` context, given the total-footprint requirements given by @staltz.
I want to have a dedicated thread for this topic, since it's complex.
Some comments from #7 by @gpicron:
I'm concerned about requiring the first message in a feed, because that means we cannot do sliced replication (see the glossary below). Sliced replication is a MUST in SSB2, so we can forget the past.
But even if we don't require the first message, replicating the DAG might still be difficult with the algorithms we have dreamed up so far.
Suppose Alice has (just for conversation's sake, let's use sequence numbers) messages 50–80 from Carol's feed, and Bob has messages 60–90 from Carol's feed. So when Alice syncs with Bob, she wants messages 81–90 from Bob, but if Bob uses a merkle tree or a bloom clock stamp, he'll compute it over 60–90 while Alice computes it over 50–80. They have different starting points.

Worse, consider the case where Alice has 50–80 and Bob has 60–80. They are both up to date with each other, and nothing needs to be sent over the wire. But their bloom clock stamps and merkle trees are going to differ just because the starting points are different.
With linear feeds (and sequence numbers), it's easy to just send these starting points and end points, but with a DAG we don't have sequence numbers. So what representation are we going to use to compress "my DAG" and "their DAG" and compare them?
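To make the contrast concrete, here is the easy linear-feed case: with sequence numbers, the missing range falls out of simple arithmetic on the two slices. The function name and slice representation are illustrative, not from any SSB2 spec:

```python
# Sketch of the linear-feed case: given my slice and a peer's slice of the
# same feed (both as (lowerBound, upperBound) with sequence numbers),
# compute the range I still need from the peer. Illustrative only; the DAG
# case has no such total order, which is the problem raised above.

def missing_range(mine, theirs):
    """Return the (lo, hi) range the peer has and I lack, or None."""
    my_lo, my_hi = mine
    their_lo, their_hi = theirs
    if their_hi <= my_hi:
        return None  # peer has nothing newer than what I already hold
    start = max(my_hi + 1, their_lo)
    return (start, their_hi)

# Alice (50-80) syncing with Bob (60-90) needs 81-90;
# Alice (50-80) syncing with Bob (60-80) needs nothing.
```

With a DAG there is no single `upperBound` to exchange, which is exactly why a different compressed representation of "my DAG" versus "their DAG" is needed.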
Glossary

`[lowerBound, upperBound]`: a slice/range of a certain feed