Sliced replication issue

staltz commented 1 year ago

I hit a problem with sliced replication and I'm not sure how to solve this. Seems difficult.

gantt
    title Carol's feed
    dateFormat  YYYY-MM-DD
    axisFormat %W
    section Alice
    Has :2022-01-01, 10w
    section Bob
    Has:b1, 2022-05-01, 5w

Suppose Alice wants to always replicate the latest 10 msgs of Carol's feed, and Bob wants to replicate the latest 5 msgs of Carol's feed. (They have different parameters, or Bob's database is more full, forcing him to store less msgs per feed, whatever)

In this case, Bob has more recent data than Alice has, so Alice is forced to delete the older 10 msgs, and download all the 5 msgs from Bob. This is to prevent a situation of a hole in the feed, which breaks sigchain validation. In the worst case, Bob could be maliciously doing this, to force Alice to delete content.

Now Alice has less recent msgs than she would like to have. She is forced to have as much data as Bob has. At scale, this means that every peer conducting sliced replication will end up having as much data as the peer who has the smallest amount of data.

The options I can see so far are:

Alice deletes her previous range if the new range would cause a hole
- Downside: Alice's new range is smaller than what she wants it to be
Alice preserves her previous range and downloads the new range
- Downside: a sigchain hole remains, needs to be eventually filled or validated (?). This may end up nullifying the benefits of sigchains entirely
Alice rejects the new range if it would cause a hole with the previous range
- Downside: Alice might get stuck and never get new updates

Help, someone?

arj03 commented 1 year ago

This is a very real case that should be supported I think. In my opinion alice should get the 5 messages from Bob and when she connects to another peer should try to get the previous 5 she is missing. Instead of deleting all 10 of the old messages she could just delete the oldest 5. You could end up in a situation where it was 1 instead of 5 so that would look like a "I nuke you database" attack.

staltz commented 1 year ago

Oh, nice that you read this already!

I was reading things elsewhere, and well, lipmaalinks would help. :upside_down_face:

All peers performing sliced replication could delete old msgs except those required for lipmaa link validation (called certificate pool), which are logarithmically few of them. To put this into perspective, if you have a feed that is very large (for instance a very active user posting 2000 msgs per year, for 10 years) that's sequence number 20000. You can delete almost everything, as long as you keep approximately 10 old messages around. That's very little. To be precise, about 10kB. Small price to pay.

So something like this BEFORE replication:

gantt
    title Carol's feed
    dateFormat YYYY-MM-DD
    axisFormat %W

    section Alice
    Has :aF, 2000-1, 10w

    section Bob
    Has:b1, 2000-01-01, 1w
    Has:b4, 2000-01-22, 1w
    Has:b13, 2000-03-26, 1w
    Has:bF, 2000-04-23, 5w

And this AFTER Alice replicates:

gantt
    title Carol's feed
    dateFormat YYYY-MM-DD
    axisFormat %W

    section Alice
    Has :aF, 2000-01-01, 10w
    Has :a13, 2000-03-26, 1w
    Has :aB, 2000-04-23, 5w

staltz commented 1 year ago

This is a very real case that should be supported I think. In my opinion alice should get the 5 messages from Bob and when she connects to another peer should try to get the previous 5 she is missing. Instead of deleting all 10 of the old messages she could just delete the oldest 5.

This was my best option in mind as well, basically the "Alice preserves both old and new ranges" case.

This led me to think that maybe we can get rid of previous field and just rely on the sequence number, since you're signing it anyway. But reading the Official SSB Paper and reading p2panda and Bamboo convinced me that backlinks (previous or lipmaa or both) are necessary for proving causal ordering. Otherwise, you could just author msgs with sequence number 3, 4, 5 on Monday, and then on Friday publish messages with sequence numbers 1, 2. The number in itself doesn't prove that one thing came after the other (even if you sign it), only backlinks do.

gpicron commented 1 year ago

I know I am repeating but with slice replication you will continue to discover such corner cases that make things each more complex and risky.

Keep it simple: replicate always all from last "anchor(s)" and for apps to emit anchors periodically and/or at some bytes of message threshold. Anchor are playing a role of lipma links and keeping in the db all the anchors is not a lot data, and you drop safely messages of old anchors. The model of replication is much simpler too.

staltz commented 1 year ago

Keep it simple: replicate always all from last "anchor(s)" and for apps to emit anchors periodically and/or at some bytes of message threshold. Anchor are playing a role of lipma links and keeping in the db all the anchors is not a lot data, and you drop safely messages of old anchors. The model of replication is much simpler too.

It is a compelling option, thanks for the reminder. Do you see any downsides? Typically there's always some kind of downside, I want to be aware of the downsides of every option.

I know I am repeating

Keep repeating! :) Sometimes I have difficulty understanding something until it is said in a different way.

staltz commented 1 year ago

If we would use periodic anchors, I would probably use sequence numbers because those are easier to deduce based on "have-range" / "want-range" which are also stating sequence numbers. We can approximate the bytes based on that. Most (typical) SSB messages are between 500 bytes and 2 kilobytes. So ~1kb per msg is a good rule of thumb.

staltz commented 1 year ago

Anchor are playing a role of lipma links and keeping in the db all the anchors is not a lot data, and you drop safely messages of old anchors. The model of replication is much simpler too.

Actually, if we keep all older anchors, that's not a small deal. For the case of 20k messages in a single feed, if we have an anchor every (e.g.) 100 messages, then that becomes a 200kB overhead. Compared to e.g. Bamboo's 10kB overhead for the same 20k messages.

ahdinosaur commented 1 year ago

I was reading things elsewhere, and well, lipmaalinks would help. :upside_down_face:

haha yessssss! :relaxed:

Actually, if we keep all older anchors, that's not a small deal

as mentioned in https://github.com/ssbc/ssb2-discussion-forum/issues/13#issuecomment-1457829796, if anchors are a good idea, then one possible approach is to combine both strategies: anchors and lipmaa link validation, so you always replicate from an anchor and only anchors have lipmaa links. as before, using lipmaa links means you only need to store logarithmically few messages to ensure log validation, and limiting this to only anchor messages reduces the price of using lipmaa links.

as for storage overheads, for the case of 20k messages in a single feed, if we have an anchor every (e.g.) 100 messages, if every message is 1kB, then there are 200 anchors and we only need to keep 6 anchors as lipmaa links, so 6kB overhead. not a big difference vs lipmaa-links-on-every-message. but might be less complexity overhead to compartmentalize the lipmaa links to only be involved in anchors: where replication is something like "fetch the latest anchors for the feed" (aka the lipmaa link certificate pool) and then "fetch 10 messages starting from this anchor". hmm, i'm not sure how anchors would work for your example though of wanting the latest 10 but the latest anchor is say 80 messages away.

gpicron commented 1 year ago

If we would use periodic anchors, I would probably use sequence numbers because those are easier to deduce based on "have-range" / "want-range" which are also stating sequence numbers. We can approximate the bytes based on that. Most (typical) SSB messages are between 500 bytes and 2 kilobytes. So ~1kb per msg is a good rule of thumb.

See https://github.com/ssbc/ssb2-discussion-forum/issues/16#issuecomment-1478967404

gpicron commented 1 year ago

Anchor are playing a role of lipma links and keeping in the db all the anchors is not a lot data, and you drop safely messages of old anchors. The model of replication is much simpler too.

Actually, if we keep all older anchors, that's not a small deal. For the case of 20k messages in a single feed, if we have an anchor every (e.g.) 100 messages, then that becomes a 200kB overhead. Compared to e.g. Bamboo's 10kB overhead for the same 20k messages.

Euh 20k / 100 = 200 anchors

1 anchor is:

1 link to previous anchor (32 bytes)
1 identity id (32 bytes) (but you store it only once)
1 previous link per tip message (32 bytes) (generally only one, there is more only if there was a fork in the same device which is not supposed to happen)
1 signature (32 bytes)
1 timestamp (8 bytes)
1 author pu key (32 bytes but you don't have to store it for all anchors)

So this only 104 bytes of useful data per anchor. Let's add a metadata and a bit of overhead per entry and count 120 bytes per entry.

This is more 25 kB that 100 kb for such a feed and such a feed is not the majority.

Moreover, the suggestion was not to emit messages every n messages but every n bytes of messages.

Let say an average user follow 200 users want to limit its storage to 500 MB , that means an average of 2500 KB per user.

25 or even 40 kb per user to keep the full anchors DAG is not meaningful.

staltz commented 1 year ago

120 bytes per entry

@gpicron I was using SSB classic feed format to make these estimates, where the "smallest" message size is about 500 bytes.

What encoding are you assuming in your calculation?

where replication is something like "fetch the latest anchors for the feed" (aka the limpaa link certificate pool) and then "fetch 10 messages starting from this anchor". hmm, i'm not sure how anchors would work for your example though of wanting the latest 10 but the latest anchor is say 80 messages away.

@ahdinosaur Yeah I appreciate your (and other folks) insistence on lipmaa links. I'm a stubborn person and sometimes don't see something unless people insist. :) I'm still weighing the idea of lipmaa links, because they do seem somewhat complex to implement (like that huge formula gives me headaches, it's not a straightforward calculation), and the simplicity of Geoffrey's suggestion is appealing.

To comment on what you said about replication, I would assume it would work like this in DAG sync:

You declare your "have-range" and this means only the latest "actual content" msgs you have, it does not try to declare the cert pool
They declare their "have-range" in the same way
Based on their have-range, you discover how many latest msgs you want from them, and declare that as your "want-range"
They send you all the msgs in your want-range, plus msgs in the cert pool that they know you don't have

Let's take that original example. Alice and Bob declare to each other their have-range.

gantt
    title Carol's feed
    dateFormat YYYY-MM-DD
    axisFormat %W

    section Alice
    Have-range :aHR, 2000-1, 10w

    section Bob
    certpool:b1, 2000-01-01, 1w
    certpool:b4, 2000-01-22, 1w
    certpool:b13, 2000-03-26, 1w
    Have-range:bHR, 2000-04-23, 5w

Alice declares her want-range:

gantt
    title Carol's feed
    dateFormat YYYY-MM-DD
    axisFormat %W

    section Alice
    Have-range :aHR, 2000-1, 10w
    Want-range:crit, done, bWR, 2000-04-23, 5w

    section Bob
    certpool:b1, 2000-01-01, 1w
    certpool:b4, 2000-01-22, 1w
    certpool:b13, 2000-03-26, 1w
    Have-range:bF, 2000-04-23, 5w

And Bob sends these in red:

gantt
    title Carol's feed
    dateFormat YYYY-MM-DD
    axisFormat %W

    section Alice
    Have-range :aHR, 2000-1, 10w
    Want-range:crit, done, bWR, 2000-04-23, 5w

    section Bob
    certpool:b1, 2000-01-01, 1w
    certpool:b4, 2000-01-22, 1w
    certpool:crit, b13, 2000-03-26, 1w
    Have-range:crit, bF, 2000-04-23, 5w

Because he knows what Alice wants, and he knows which certpool msgs she is missing, because he knows her have-range. The have-range implies you SHOULD have all the certpool messages with sequence number smaller than the have-range.

ahdinosaur commented 1 year ago

To comment on what you said about replication, I would assume it would work like this in DAG sync:

that all sounds right to me! :+1:

I'm still weighing the idea of lipmaa links, because they do seem somewhat complex to implement (like that huge formula gives me headaches, it's not a straightforward calculation)

yeah to be honest i have no idea how that formula works, i'd just copy pietgeursen/lipmaa-link (made for Piet's bamboo-rs) and pray for the best.

as far as i understand, in adding lipmaa links / certificate pools, the added complexity takes on different shapes depending on where you are in the DAG protocol:

when receiving new messages, you want to make sure you also receive the relevant cert pool messages.
- as shown above, with a DAG sync, this is really up to the sender, since the receiver doesn't know what cert pool messages to ask for until after receiving the new messages.
when sending messages, you want to make sure you also send the relevant cert pool messages.
- for this, in a DAG sync, each message you're sending will also include a reference to their cert pool message, so you just follow the cert pool references.
when verifying a message, you want to make sure you've already verified the cert pool message
- again, in a DAG, every message already includes a reference to their cert pool message, so follow the references as needed to verify.
when publishing a message, you need to make sure you include the correct cert pool message reference in your new message
- this is where the complicated formula comes into play, but only here.
- that formula is (sequence number) => lipmaa link sequence number, so for a DAG we might need something different.
- we need to answer the question, "i have a new message, which message should this refer to as the cert pool message?"
- because it's only needed for your own pre-published messages, we could afford to use more expensive tools like a "database index" (in-memory store) for this, if needed.
- looking at the lipmaa link photo, it seems to be "create a new lipmaa checkpoint, pointing to the previous lipmaa checkpoint (if any). for the next 3 messages, point to the previous message. repeat." (edit: wait there's a recursive 3 i'm missing in this.)
- again, since we only need to derive our own lipmaa link references, we could afford to keep track of all this for ourselves.
- however i'm not sure how this works with forks in your DAG.
ref: #13

gpicron commented 1 year ago

@gpicron I was using SSB classic feed format to make these estimates, where the "smallest" message size is about 500 bytes.

What encoding are you assuming in your calculation?

I don't assume any feed format. I assumed the data is stored in db not as message but as structured info + some overhead to rebuild the original message in the original feed format.

staltz commented 1 year ago

To put this into perspective, if you have a feed that is very large (for instance a very active user posting 2000 msgs per year, for 10 years) that's sequence number 20000. You can delete almost everything, as long as you keep approximately 10 old messages around. That's very little. To be precise, about 10kB. Small price to pay.

To correct this calculation I had made, I'm using the formula

$$2 \cdot \lceil \log_2(n) \rceil + 2 \cdot \lceil \log_2(m) \rceil$$

from https://aljoscha-meyer.de/linkingschemes Theorem 18. With $m = 1$ (the first msg) and $n = 20000$,

$$2 \cdot \lceil \log_2(20000) \rceil + 2 \cdot \lceil \log_2(1) \rceil$$

$$= 2 \cdot 15 + 2 \cdot 0$$

$$= 30$$

So that's approximately 20kB–30kB overhead to have lipmaa links. I'm using classic SSB feeds for this calculation.

Assuming many realistic feeds may have a few thousands of messages, like 1000–4000, then that's a certpool overhead of 20–24 messages, or 15kB–24kB.

Assuming we can fit 500 accounts within 50MB (this comes from another calculation I made), then that means about 12MB of total certpool overhead.

staltz commented 1 year ago

I don't assume any feed format. I assumed the data is stored in db not as message but as structured info + some overhead to rebuild the original message in the original feed format.

@gpicron Disk storage is not the only constraint, RAM is also one constraint. If you can fit more messages in disk, then that means you can fit more messages in RAM too, but that's not good because it may slow down queries in ssb-memdb by a scale proportional to your disk encoding efficiency.

staltz commented 1 year ago

I'm contemplating using a Bamboo-inspired lipmaa-link style feed format, and realized that the cert pool is going to stay around ~forever, and it's desirable for the message payload (actual content) to be deletable. Bamboo supports this by design, so does Buttwoo.

This is good news, because deleting the content in a message will:

Reduce disk usage
Reduce RAM usage once the message is loaded
Allow for user's to delete content without deleting metadata (and breaking verification)

And this affects some of the back-of-envelope calculations I made for the certpool overhead. Considering that content accounts for the biggest portion of a message, then

Assuming many realistic feeds may have a few thousands of messages, like 1000–4000, then that's a certpool overhead of 20–24 messages, or 15kB–24kB.

Assuming we can fit 500 accounts within 50MB (this comes from another calculation I made), then that means about 12MB of total certpool overhead.

... becomes 7kB–12kb overhead per account, and a total certpool overhead of 6MB. This is a lot better.

gpicron commented 1 year ago

Some simulations based on dump of the feeds of arj, cel and yourself which I expect are part of the oldest account and representative of large feeds.

node tests-js/ssb2-anchor.js
----------------------------------------
feed dump: tests-js/test-arj.json
feed Messages length: 9200
size for Array of JS objects : 10171730 bytes/ 9933 KB/ 10 MB
 average message size as JS Object: 1105.6228260869566  bytes
size for Array of BIPF : 6425120 bytes/ 6275 KB/ 6 MB
 average message size as BIPF: 698.3826086956522  bytes
ratio BIPF/JS: 63 %
size for BIPF Array : 6351524 bytes/ 6203 KB/ 6 MB
 average message size as BIPF Array: 690.3830434782609  bytes
ratio BIPF Array/JS: 62 %
----------------------------------------
feed dump: tests-js/test-cel.json
feed Messages length: 1266
size for Array of JS objects : 1148226 bytes/ 1121 KB/ 1 MB
 average message size as JS Object: 906.9715639810427  bytes
size for Array of BIPF : 760168 bytes/ 742 KB/ 1 MB
 average message size as BIPF: 600.4486571879937  bytes
ratio BIPF/JS: 66 %
size for BIPF Array : 750044 bytes/ 732 KB/ 1 MB
 average message size as BIPF Array: 592.4518167456556  bytes
ratio BIPF Array/JS: 65 %
----------------------------------------
feed dump: tests-js/test-andre.json
feed Messages length: 5560
size for Array of JS objects : 9908370 bytes/ 9676 KB/ 9 MB
 average message size as JS Object: 1782.0809352517986  bytes
size for Array of BIPF : 5754826 bytes/ 5620 KB/ 5 MB
 average message size as BIPF: 1035.0406474820145  bytes
ratio BIPF/JS: 58 %
size for BIPF Array : 5710350 bytes/ 5577 KB/ 5 MB
 average message size as BIPF Array: 1027.0413669064749  bytes
ratio BIPF Array/JS: 58 %
----------------------------------------
Inserting anchors with the following rules:
 - 1 anchor when size of messages since last anchor is > 500 KB in BIPF
 - 1 anchor when duration since last anchor is > 3 months
----------------------------------------
feed anchors for  arj
anchors : 28
size for Array of anchors : 4677 bytes/ 5 KB/ 0 MB
----------------------------------------
feed anchors for  cel
anchors : 26
size for Array of anchors : 4347 bytes/ 4 KB/ 0 MB
----------------------------------------
feed anchors for  andre
anchors : 10
size for Array of anchors : 1705 bytes/ 2 KB/ 0 MB
----------------------------------------

Worst case scenario: using the arj feed as a reference * 500 users
----------------------------------------
size of feed in BIPF (default) for 500 followed : 3175762000 bytes/ 3101330 KB/ 3029 MB
size of feed in JS for 500 followed : 5085865000 bytes/ 4966665 KB/ 4850 MB
size of anchors for 500 followed : 2342500 bytes/ 2288 KB/ 2 MB

If only keeping last 12 months of messages based on anchors:
 - number of messages since first anchor older than 12 months: 410
 - size of messages since first anchor: 360380 bytes/ 352 KB/ 0 MB

 projection for 500 followed users:
 - size of messages since first anchor in BIPF: 118293000 bytes/ 115521 KB/ 113 MB
 - size of messages since first anchor in JS: 180190000 bytes/ 175967 KB/ 172 MB

If only keeping last 24 months of messages based on anchors:
 - number of messages since first anchor older than 24 months: 1182
 - size of messages since first anchor: 1242128 bytes/ 1213 KB/ 1 MB

 projection for 500 followed users:
 - size of messages since first anchor in BIPF: 393342500 bytes/ 384124 KB/ 375 MB
 - size of messages since first anchor in JS: 621064000 bytes/ 606508 KB/ 592 MB

gpicron commented 1 year ago

the code of the simulation https://gist.github.com/gpicron/06061c784b9d59ecf532c674848ef272

staltz commented 1 year ago

I'm gonna close this issue because I think I got my answer with lipmaa links.

ssbc / ssb2-discussion-forum

Sliced replication issue #18