ssbc / ssb-replication-scheduler

Plugin to trigger replication of feeds identified as friendly in the social graph

Sympathy replication, an idea #12

Open staltz opened 2 years ago

staltz commented 2 years ago

I had an idea of implementing sympathetic replication like this:

partialReplication: {
  0: [
    { purpose: 'main' },
    { purpose: 'git-ssb' },
    { purpose: 'index' }
  ],

  1: [
    { purpose: 'main' },
    { purpose: 'git-ssb', $certainty: 50 }
  ]
}

Note the $certainty: 50. It means that I will replicate a friend's git-ssb feed with 50% probability. The choice can't be random on every run, though: every sbot session must always replicate the same "lucky" friends' git-ssb feeds, so the selection has to be deterministic.

I thought that one way to achieve determinism is to pluck the first nibble of the friend's subfeed ID, and replicate that subfeed only if the nibble belongs to my set of chosen "lucky nibbles". With 50% certainty, suppose that my deterministic lucky nibbles are: 0, 2, 4, 6, 8, a, c, e. If I had chosen 100% certainty, then the lucky nibbles would be all 16 possible values: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f. The lucky nibbles can be selected "randomly" too.
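A minimal sketch of the lucky-nibble idea, to make the mechanics concrete. The helper names (`pickLuckyNibbles`, `firstNibble`, `shouldReplicate`) are hypothetical and not part of ssb-replication-scheduler, and the sketch assumes the subfeed's public key is available as raw bytes. Since there are only 16 nibbles, the certainty is effectively rounded to steps of 6.25%.

```javascript
// Hypothetical helpers for illustration; not part of ssb-replication-scheduler.
const NIBBLES = '0123456789abcdef'.split('');

// Choose the lucky nibbles once and persist the result, so that every
// sbot session reuses the same set (assumption: persistence is handled elsewhere).
function pickLuckyNibbles(certainty, random = Math.random) {
  const count = Math.round((certainty / 100) * NIBBLES.length);
  const pool = NIBBLES.slice();
  // Fisher-Yates shuffle, then keep the first `count` nibbles.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return new Set(pool.slice(0, count));
}

// Assumption: `keyBytes` is the subfeed's public key as a Uint8Array or Buffer.
function firstNibble(keyBytes) {
  return (keyBytes[0] >> 4).toString(16);
}

function shouldReplicate(keyBytes, luckyNibbles) {
  return luckyNibbles.has(firstNibble(keyBytes));
}

// The 50% example set from the comment above.
const lucky = new Set(['0', '2', '4', '6', '8', 'a', 'c', 'e']);
console.log(shouldReplicate(Uint8Array.of(0x4f, 0x12), lucky)); // true  (first nibble is 4)
console.log(shouldReplicate(Uint8Array.of(0x9b, 0x12), lucky)); // false (first nibble is 9)
```

Only `pickLuckyNibbles` involves randomness, and it would run once; every later decision is a pure lookup, which is what keeps sessions consistent.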

I don't know if this is too complicated, maybe there is a simpler and easier way of deterministically choosing whether a friend's subfeed is "lucky" given my certainty parameter, but my point is that sympathetic replication overall seems like it could be achieved with this simple $certainty value.
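One possibly simpler alternative, offered as a sketch and not as what the plugin does: hash the feed ID together with a locally stored seed, map the hash to a score in [0, 100), and replicate when the score is below `$certainty`. The function name `isLuckyFeed`, the `localSeed` value, and the feed ID string below are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical: `localSeed` is generated once per sbot and persisted, so the
// same feeds stay "lucky" across sessions while differing between peers.
function isLuckyFeed(feedId, certainty, localSeed) {
  const digest = crypto
    .createHmac('sha256', localSeed)
    .update(feedId)
    .digest();
  // Interpret the first 4 bytes as an unsigned integer and scale it to [0, 100).
  const score = (digest.readUInt32BE(0) / 0x100000000) * 100;
  return score < certainty;
}

const seed = 'example-local-seed'; // placeholder value
const feed = '@example-feed-id';   // placeholder, not a real SSB ID
console.log(isLuckyFeed(feed, 50, seed) === isLuckyFeed(feed, 50, seed)); // true
```

Compared with nibbles, this allows any certainty value and not just multiples of 6.25%, and raising the certainty later only adds feeds: a feed that was lucky at 30 is still lucky at 50.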

Thoughts @arj03 ?

staltz commented 2 years ago

Oh, maybe I should have mentioned that if $certainty is omitted, then it's assumed to be $certainty: 100, i.e. always replicate this feed.
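The default could be applied when reading a rule, roughly like this. `certaintyOf` is a hypothetical helper, and the range check is an added assumption, not something specified in this thread.

```javascript
// Hypothetical helper: read a rule's certainty, defaulting to 100 when omitted.
function certaintyOf(rule) {
  const c = rule.$certainty;
  if (c === undefined) return 100;
  // Assumption: values outside 0..100 are treated as configuration errors.
  if (!Number.isFinite(c) || c < 0 || c > 100) {
    throw new RangeError('$certainty must be a number between 0 and 100');
  }
  return c;
}

console.log(certaintyOf({ purpose: 'main' }));                     // 100
console.log(certaintyOf({ purpose: 'git-ssb', $certainty: 50 }));  // 50
```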

And another important detail: we need determinism because with Math.random() every sbot session would pick different lucky feeds to replicate, which means that over many sessions you'll end up replicating all feeds with ~99% certainty.
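To put a number on that: if each session independently picks a feed with probability p, the chance that the feed has been picked at least once after n sessions is 1 - (1 - p)^n. A small sketch (the function name is made up for illustration):

```javascript
// Probability that a feed was chosen in at least one of `sessions` independent
// sessions, when each session chooses it with probability certainty/100.
function cumulativeReplicationProbability(certainty, sessions) {
  const p = certainty / 100;
  return 1 - Math.pow(1 - p, sessions);
}

console.log(cumulativeReplicationProbability(50, 1)); // 0.5
console.log(cumulativeReplicationProbability(50, 7)); // 0.9921875
```

With $certainty: 50, seven sessions are enough to pass 99%, so a non-deterministic choice quickly degenerates into replicating everything.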

arj03 commented 2 years ago

This exact problem is a very interesting one in distributed systems :)

You are trying to define some availability metric. There are a lot of factors in this, off the top of my head: peer availability, feed availability, availability over which period, network topology (pubs, rooms, peers). There is also a signalling aspect: telling other peers what you are replicating. There is this descriptive way (although in your proposal it is only local), but sampling could also be used. One often-used approach to testing this is to simulate it.
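As a very reduced illustration of what such a simulation could look like, the sketch below estimates how often at least one of a feed's friends ends up replicating it, assuming each friend independently treats it as lucky with the same certainty. It ignores peer uptime, topology, and signalling entirely. The function names are hypothetical, and `mulberry32` is a small, well-known seeded PRNG used here only to make runs reproducible.

```javascript
// Small seeded PRNG (mulberry32) so that simulation runs are reproducible.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fraction of trials in which at least one of `friends` peers replicates the
// feed, when each peer does so independently with probability certainty/100.
function estimateAvailability(friends, certainty, trials, seed = 1) {
  const rand = mulberry32(seed);
  const p = certainty / 100;
  let covered = 0;
  for (let t = 0; t < trials; t++) {
    let replicated = false;
    for (let f = 0; f < friends; f++) {
      if (rand() < p) replicated = true;
    }
    if (replicated) covered++;
  }
  return covered / trials;
}

// For this simple model the exact answer is 1 - (1 - p)^friends,
// e.g. 0.9375 for 4 friends at 50%; the estimate should land close to it.
console.log(estimateAvailability(4, 50, 20000));
```

A more realistic model would replace the independent coin flips with the factors listed above, such as peers being offline or connected only through a pub or room.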

I think for a start it would be good to have relatively simple rules. So something like: replicate all nibbles for hops 1 and maybe just the ones you need for hops 2. Maybe this should be a setting so that a pub could choose to replicate those in full as well; this could also be achieved with your certainty parameter. Also it's worth noting that we have some good things working in our favor: most people are interested in e.g. main and invitations, so those nibbles would be replicated with quite a high probability. Similarly, for the groups that you are part of, your friends would have those nibbles because they have the group feeds.

staltz commented 2 years ago

> Also it's worth noting that we have some good things working in our favor: most people are interested in e.g. main and invitations, so those nibbles would be replicated with quite a high probability. Similarly, for the groups that you are part of, your friends would have those nibbles because they have the group feeds.

Yes, but those are cases where the remote peer is definitely interested in replicating. I'm trying to account for those feeds that remote peers don't need, but they choose to replicate just to make my life easier, getting the message out.

arj03 commented 2 years ago

Another data point: the nibbles without the actual feeds are less useful. Maybe we could think of this as the problem of "I want to sympathy-replicate a feed (even if I can't decrypt it)" and then model that.