ssbc / ssb2-discussion-forum

not quite tiny, also not quite large
17 stars 1 forks source link

ssb-db2 implementation notes #8

Open staltz opened 1 year ago

staltz commented 1 year ago

When it comes to implementing minibutt with ssb-db2, one of the obstacles is figuring out what are the latest messages. So far, for linear feeds, there has been only one latest message. For DAG feeds, there might be multiple latest messages.

From the perspective of ssb-db2, there is the base index that maps feedId => {offset, sequence} of the latest message. Also the "state" in db2 is a map of feedId => latest nativeMsg.

Base index will have to become feedId => [{offset, sequence}, ...].

State will have to become feedId => [latest nativeMsg, ...].

When appending a new minibutt message, the procedure will have to be something like:

  1. Let new be the new minibutt message, and latest (array) be the existing minibutt message(s) (from state)
  2. Pass new and latest to minibutt validation
  3. If new.previous is contained inside the latest set, then
    • subtracting new.previous from latest
    • AND insert new to latest
  4. Else, just insert new to the latest

Example

graph LR;
  0-->1-->2-. new .->3

latest = [2] new = 3 new.previous = 2

So rule (3) applies and we "subtract" 2 from latest, and insert 3, so the new latest becomes

latest = [3]

graph LR;
  0-->1-->2
  2-->3a[3]
  2-. new .->3b[3']

latest = [3] new = 3' new.previous = 2

So rule (4) applies and we just insert 3', so the new latest becomes

latest = [3, 3']

graph LR;
  0-->1-->2
  2-->3a[3]
  2-->3b[3']
  2-. new .->3c[3'']

latest = [3, 3'] new = 3'' new.previous = 2

So rule (4) applies and we just insert 3'', so the new latest becomes

latest = [3, 3', 3'']

graph LR;
  0-->1-->2
  2-->3a[3]
  2-->3b[3']
  2-->3c[3'']
  3a & 3b-. new .->4

latest = [3, 3', 3''] new = 4 new.previous = [3, 3']

So rule (3) applies and we "subtract" [3, 3'] from latest, and insert 4, so the new latest becomes

latest = [3'', 4]

staltz commented 1 year ago

The pseudocode I shared above may be general enough for linear feed formats too, so we don't have to implement an if/else in db2. After all, a linear feed is by definition a type of DAG feed.

arj03 commented 1 year ago

Base and thus state is used for feed validation and for EBT. We don't need EBT so that leaves feed validation. In your diagrams there can be an exception. Lets say your phone was offline for a day. So your computer has latest: 5, while phone has lets say 2. When the phone produces a new message, it's previous is going to be 2 and when you get that message on your computer, you won't have 2 in previous anymore. So maybe for minibutt think of state as just a cache, that most likely will have previous, but sometimes you might have to fetch the message from storage. Both in the case you illustrated and in the one I wrote.

staltz commented 1 year ago

it's previous is going to be 2 and when you get that message on your computer, you won't have 2 in previous anymore.

You mean that the desktop won't have 2 anymore because it was deleted due to sliced replication?

arj03 commented 1 year ago

Oh i meant in the base index. Just that we need to also sometime get from disk. Sliced is a bigger problem but this is mainly a problem if one machine is far behind.

staltz commented 1 year ago

Hmm, I have a difficulty seeing this, so let me put it on paper

Before

---
title: "Desktop: latest is 5"
---
graph LR;
  0-->1-->2
  2-->3a[3]
  2-->3b[3']
  3a & 3b-->4
  4-->5
---
title: "Mobile: latest is 2"
---
graph LR;
  0-->1-->2

Mobile publishes something

---
title: "Mobile: latest is 3"
---
graph LR;
  0-->1-->2-. new .->3[3'']

Desktop replicates from mobile

graph LR;
  0-->1-->2
  2-->3a[3]
  2-->3b[3']
  3a & 3b-->4
  4-->5
  2-. new .-3c[3'']

Applying that to the pseudocode:

  1. Let new be the new minibutt message, and latest (array) be the existing minibutt message(s) (from state)
  2. Pass new and latest to minibutt validation
  3. If new.previous is contained inside the latest set, then
    • subtracting new.previous from latest
    • AND insert new to latest
  4. Else, just insert new to the latest

Yeah I can see how step (2) is going to be a problem, because new = 3'' but latest = [5] and they are not connected. So indeed we need to fetch it from disk (if it has not been deleted!). So yes you're right about the cache.

But (3) is going to be skipped because new.previous = 2 is not in latest. Step (4) is going to run, and that's fine, it gives us latest = [3'', 5] and that's sounds correct.

About (2) failing due to msg 2 not being in disk anymore, we could use a rule of thumb to just ignore 3'' deeming it too old. What do you think?

staltz commented 1 year ago

About (2) failing due to msg 2 not being in disk anymore, we could use a rule of thumb to just ignore 3'' deeming it too old. What do you think?

Actually, I'm not sure about this. Maybe the new message is merging many "heads", some old heads and some new heads, so maybe the merge is valuable even if one of the heads is very old. And anyway with sliced replication, it is always the case where one or more "oldest" messages have missing previous from disk.

Updated db2 appending pseudocode

Base index to become feedId => [{offset, sequence}, ...].

State to become feedId => [latest nativeMsg, ...].

When appending a new minibutt message:

  1. Let new be the new minibutt message, and latest (array) be the existing minibutt message(s) (from state)
  2. Get new.previous messages from (preferably) latest if it exists, or from disk
  3. Pass new and new.previous to minibutt validation
  4. If new.previous is contained inside the latest set, then
    • subtracting new.previous from latest
    • AND insert new to latest
  5. Else, just insert new to the latest