vacp2p / research

Thinking in code
MIT License

Support for Sequence Number #1

Closed decanus closed 3 years ago

decanus commented 4 years ago

Problem

If I receive a message after having missed a few, I am provided with a list of parents or a previous_message ID that I can use to sync. However, this requires the other party to be online, so getting back in sync can be a slow process.

Solution

We therefore add the field sequence that corresponds to the index a message is placed at within the remote log. Using configuration options like page_size, a client can easily sync all messages at once from the last received sequence to the last synced sequence.

Acceptance criteria

Notes

oskarth commented 4 years ago

If I receive a message after having missed a few, I am provided with a list of parents or a previous_message ID that I can use to sync. However, this requires the other party to be online, so getting back in sync can be a slow process.

Can we clarify whether this is (1) a single previous message, (2) a list of parent messages seen from some other feeds (in some order?), or (3) multiple previous message IDs from that user (starting with the most recent, a la linked list)?

Assuming 1, you'd have (a) the received message (b) one previous message ID. How does that relate to the sequence number? Assuming that would be inside the received message, i.e. something like a number 10, where you know you have synced up to message 5 before, your goal would be to get messages 6..9. Is this correct?

We therefore add the field sequence that corresponds to the index a message is placed at within the remote log.

What do you mean by sequence that corresponds to index of message? If you have the message id H1_10 of received message with seqid 10, you'd pull down the remote log, get the latest message ids, and then pull down the ones you don't have (the pairs field is ordered). How would the flow look different in terms of precise requests if this was complemented with either (a) map from seq id to pairs (b) include seq in the pairs triplet, i.e. <localHash, remoteHash, seqid>?
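To make option (b) concrete, here is a minimal sketch, assuming the pairs field became a list of triplets. The Pair type, its field names, and remote_hashes_for are hypothetical and only illustrate the idea; they are not taken from the remote log spec.

```python
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """One remote log entry, extended with a sequence number (option b)."""
    local_hash: str
    remote_hash: str
    seq: int  # hypothetical third field


def remote_hashes_for(pairs: List[Pair], wanted: range) -> List[str]:
    """Return the remote hashes for the wanted sequence numbers, in ascending order.

    Sequence numbers that are not on this page are skipped silently.
    """
    by_seq: Dict[int, str] = {p.seq: p.remote_hash for p in pairs}
    return [by_seq[s] for s in wanted if s in by_seq]


# A single page holding seq 10 down to 5, newest first.
page = [Pair(f"L{n}", f"H1_{n}", n) for n in range(10, 4, -1)]

# Having synced up to 5 and received 10, ask for 6..9.
print(remote_hashes_for(page, range(6, 10)))
```

With the seq field present, the client can select entries by number instead of walking the ordered pairs until it hits a hash it already knows. Whether that saves any requests still depends on how pages are located.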

Using configuration options like page_size, a client can easily sync all messages at once from the last received sequence to the last synced sequence.

Once you pull down the remote log, the length of pairs is the page size for that page. How would making this variable explicit change things? Even if you do know the page size, how would you go from that to "syncing all messages at once" if you only know where the last page is located? I.e.:

page 1: h10, h9, h8 | page 2: h7, h6, h5 | <link to page 3>
page 3: h4...

In this example, how would you go from page 1 to page 3 if you know page size, considering pages aren't contiguously allocated? Unless the proposal is to keep track of all pages somewhere, which seems like a separate proposal
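The concern can be sketched as follows, assuming each page only carries a link to the next one. The Page type, the STORE dictionary, and the page addresses are hypothetical stand-ins for wherever pages actually live, and page 3 lists only h4 because the rest is elided in the example above.

```python
from typing import Dict, List, NamedTuple, Optional


class Page(NamedTuple):
    hashes: List[str]
    next_page: Optional[str]  # address of the next (older) page, if any


# Pages are stored at arbitrary addresses, not contiguously.
STORE: Dict[str, Page] = {
    "page1": Page(["h10", "h9", "h8"], "page2"),
    "page2": Page(["h7", "h6", "h5"], "page3"),
    "page3": Page(["h4"], None),
}


def fetch_until(start: str, target: str) -> List[str]:
    """Follow links from start and return every page address fetched,
    in order, up to and including the page that holds target."""
    visited: List[str] = []
    addr: Optional[str] = start
    while addr is not None:
        visited.append(addr)
        page = STORE[addr]
        if target in page.hashes:
            return visited
        addr = page.next_page
    raise KeyError(target)


print(fetch_until("page1", "h4"))
```

Knowing that each page holds three entries tells you h4 is on the third page, but not where that page is stored, so page 2 still has to be fetched to find the link.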

decanus commented 4 years ago

parents contains a list of ids for messages seen and sent in some order. previous_message contains the id of the last sent message.

Assuming that would be inside the received message, i.e. something like a number 10, where you know you have synced up to message 5 before, your goal would be to get messages 6..9. Is this correct?

This is correct.
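As a minimal sketch of that gap calculation, assuming sequence numbers are consecutive integers (missing_sequences is a hypothetical helper name, not part of any spec):

```python
def missing_sequences(last_synced: int, received: int) -> range:
    """Sequence numbers strictly between the last synced message
    and the message that was just received."""
    if received <= last_synced:
        return range(0)  # nothing is missing
    return range(last_synced + 1, received)


# Synced up to 5, just received 10: messages 6..9 are missing.
print(list(missing_sequences(5, 10)))
```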

What do you mean by sequence that corresponds to index of message?

The sequence should relate directly to the position a message holds in the remote log. So like its line number. Does this make sense?

In this example, how would you go from page 1 to page 3 if you know page size, considering pages aren't contiguously allocated? Unless the proposal is to keep track of all pages somewhere, which seems like a separate proposal

I see your point here. My assumption that you'd automatically know the page size is wrong.

oskarth commented 4 years ago

Is this definitely within the remote log? Seems like it touches on MVDS/MDF too (since we removed it from there?)

decanus commented 4 years ago

@oskarth it definitely affects all protocols; it provides a close relationship between mvds and remotelog through mdf.