ssbc / jitdb

A database on top of a log with automatic index generation and maintenance

seq rewinding in paginate() is incorrect #226

Closed staltz closed 2 years ago

staltz commented 2 years ago

Rewinding by computing seq - holesFound is wrong; in fact, there are many ways the rewinding goes wrong.

E.g. 1: If seq is 3 but there was 1 delete at seq 10, then there's no need to rewind at all, but we still do.

E.g. 2: If seq is 3 but there were 100 deletes at seqs 200–300, then we're going to rewind to seq -97, which is nonsensical.
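The two examples above boil down to the same arithmetic bug, which can be sketched like this (a hypothetical illustration; `buggyRewind` and `deletedSeqs` are stand-in names, not jitdb's actual internals):

```javascript
// Hypothetical sketch of the buggy rewind arithmetic described above.
// It counts ALL deletes, regardless of whether they sit before `seq`.
function buggyRewind(seq, deletedSeqs) {
  const holesFound = deletedSeqs.length
  return seq - holesFound
}

// E.g. 1: one delete at seq 10, our seq is 3 => no rewind is needed,
// yet we still rewind to 2
console.log(buggyRewind(3, [10])) // 2

// E.g. 2: 100 deletes at seqs 200.. => nonsensical negative seq
const deletes = Array.from({ length: 100 }, (_, i) => 200 + i)
console.log(buggyRewind(3, deletes)) // -97
```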

But more fundamentally, the problem is that a paginate() run automatically post-compaction uses a seq argument referring to the previous, uncompacted version of the log, and this seq argument bears no relation to the realities of the compacted log. The log's seq system before and after compaction is completely different, because the log file itself is rewritten, so the old offset and seq values are irrelevant.

staltz commented 2 years ago

@arj03 This is a hairy problem, and I'm thinking we could just bail out instead of solving it. We can fix toPullStream so that it listens for compaction, and as soon as compaction is done, we end the pull stream (I'm not sure whether it should end with an error or with true, but either way, it ends).
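The "bail out" idea could look roughly like this: wrap a pull-stream-style source so it ends cleanly once compaction completes. This is a minimal sketch under assumptions: `onCompactionDone` is a hypothetical hook, and `arraySource` is a toy stand-in for the real jitdb source; the actual compaction signal in jitdb differs.

```javascript
// Wrap a pull-stream-style source so it ends (cb(true), no error)
// as soon as the hypothetical compaction-done hook fires.
function endOnCompaction(source, onCompactionDone) {
  let compacted = false
  onCompactionDone(() => { compacted = true })
  return function (abort, cb) {
    if (compacted && !abort) return cb(true) // end without error
    source(abort, cb)
  }
}

// Tiny stand-in source over an array, following pull-stream's
// (abort, cb) source signature
function arraySource(arr) {
  let i = 0
  return (abort, cb) => {
    if (abort) return cb(abort)
    i < arr.length ? cb(null, arr[i++]) : cb(true)
  }
}

// Demo: the stream ends early once "compaction" fires after 2 items
let fireCompaction
const src = endOnCompaction(arraySource([1, 2, 3, 4]), cb => { fireCompaction = cb })
const received = []
src(null, function more(end, item) {
  if (end) return
  received.push(item)
  if (received.length === 2) fireCompaction()
  src(null, more)
})
console.log(received) // [1, 2]
```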

staltz commented 2 years ago

Okay, I found another solution, and it goes like this:

  1. After every paginate() is done, keep track of the msg.key for the latest msg in the page
  2. Compaction starts and the next paginate() call gets put on the queue, waiting for compaction to end
  3. When compaction ends, look up the msg.key for the latest msg, find the seq for that msg.key, and then resume paginate() with that seq+1 (we can safely assume that seq+1 is non-deleted because nothing is deleted after compaction)

On paper it should work, but in practice it's quite a hassle to get the msg.key => seq mapping, because that's something recorded in ssb-db2, not jitdb.
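Steps 1–3 above can be sketched as follows. Everything here is illustrative: `paginate` and `getSeqFromKey` are stand-in names, and `getSeqFromKey` represents exactly the msg.key => seq lookup that lives in ssb-db2 rather than jitdb.

```javascript
// Illustrative sketch of the resume-after-compaction idea (steps 1-3).
function makeResumable(paginate, getSeqFromKey) {
  let lastKey = null
  return {
    // Step 1: remember the msg.key of the latest msg in each page
    page(seq, limit, cb) {
      paginate(seq, limit, (err, msgs) => {
        if (err) return cb(err)
        if (msgs.length > 0) lastKey = msgs[msgs.length - 1].key
        cb(null, msgs)
      })
    },
    // Step 3: after compaction, look up the NEW seq for lastKey and
    // resume at seq+1; post-compaction, seq+1 is guaranteed non-deleted
    resumeSeq() {
      return lastKey === null ? 0 : getSeqFromKey(lastKey) + 1
    },
  }
}

// Demo with a fake log: keys %a..%e, where %b and %d get compacted away
const before = ['%a', '%b', '%c', '%d', '%e']
const after = ['%a', '%c', '%e'] // log rewritten, seqs reassigned
const paginate = (seq, limit, cb) =>
  cb(null, before.slice(seq, seq + limit).map(key => ({ key })))
const getSeqFromKey = key => after.indexOf(key)

const r = makeResumable(paginate, getSeqFromKey)
r.page(0, 3, () => {}) // reads %a, %b, %c; lastKey = '%c'
console.log(r.resumeSeq()) // 2  ('%c' now has seq 1, so resume at 2)
```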

arj03 commented 2 years ago

I think it would be ok to just end the stream and let the other end continue if they want. Not sure how big of a hassle it is?

staltz commented 2 years ago

It is a hassle for live({ old: true }) streams, because in that case we would end the old part of the stream while the live part continues as is. Consumers of the stream would receive a few old items, then suddenly jump to receiving only live items, and they would miss some important old items.