ssbc / async-append-only-log

A new append-only-log for SSB purposes

Compaction #48

Closed · staltz closed this 2 years ago

staltz commented 2 years ago

This is going to be a moving target, as I explore solutions for this.

The basic premise is: once there have been a couple of deletes (i.e. overwrites with zero bytes), we can produce a new log that contains all the real records, minus the zero-byte ones.
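As a minimal sketch of that premise (the types and names here are hypothetical, not the actual AAOL API), compaction amounts to keeping every live record, in order, and dropping the zeroed ones:

```ts
// Hypothetical model of the premise: a record whose bytes are all zero
// has been deleted; compaction keeps the live records in their order.
type LogRecord = Uint8Array;

const isDeleted = (r: LogRecord): boolean => r.every((byte) => byte === 0);

function compact(records: LogRecord[]): LogRecord[] {
  return records.filter((r) => !isDeleted(r));
}
```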

The algorithm is an idea Dominic wrote here: https://github.com/ssb-ngi-pointer/ssb-db2/issues/306#issuecomment-1034479270


staltz commented 2 years ago

@arj03 Do you have any thoughts on how to persist the minimum uncompacted offset?

  1. Should it be in a header on the log? (the log so far has no header, just a bunch of blocks)
  2. Should it be in a separate file log.bipf.compaction? (a tiny temporary file that exists only while compaction is in progress)
  3. (Some other idea?)

Idea (2) is easier to do as a non-breaking change, but we probably need atomicity: the compaction metadata and the new log state should be written in one go, so maybe we have to go with idea (1)?
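To make idea (2) more concrete, here is a hypothetical sketch of what a tiny log.bipf.compaction sidecar file could look like; the field names and JSON encoding are illustrative only, and this alone doesn't answer the atomicity question between the state file and the log itself:

```ts
import * as fs from 'fs';

// Hypothetical sidecar state for idea (2): a tiny file tracking how far
// compaction has progressed. Written via temp-file + rename, since rename
// is atomic on POSIX filesystems.
interface CompactionState {
  minUncompactedOffset: number; // everything before this offset is compacted
}

function saveState(path: string, state: CompactionState): void {
  const tmp = path + '.tmp';
  fs.writeFileSync(tmp, JSON.stringify(state));
  fs.renameSync(tmp, path);
}

function loadState(path: string): CompactionState | null {
  if (!fs.existsSync(path)) return null; // no compaction in progress
  return JSON.parse(fs.readFileSync(path, 'utf8'));
}
```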

staltz commented 2 years ago

(photo: hand-drawn sketch of the compaction algorithm)

Algorithm so far. FUB means "first uncompacted block (index)". I thought it made sense to compact one block at a time, since we write an entire block at once. There is also another variable here (I don't know the best name for it yet): the "first record in the uncompacted part of the log", which may be in any later block, not necessarily in the FUB.
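A rough sketch of that block-at-a-time step (all names hypothetical, and records are counted rather than sized in bytes to keep it small): rebuild block FUB from the next live records at or after the first unshifted record, write it as one whole-block write, then advance both cursors:

```ts
type Rec = Uint8Array;

// Hypothetical cursor for the algorithm in the photo: fub is the first
// uncompacted block; firstUnshifted points at the first record (possibly
// in a later block) that has not been copied forward yet.
interface Cursor {
  fub: number;
  firstUnshifted: number; // flat index into `records`
}

// One compaction step: fill block `fub` with the next live records,
// skipping deleted ones (modeled as null), and advance the cursor.
function compactOneBlock(
  records: (Rec | null)[],
  recordsPerBlock: number,
  cursor: Cursor
): Rec[] {
  const block: Rec[] = [];
  let i = cursor.firstUnshifted;
  while (block.length < recordsPerBlock && i < records.length) {
    const rec = records[i++];
    if (rec !== null) block.push(rec);
  }
  cursor.firstUnshifted = i; // may now point into any later block
  cursor.fub += 1;
  return block; // the caller persists this as a single block write
}
```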

arj03 commented 2 years ago

I'll get back to you on this in about an hour

arj03 commented 2 years ago

I like the idea of a separate file best. This is similar to a "log" in a database, so you basically use it to keep track of what you are doing. You might have to redo a block an extra time (in the case of a crash), but that should be okay, since blocks are rather small.

I also like the idea of "dirty blocks" that need to be reindexed, although I'm not entirely sure how it compares to just keeping track of the uncompacted offset. With dirty blocks you only need to do something with the compacted blocks, and maybe the tip of the log if you move a whole block back.

It also might be a good idea to stop replication (writes) while you compact, to make it easier to reason about. The blocks are rather small, so my hunch is that this compaction should be quite fast (< 5 secs) when compacting after deleting a feed.
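One way to read that dirty-blocks bookkeeping (hypothetical names, not an API in this repo): compaction records which block indexes it touched, and reindexing only needs to revisit those:

```ts
// Hypothetical bookkeeping for the "dirty blocks" idea: compaction marks
// every block it rewrites, and indexes revisit only those blocks
// (plus possibly the tip of the log).
const dirtyBlocks = new Set<number>();

function markDirty(blockIndex: number): void {
  dirtyBlocks.add(blockIndex);
}

function blocksToReindex(): number[] {
  return [...dirtyBlocks].sort((a, b) => a - b);
}
```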

staltz commented 2 years ago

I like the idea of a separate file best.

What about atomicity? Say we save the log before we save the "first record in the uncompacted part of the log" (which I decided to call firstUnshiftedRecord, by the way), and then it crashes. When it loads up again, that record (say, "world" in the picture above) would be duplicated, but AAOL wouldn't know that "world" has already been copied, and might end up copying it a 3rd time.

An important detail here is that AAOL doesn't have a notion of msg "keys" and allows any kind of buffers, so it's perfectly acceptable for an AAOL log to be foo, foo, ___, foo, foo (where ___ is a deleted record, and foo is <Buffer 66 6f 6f>). So we can't tell from the log itself whether we have already copied a record or not.
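A tiny illustration of that point (assuming Node's Buffer, as the <Buffer 66 6f 6f> notation suggests): two distinct records can be byte-identical, so progress has to be tracked by position, never by content:

```ts
// Two distinct records with identical bytes: content comparison cannot
// tell whether this particular "foo" was already copied during compaction.
const first = Buffer.from('foo');
const second = Buffer.from('foo');
console.log(Buffer.compare(first, second) === 0); // true
// Hence compaction progress must be tracked by offset, not by record content.
```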

I also like the idea of "dirty blocks" that needs to be reindexed.

Can you explain what you mean by "dirty blocks"? It sounds like you have an idea behind the term, but it's not clear to me.

arj03 commented 2 years ago

What about atomicity?

The compaction log needs to be written in such a way that you can always restart from it. So maybe something like this: first identify the blocks to be fixed (this is what I mean by "dirty blocks") and write them to the log, maybe along with the lowest sequence. Then start fixing the blocks, either one at a time or all of them; at this point the indexes should be treated as "needs to be reindexed from the lowest sequence and forward". After the blocks are fixed, either write to the log that they are fixed, or write a new atomic commit with the latest state, namely that indexes need to be rerun from the lowest seq but that no blocks need to be fixed anymore. And after all the indexes have run (or maybe keep 1 sequence for each index), write that those have been fixed too.

This way, no matter where you crash, you can always restart. You might do some extra work, like reindexing an extra time or redoing some blocks an extra time. That is okay, as long as everything you do is atomic. Because compaction can involve multiple blocks, while you are in the "fix blocks" state (according to the log) you always need to finish that before you can do anything else, including reindexing.
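A sketch of that scheme (the phase names and encoding are hypothetical): append each phase transition to the compaction log, and fsync it, before doing the corresponding work, so a restart can always resume from the last recorded phase, at worst redoing one step:

```ts
import * as fs from 'fs';

// Hypothetical phases of the scheme described above, appended to the
// compaction log *before* the corresponding work starts.
type Phase =
  | { type: 'fix-blocks'; blocks: number[]; lowestSeq: number }
  | { type: 'blocks-fixed'; lowestSeq: number } // only reindexing remains
  | { type: 'done' };

function appendPhase(walPath: string, phase: Phase): void {
  const fd = fs.openSync(walPath, 'a');
  fs.writeSync(fd, JSON.stringify(phase) + '\n');
  fs.fsyncSync(fd); // ensure the entry is on disk before we act on it
  fs.closeSync(fd);
}

// On restart, read the last recorded phase and resume from there. If it is
// 'fix-blocks', the blocks must be (re)fixed before anything else,
// including reindexing.
function lastPhase(walPath: string): Phase | null {
  if (!fs.existsSync(walPath)) return null;
  const content = fs.readFileSync(walPath, 'utf8').trim();
  if (content === '') return null;
  const lines = content.split('\n');
  return JSON.parse(lines[lines.length - 1]);
}
```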

staltz commented 2 years ago

Oh, so you mean log.bipf.compaction should itself be a log of actions to take? That's interesting. Yeah, I can see how that would help with atomicity. It should probably be a WAL, i.e. the first thing that is written to disk.

By the way, I'm not thinking about indexes yet, because they live in JITDB, but it's good to already think now about what mechanism to use to signal to JITDB.

arj03 commented 2 years ago

Exactly. It might help to think of it in terms of a state machine with a log of actions, as you say.
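Summing the thread up as such a state machine (a hypothetical summary of the discussion, not the final design): each transition would be appended to log.bipf.compaction before it takes effect:

```ts
// Hypothetical states of the compaction process discussed in this thread.
// Each transition is appended to the WAL (log.bipf.compaction) before it
// is performed, so a crash in any state resumes by redoing at most one step.
type State =
  | 'idle'              // no compaction in progress
  | 'compacting-blocks' // shifting records forward, one block at a time
  | 'reindexing'        // blocks fixed; indexes rerun from the lowest seq
  | 'done';

const transitions: Record<State, State[]> = {
  idle: ['compacting-blocks'],
  'compacting-blocks': ['compacting-blocks', 'reindexing'],
  reindexing: ['done'],
  done: ['idle'],
};

function canTransition(from: State, to: State): boolean {
  return transitions[from].includes(to);
}
```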