ssbc / jitdb

A database on top of a log with automatic index generation and maintenance

Reindex after or during log compaction #199

Closed staltz closed 2 years ago

staltz commented 2 years ago

async-append-only-log now supports log.compact() https://github.com/ssb-ngi-pointer/async-append-only-log/issues/48 and we should reindex the relevant portions of jitdb indexes. See also https://github.com/ssb-ngi-pointer/ssb-db2/issues/306
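
A minimal sketch of how the compact call might be driven (the constructor options and the callback shape here are assumptions for illustration, not taken from the linked issue):

```js
const Log = require('async-append-only-log')

// Hypothetical setup; the path and blockSize are illustrative only.
const log = Log('/home/user/.ssb/db2/log.bipf', { blockSize: 64 * 1024 })

// Assumed callback-style API for the new compaction feature.
log.compact((err) => {
  if (err) throw err
  // Records may now live at smaller offsets than before, so any jitdb
  // index built against the old offsets is stale and needs reindexing.
})
```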

staltz commented 2 years ago

@arj03 I'm working on this already, but I bumped into a design issue. My original plan is to rebuild all indexes while log compaction is still ongoing, based on what Dominic suggested here:

> Instead of writing a separate second log, superimpose the compacted log on top of the old one, keeping track of the min uncompacted sequence number. When you do a query from the old indexes, drop results with offsets smaller than the uncompacted sequence, and mix with results from the new indexes. Merge index responses by sequence numbers, so numbers smaller than the uncompacted sequence are from the compacted section and numbers greater are from the uncompacted log.

For instance, my plan is to create a folder .ssb/db2/indexes/.compacting where the new index files live, and once those are done catching up with the log, we delete all jitdb indexes in .ssb/db2/indexes and move the files from .ssb/db2/indexes/.compacting up to .ssb/db2/indexes.
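
A rough sketch of that folder swap (the paths follow the plan above; the fs-based implementation and error handling are only illustrative):

```js
const fs = require('fs')
const path = require('path')

// Promote the freshly built indexes once they have caught up with the log.
function promoteCompactingIndexes(indexesDir, cb) {
  const oldDir = indexesDir + '.old'
  try {
    fs.renameSync(indexesDir, oldDir) // stale indexes out of the way
    fs.renameSync(path.join(oldDir, '.compacting'), indexesDir) // move new ones up
    fs.rmSync(oldDir, { recursive: true, force: true }) // drop the stale indexes
    cb()
  } catch (err) {
    cb(err)
  }
}
```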

However, does this go against the JIT design of JITDB?

Alternatively, should we just delete all JITDB indexes once log compaction is done, and let the JIT nature take care of rebuilding them?

arj03 commented 2 years ago

Will get back to you on this tomorrow :)

arj03 commented 2 years ago

For JITDB I don't think we should allow queries while compaction is running. Instead, keep track of the offset from which you started compacting and, once done, call reindex() with that. This way, if you compacted only the end of the log, it should be quite fast. For level indexes in db2 it might make sense to allow lookups for things we know are before compaction. I do believe we should have a "high-level" compact function in db2 that takes care of doing these things, because it also needs to keep track of state while compacting, so that we can continue if the app is shut down mid-compaction.
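
Roughly, the flow being suggested could look like this (assuming jitdb exposes a reindex(offset, cb) entry point; the exact signature is an assumption here):

```js
// Sketch only: compact the log, then reindex jitdb from the first offset
// that could have moved. `firstChangedOffset` would come from the log's
// compaction bookkeeping and is hypothetical here.
function compactAndReindex(log, jitdb, firstChangedOffset, cb) {
  log.compact((err) => {
    if (err) return cb(err)
    // Everything before firstChangedOffset is untouched, so only the tail
    // of each index needs to be recomputed.
    jitdb.reindex(firstChangedOffset, cb)
  })
}
```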

This could be a starting point for the calls to jitdb & level indexes; it's a bit different because here we are running on individual messages.

staltz commented 2 years ago

@arj03 Thanks!

reindex() currently doesn't rebuild the core indexes (seq, sequence, timestamp), is there a reason for that? I think with compaction we would also have to rebuild the core indexes.

Why do you say that level indexes should be allowed to lookup while jitdb shouldn't? Is it because of the speed of rebuilding the indexes? (By the way, in my tests, I felt that some jitdb indexes do take a considerable time to build, like value_author.32prefix, which is 6 MB in a production database, and has to write 4 bytes for each record.)

arj03 commented 2 years ago

> reindex() currently doesn't rebuild the core indexes (seq, sequence, timestamp), is there a reason for that? I think with compaction we would also have to rebuild the core indexes.

Right, that is because for encrypted messages they don't change. Here they do :)

> Why do you say that level indexes should be allowed to lookup while jitdb shouldn't?

Just because it is easier. We don't need a flag for jitdb that says: please don't update the indexes, just run on what you have.

I can maybe check the value_author prefix building later to see if there is anything we can do to optimize that one.

staltz commented 2 years ago

Okay thanks, so I'm preparing for the plan to "not allow queries while compaction is running", and here are some disorganized thoughts (mostly questions to myself):

What happens when a jitdb query is ongoing and compaction starts? Do we cancel the query, or do we pause it and re-run it after compaction (and reindexing) is done? And what if it was a 4th call to paginate? The starting seq might now mean that the 4th paginate returns results that were already in the 3rd paginate, or, worse, skips some results.

This is a bit similar to questions about log.get and log.stream: do we abort it or do we wait for compaction to end and rerun it?

staltz commented 2 years ago

> I can maybe check the value_author prefix building later to see if there is anything we can do to optimize that one.

Just as a reference, here are other (leveldb and jitdb) indexes of similar size (sorted). There are plenty of big jitdb indexes, computing them all and writing to disk isn't fast.

| Index | Size | Type |
| --- | --- | --- |
| search2 | 101 MB | level |
| ebt | 21 MB | level |
| contacts | 12 MB | level |
| timestamp | 12 MB | bitvector |
| value_content_vote_link__map | 6,2 MB | prefix map |
| seq | 6,0 MB | bitvector |
| sequence | 6,0 MB | bitvector |
| value_author | 6,0 MB | prefix |
| fullMentions | 5,1 MB | level |
| hashtags | 5,0 MB | level |
| aboutSelf | 3,0 MB | level |
| value_content_root__map | 3,0 MB | prefix map |
| value_content_contact__map | 2,6 MB | prefix map |
| base | 1,3 MB | level |
| value_content_fork__map | 218 KB | prefix map |
| meta_ | 173 KB | bitvector |
| meta_private_true | 173 KB | bitvector |
| value_author_MYSSBID | 173 KB | bitvector |
| value_contentroot | 173 KB | bitvector |
| value_content_type_contact | 173 KB | bitvector |
| value_content_type_post | 173 KB | bitvector |
| value_content_type_pub | 173 KB | bitvector |
| value_content_type_roomx2Falias | 173 KB | bitvector |
| value_content_type_vote | 173 KB | bitvector |
| canDecrypt | 24 KB | bitvector |
| encrypted | 64 B | bitvector |

staltz commented 2 years ago

> What happens when a jitdb query is ongoing and compaction starts?

Answering myself: if there are ongoing queries (of any kind), postpone compaction, such that compaction always starts when everything else in the database is idle, and then from that point onwards, queue all incoming queries so that they apply only after compaction is done.
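
Very loosely, that logic could be sketched like this (none of it is actual jitdb/ssb-db2 code; the names and the polling interval are made up):

```js
// Wait for in-flight queries to drain, run compaction, and queue any
// queries that arrive in the meantime.
let compacting = false
let activeQueries = 0
const waitingQueries = []

function runQuery(queryFn, cb) {
  if (compacting) {
    // Compaction in progress: park the query until it finishes.
    waitingQueries.push(() => runQuery(queryFn, cb))
    return
  }
  activeQueries += 1
  queryFn((err, result) => {
    activeQueries -= 1
    cb(err, result)
  })
}

function compactWhenIdle(doCompact, done) {
  const tryStart = () => {
    if (activeQueries > 0) return setTimeout(tryStart, 100) // poll until idle
    compacting = true
    doCompact((err) => {
      compacting = false
      // Release the queries that arrived while compaction was running.
      while (waitingQueries.length > 0) waitingQueries.shift()()
      done(err)
    })
  }
  tryStart()
}
```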

arj03 commented 2 years ago

Yep, agree. Also, for most of these indexes you wouldn't need to do a full rebuild, only from compaction start onwards.

arj03 commented 2 years ago

> I felt that some jitdb indexes do take a considerable time to build, like value_author.32prefix, which is 6 MB in a production database, and has to write 4 bytes for each record.

I had to check this. Building author or type takes around the same amount of time (40s). Note this is from a totally empty indexes folder, so it includes keys and base. Rebuilding author after type is 6s. Rebuilding both is 45s, so a tiny bit less. I tried disabling writing the jitdb indexes and it only brought the rebuild from 45s down to 43.2s. Decrypting is still very heavy, as documented here, at around 10s of overhead. If I leave the canDecrypt file and the level indexes, the time to build both goes down to 18.7s.

staltz commented 2 years ago

Super weird bug with GitHub where I can't edit the original post TODO list, so I'm copying it down here:

staltz commented 2 years ago

@arj03 I think I hit a pretty sad issue: it seems like we have to reset all leveldb indexes after compaction happens, because most leveldb indexes hold state, and we don't know how to undo that state only for the deleted records.

Consider e.g. the about-self index which has key/value feedId => hydratedAboutObj. Suppose we deleted an about message with the name field, but we still have an about message with the image field. There's nothing in this index that tells us how to "partially" delete the hydratedAboutObj, so either we get the wrong reduced state, or we have to do a full rebuild.
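
To make the problem concrete, here is a tiny illustration of such a reduce; the reducer and message shapes are made up for the example:

```js
// Later fields overwrite earlier ones; the previous values are lost.
function reduceAbout(hydrated, aboutMsg) {
  return Object.assign({}, hydrated, aboutMsg.content)
}

let state = {}
state = reduceAbout(state, { content: { about: '@alice', name: 'Alice' } })
state = reduceAbout(state, { content: { about: '@alice', image: '&blob1' } })
// state is now { about: '@alice', name: 'Alice', image: '&blob1' }

// If compaction deletes the first message, the index still says name: 'Alice',
// and there is no record of what (if anything) to roll back to, so the only
// safe option is to rebuild the index from scratch.
```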

Any ideas about this?

arj03 commented 2 years ago

@staltz right, reduce-based indexes versus pure ones. I think we could introduce that abstraction, and then you'd only have to do a full reindex on the reduce-based ones. Most base indexes are not reduce-based.

staltz commented 2 years ago

Yeah, we could do that split. But it might still be hard to find and remove the outdated entries. Take the EBT index: I think it's [feedId, sequence] => offset, so we'd need to look up the old entries via the right-hand side, i.e. the value, not the key.
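
In other words, with offsets only on the value side, finding the stale entries means scanning every key. A sketch, assuming a modern level instance with async iterators and the [feedId, sequence] => offset layout described above (the value encoding is an assumption):

```js
// Collect keys whose stored offset points at a deleted record.
async function findStaleEntries(ebtLevel, deletedOffsets) {
  const stale = []
  for await (const [key, value] of ebtLevel.iterator()) {
    // Assumes offsets are stored as plain numbers or numeric strings.
    if (deletedOffsets.has(Number(value))) stale.push(key)
  }
  return stale
}
```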

arj03 commented 2 years ago

Oh right, you are correct. I guess there isn't any other good way than to reindex everything in this case :(