mikedilger / chorus

Chorus is a nostr relay

Questions about the Event Store growth #22

Closed (Nuhvi closed this 2 months ago)

Nuhvi commented 2 months ago

If events can be duplicated in the mmapped file, is there any strategy to deduplicate them in a recurring job, or is the file allowed to just grow indefinitely?

Does the relay do a lookup in LMDB before storing the blob, to avoid duplication?

What happens to the data in the mmapped file when all its indexes are deleted from LMDB? Do you plan to mark parts of the file as deleted and compact the file later in a recurring GC job?

Thanks, and congratulations on the inspiring work.

mikedilger commented 2 months ago

Events are not intended to be duplicated in the mmapped file. If that is happening, it is a serious bug. Deleted events are not cleaned out, though. It would not be hard to create a tool that compacts the event map while rebuilding all the indexes.

Yes. Before storing, it checks the id map to make sure the event is not already there, then it checks the deleted map to make sure it is not there either. Then it stores the event, writes indexes to those maps, and commits. The commit is only on the LMDB side, so technically it could error out after writing to the event map but before indexing. That would only waste a bit of space, not cause a broken data store.
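The store path described above can be sketched roughly as follows. This is a minimal illustration, not the actual chorus code: `Store`, `StoreOutcome`, `store`, and `mark_deleted` are hypothetical names, a `Vec<u8>` stands in for the mmapped event map, and a `HashMap` and `HashSet` stand in for the LMDB id and deleted maps.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the data store.
/// `event_map` plays the role of the mmapped file; `id_index` and
/// `deleted` play the role of the LMDB maps.
#[derive(Default)]
pub struct Store {
    event_map: Vec<u8>,
    id_index: HashMap<[u8; 32], usize>, // event id -> offset in event_map
    deleted: HashSet<[u8; 32]>,         // ids of deleted events
}

#[derive(Debug, PartialEq)]
pub enum StoreOutcome {
    Stored(usize), // offset at which the event was written
    Duplicate,     // id already indexed, nothing written
    Deleted,       // id was deleted earlier, nothing written
}

impl Store {
    pub fn store(&mut self, id: [u8; 32], bytes: &[u8]) -> StoreOutcome {
        // 1. Check the id map so the same event is never appended twice.
        if self.id_index.contains_key(&id) {
            return StoreOutcome::Duplicate;
        }
        // 2. Check the deleted map so a deleted event is not stored again.
        if self.deleted.contains(&id) {
            return StoreOutcome::Deleted;
        }
        // 3. Append the event bytes to the event map.
        let offset = self.event_map.len();
        self.event_map.extend_from_slice(bytes);
        // 4. Index the event. If a failure happened between step 3 and this
        //    step, the appended bytes would be unreferenced (wasted space),
        //    but no index would point at bad data.
        self.id_index.insert(id, offset);
        StoreOutcome::Stored(offset)
    }

    /// Deleting only touches the indexes; the bytes stay in the event map.
    pub fn mark_deleted(&mut self, id: [u8; 32]) {
        self.id_index.remove(&id);
        self.deleted.insert(id);
    }
}
```

Storing the same id twice returns `Duplicate` the second time, and storing an id after `mark_deleted` returns `Deleted`, so in both cases the event map does not grow.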

If the indexes were lost, we could rebuild them from the event map. The event map holds a sequence of events, each one knowing how long it is, and the next one starts right after the previous one. So I can run through them all and rebuild the LMDB indexes. I don't mark events as deleted in the event map, but I have an LMDB index for that, so stripping out deleted events would consult that index, and we couldn't do it if LMDB were lost.
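Walking the event map to recover each event's offset could look something like the sketch below. The record layout here is an assumption made for illustration only: each record is taken to begin with a 4-byte little-endian length that counts the whole record, prefix included. The real chorus binary event format may differ, and `scan_offsets` is a hypothetical name.

```rust
/// Walk a buffer of back-to-back records and return the offset of each one.
///
/// Assumed (hypothetical) layout: every record starts with a 4-byte
/// little-endian length covering the whole record, including the prefix.
pub fn scan_offsets(map: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut pos = 0;
    while pos + 4 <= map.len() {
        let len = u32::from_le_bytes([map[pos], map[pos + 1], map[pos + 2], map[pos + 3]])
            as usize;
        // A length shorter than the prefix means unused space (e.g. zeros);
        // a length running past the end means a truncated record. Stop on both.
        if len < 4 || pos + len > map.len() {
            break;
        }
        offsets.push(pos);
        // The next record starts directly after this one.
        pos += len;
    }
    offsets
}
```

Each returned offset is where an index rebuild would read the event and re-insert its keys. Which of those events were deleted cannot be recovered this way, since that information lives only in LMDB.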

I just realized while typing this response that the events are not aligned in the mmap. I wrote the binary event structure to be fast when aligned to 8 bytes, but I forgot to align them when writing to the mmap. The next event should start after the previous event, but rounded up to the next 8-byte boundary.
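Rounding an offset up to the next 8-byte boundary is a standard bit trick. A minimal sketch, with `align8` and `next_event_offset` as hypothetical helper names rather than functions taken from chorus:

```rust
/// Round `offset` up to the next multiple of 8 (unchanged if already aligned).
pub const fn align8(offset: usize) -> usize {
    // Adding 7 pushes any unaligned value past the next boundary;
    // clearing the low three bits then snaps it back onto that boundary.
    (offset + 7) & !7
}

/// Where the next event should start, given where the previous one
/// started and how many bytes it occupies.
pub const fn next_event_offset(prev_start: usize, prev_len: usize) -> usize {
    align8(prev_start + prev_len)
}
```

For example, an event that starts at offset 0 and is 13 bytes long would be followed by an event at offset 16, leaving 3 bytes of padding.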

mikedilger commented 2 months ago

I spent today fixing the alignment and writing chorus_compress, which rewrites the data without deleted events (and fixes alignment too).
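To illustrate the general idea of such a compaction pass (this is not the chorus_compress source), here is a minimal sketch. It assumes the same hypothetical record layout as a 4-byte little-endian length prefix covering the whole record, reads an old map whose records are packed back to back, and takes the deleted set as a set of old offsets standing in for the LMDB deleted index. `compact` and `round_up_8` are made-up names.

```rust
use std::collections::{HashMap, HashSet};

fn round_up_8(n: usize) -> usize {
    (n + 7) & !7
}

/// Copy every non-deleted record from `old` into a new buffer, starting each
/// record on an 8-byte boundary. Returns the new buffer and a table mapping
/// old offsets to new offsets, which an index rebuild would need.
pub fn compact(
    old: &[u8],
    deleted_offsets: &HashSet<usize>,
) -> (Vec<u8>, HashMap<usize, usize>) {
    let mut new_map = Vec::new();
    let mut moved = HashMap::new();
    let mut pos = 0;
    while pos + 4 <= old.len() {
        let len = u32::from_le_bytes([old[pos], old[pos + 1], old[pos + 2], old[pos + 3]])
            as usize;
        if len < 4 || pos + len > old.len() {
            break; // unused tail or truncated record
        }
        if !deleted_offsets.contains(&pos) {
            // Zero-pad so this record starts on an 8-byte boundary.
            new_map.resize(round_up_8(new_map.len()), 0);
            moved.insert(pos, new_map.len());
            new_map.extend_from_slice(&old[pos..pos + len]);
        }
        // The old map is unaligned: records follow each other directly.
        pos += len;
    }
    (new_map, moved)
}
```

A reader of the compacted buffer would step from one record to the next by rounding the end of each record up to a multiple of 8, rather than by the raw length alone.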