ssbc / margaret

a flume-like persisted append-only log implementation
MIT License

Are indexes completely separate from the main database? #28

Closed by KyleMaas 1 year ago

KyleMaas commented 1 year ago

As far as I can tell from reading the code, it looks like the main database is completely disconnected from the indexes. So, for example, you could run only Margaret indexes without any data in the main database, or you could run only a Margaret database without any indexes. I don't see anything that indexes at the same time as data is appended to the database. Am I understanding this correctly?

cryptix commented 1 year ago

Indexes usually reference the entry in the (offset)log, so running them without a backing log wouldn't really work as it stands.

KyleMaas commented 1 year ago

It wouldn't make sense as it was originally designed, but it would work, correct? For example, usage like in this test, which doesn't appear to reference an offset log at all:

https://github.com/ssbc/margaret/blob/master/indexes/test/setidx.go

If you add a record to the main log, does it do anything to trigger indexing of it, or does that all have to be managed externally to Margaret?

If you delete a record from the main log, does it delete the associated record from the index?

I see there are ways to do that, but I guess what I'm getting at here is whether there's any kind of intrinsic connection between the main log and the indexes or if all of that is handled by whatever is using Margaret. If it's all done externally, then I can think of several data corruption scenarios in go-ssb which are currently unhandled.

KyleMaas commented 1 year ago

Also, to clarify: the indexes (from an external point of view) are pretty much just pure-and-simple key/value stores, correct?

cryptix commented 1 year ago

I can't fully answer your questions @KyleMaas. I wish I could, but a lot of the concepts were the idea of @keks, who at some point lost interest. I just kept pushing forward. You might be right about these things.

And yes, the idea was to always update the log and the indexes as one operation, but this surely is easy to get wrong.
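To make "update the log and the indexes as one operation" concrete, here is a minimal sketch of what a caller might do. It does not use Margaret's API: `Store`, `Append`, and `Get` are hypothetical names, the log is an in-memory slice, and the index is a plain map from key to sequence number. The only point is that both writes happen under one lock, so a reader never sees a log entry without its index record.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical wrapper for illustration, not a Margaret type.
// It pairs an in-memory append-only log with a single index and updates
// both while holding one lock.
type Store struct {
	mu    sync.Mutex
	log   []string         // entry i lives at sequence number i
	index map[string]int64 // key -> sequence number of the latest entry
}

func NewStore() *Store {
	return &Store{index: make(map[string]int64)}
}

// Append writes value to the log and records its sequence number under
// key in the index, as one operation from the caller's point of view.
func (s *Store) Append(key, value string) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	seq := int64(len(s.log))
	s.log = append(s.log, value)
	s.index[key] = seq
	return seq
}

// Get resolves key through the index and reads the entry from the log.
func (s *Store) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	seq, ok := s.index[key]
	if !ok {
		return "", false
	}
	return s.log[seq], true
}

func main() {
	s := NewStore()
	s.Append("alice", "hello")
	seq := s.Append("alice", "world")
	v, _ := s.Get("alice")
	fmt.Println(seq, v) // prints "1 world"
}
```

A mutex only covers the in-process case. With a persisted log and persisted indexes, a crash between the two writes can still leave them out of sync, so the caller would also need some way to detect that and re-index.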

keks commented 1 year ago

Yeah IIRC the indexes are key value stores with observables tacked on.

These were used in the JS implementation and it seemed to me that pattern made sense over there. Not sure if that was a good idea (this got a bit messy with static typing and uses interface{} all over the place) or whether the implementation is any good. I imagine having channel writes in between everything isn't great for performance. I probably wouldn't do it like this again.
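For readers unfamiliar with the pattern, "a key value store with observables tacked on" might look roughly like the sketch below. This is not Margaret's code: `ObservableKV`, `Observe`, `Set`, and `Get` are hypothetical names, and the sketch uses plain callbacks rather than channels to stay short. Values are typed as `interface{}` on purpose, to show the type assertions that this design pushes onto callers.

```go
package main

import (
	"fmt"
	"sync"
)

// ObservableKV is a hypothetical type for illustration, not a Margaret
// type. It is a key/value store whose keys can be watched for updates.
type ObservableKV struct {
	mu       sync.Mutex
	data     map[string]interface{}
	watchers map[string][]func(interface{})
}

func NewObservableKV() *ObservableKV {
	return &ObservableKV{
		data:     make(map[string]interface{}),
		watchers: make(map[string][]func(interface{})),
	}
}

// Observe registers fn to be called with every value later Set under key.
func (kv *ObservableKV) Observe(key string, fn func(interface{})) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	kv.watchers[key] = append(kv.watchers[key], fn)
}

// Set stores v under key, then notifies that key's observers.
// Observers run after the lock is released so they may call back in.
func (kv *ObservableKV) Set(key string, v interface{}) {
	kv.mu.Lock()
	kv.data[key] = v
	fns := append([]func(interface{}){}, kv.watchers[key]...)
	kv.mu.Unlock()
	for _, fn := range fns {
		fn(v)
	}
}

// Get returns the current value for key, if any.
func (kv *ObservableKV) Get(key string) (interface{}, bool) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	v, ok := kv.data[key]
	return v, ok
}

func main() {
	kv := NewObservableKV()
	var seen []interface{}
	kv.Observe("head", func(v interface{}) { seen = append(seen, v) })
	kv.Set("head", int64(1))
	kv.Set("head", int64(2))
	kv.Set("other", "ignored") // no observer registered for this key
	v, _ := kv.Get("head")
	fmt.Println(v.(int64), len(seen)) // prints "2 2"
}
```

The `v.(int64)` assertion in `main` is the kind of thing that gets messy with static typing: nothing stops another caller from storing a string under the same key.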

KyleMaas commented 1 year ago

Cool. Thanks! That helps clarify things.