sirixdb / sirix

SirixDB is an an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License
1.12k stars 252 forks source link

RedBlackTree => Van Emde Boas implicit pointer layout #228

Open JohannesLichtenberger opened 4 years ago

JohannesLichtenberger commented 4 years ago

We can implement a Van Emde Boas layout, which is based on implicit pointers for immutable trees (read-only transactions).

aryanVanaik commented 1 year ago

Hello, could you provide more information as I would like to work on this like perhaps where this needs to be implemented and is a conversion from a red-black tree to Van Emde Boas tree

JohannesLichtenberger commented 1 year ago

@aryanVanaik the idea is to use a more CPU cache-line friendly and thus faster index structure for reads. I think regarding writes a red black tree is ok, but rotations will of course add nodes, which have to be copied for a new version.

I think in general we have two possibilities

Also maybe it doesn't even make much difference for reads due to the current Java memory model. What do you think?

aryanVanaik commented 1 year ago

Interesting, at first glance reading from random locations is pretty bad ideally we would want to exploit locality for performance improvements but I would have to have a deeper look into an adaptive radix tree in terms of reading, and writing. Storing in persistent memory will be fine as long as everything is sequentially stored. But like you mentioned this depends on how the JMM works

JohannesLichtenberger commented 1 year ago

We optimize for flash drives (fast random reads and sequential writes), that is SirixDB only ever appends data to a file (two files actually) for each resource stored in a database. SirixDB not only copies data pages of an index trie, but instead stores fine granular modifications (but also has to read page fragments from different locations in parallel to reconstruct a page in-memory). The implementation is a mix of copy on write and a sliding window applied to the inserted data. So, the basis is a persistent tree of tries (in the functional sense), as in Datomic I guess, but it predates Datomic :-D

That's the first cache hierarchy (disk to main memory), but we could currently read the secondary indexes (based on red black trees) fully in memory (which are stored as fine granular nodes in the leaf data pages of keyed tries). Red black trees are however not very cache friendly in most implementations due to pointer chasing. Now we could read the stored data (the red black trees) into a better suited data structure as maybe an adaptive radix tree. For a better understanding of the simple tries (more or less an array organized in a tree):

https://sirix.io/docs/concepts.html

aryanVanaik commented 1 year ago

Ahh, I see this DB is making use of the properties of Flash memory if I am thinking of this correctly, I was thinking of starting to create a new class for the Adaptive radix tree and then replacing that with the current red-black structure we have. Let me know if I should start in another manner. Is there any documentation on how I can get the source code onto my system? Just a heads up as well this is a university assignment (contributing to an open source project) and I found this project very interesting :)

JohannesLichtenberger commented 1 year ago

First, we should extract interfaces. I think the rb tree currently is hard-coded in IndexController IIRC. I've copied am ART implementation some time ago, but never included it (in the index package).

You can simply git clone the repository and use the gradle wrapper. You have to use Java 20, ideally the Graal JVM

aryanVanaik commented 1 year ago

Hmm, I have just cloned it seems like the RBTree interface is not hard coded within the IndexController as there are separate classes for it

JohannesLichtenberger commented 1 year ago

Sorry, it's in the Index types CASIndex, PathIndex, NameIndex. The RBTreeReader and RBTreeWriter . Maybe we should then configure in ResourceConfiguration, which underlying index data structure to use. That way it would be fix over all (future) revisions.