Open JohannesLichtenberger opened 4 years ago
Hello, could you provide more information as I would like to work on this like perhaps where this needs to be implemented and is a conversion from a red-black tree to Van Emde Boas tree
@aryanVanaik the idea is to use a more CPU cache-line friendly and thus faster index structure for reads. I think regarding writes a red black tree is ok, but rotations will of course add nodes, which have to be copied for a new version.
I think in general we have two possibilities
Also maybe it doesn't even make much difference for reads due to the current Java memory model. What do you think?
Interesting, at first glance reading from random locations is pretty bad ideally we would want to exploit locality for performance improvements but I would have to have a deeper look into an adaptive radix tree in terms of reading, and writing. Storing in persistent memory will be fine as long as everything is sequentially stored. But like you mentioned this depends on how the JMM works
We optimize for flash drives (fast random reads and sequential writes), that is SirixDB only ever appends data to a file (two files actually) for each resource stored in a database. SirixDB not only copies data pages of an index trie, but instead stores fine granular modifications (but also has to read page fragments from different locations in parallel to reconstruct a page in-memory). The implementation is a mix of copy on write and a sliding window applied to the inserted data. So, the basis is a persistent tree of tries (in the functional sense), as in Datomic I guess, but it predates Datomic :-D
That's the first cache hierarchy (disk to main memory), but we could currently read the secondary indexes (based on red black trees) fully in memory (which are stored as fine granular nodes in the leaf data pages of keyed tries). Red black trees are however not very cache friendly in most implementations due to pointer chasing. Now we could read the stored data (the red black trees) into a better suited data structure as maybe an adaptive radix tree. For a better understanding of the simple tries (more or less an array organized in a tree):
Ahh, I see this DB is making use of the properties of Flash memory if I am thinking of this correctly, I was thinking of starting to create a new class for the Adaptive radix tree and then replacing that with the current red-black structure we have. Let me know if I should start in another manner. Is there any documentation on how I can get the source code onto my system? Just a heads up as well this is a university assignment (contributing to an open source project) and I found this project very interesting :)
First, we should extract interfaces. I think the rb tree currently is hard-coded in IndexController
IIRC. I've copied am ART implementation some time ago, but never included it (in the index package).
You can simply git clone the repository and use the gradle wrapper. You have to use Java 20, ideally the Graal JVM
Hmm, I have just cloned it seems like the RBTree interface is not hard coded within the IndexController
as there are separate classes for it
Sorry, it's in the Index types CASIndex
, PathIndex
, NameIndex
. The RBTreeReader
and RBTreeWriter
. Maybe we should then configure in ResourceConfiguration
, which underlying index data structure to use. That way it would be fix over all (future) revisions.
We can implement a Van Emde Boas layout, which is based on implicit pointers for immutable trees (read-only transactions).