sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License
1.12k stars 251 forks

Store records in off-heap memory #528

Open JohannesLichtenberger opened 2 years ago

JohannesLichtenberger commented 2 years ago

To keep GC activity to a minimum (even with low-latency garbage collectors such as Shenandoah), we could try storing the records/nodes off-heap, for instance using the Foreign Memory API, and compare performance.
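A minimal sketch of what storing a serialized record off-heap could look like, assuming Java 22+ where `java.lang.foreign` is a final API. The class `OffHeapRecordDemo` and its methods are hypothetical names for illustration and are not part of SirixDB.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (assumption); not part of SirixDB.
final class OffHeapRecordDemo {

  // Copies an already serialized record into native memory owned by the arena.
  static MemorySegment storeOffHeap(Arena arena, byte[] serializedRecord) {
    MemorySegment segment = arena.allocate(serializedRecord.length);
    MemorySegment.copy(serializedRecord, 0, segment, ValueLayout.JAVA_BYTE, 0,
        serializedRecord.length);
    return segment;
  }

  // Copies the record back onto the heap, e.g. for deserialization.
  static byte[] loadOnHeap(MemorySegment segment) {
    return segment.toArray(ValueLayout.JAVA_BYTE);
  }

  public static void main(String[] args) {
    try (Arena arena = Arena.ofConfined()) {
      byte[] record = "node-1".getBytes(StandardCharsets.UTF_8);
      MemorySegment offHeap = storeOffHeap(arena, record);
      System.out.println(new String(loadOnHeap(offHeap), StandardCharsets.UTF_8));
    } // the native memory is released here, without involving the GC
  }
}
```

The record bytes live outside the Java heap and are freed deterministically when the arena is closed. Whether this is faster overall would still have to be measured, since copying back to the heap creates a new array each time.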

JohannesLichtenberger commented 1 year ago

Probably we should simply store the trx intent log in a Chronicle map instead of a simple HashMap.

JohannesLichtenberger commented 1 year ago

https://minborgsjavapot.blogspot.com/2019/07/java-chroniclemap-part-1-go-off-heap.html

Shouldn't be hard to implement.

abhinax4991 commented 1 year ago

Hey @JohannesLichtenberger, Guten Tag. I would like to work on this; can you please assign it to me?

JohannesLichtenberger commented 1 year ago

@abhin-dynamify I wonder whether serializing and deserializing pages (page fragments) will be an issue, and whether this ends up faster or even slower. The same goes for the Caffeine caches (the lightweight buffer manager). That said, when the maximum sizes of the Caffeine caches are too large, I've had severe GC-related performance issues (especially with ZGC, possibly because it is not generational yet).

JohannesLichtenberger commented 1 year ago

We can try and compare performance in a separate branch :+1:

JohannesLichtenberger commented 1 year ago

@abhin-dynamify hope it makes sense. Do you work on this?

abhinax4991 commented 1 year ago

> @abhin-dynamify hope it makes sense. Do you work on this?

Yeah, I am working on it.

JohannesLichtenberger commented 1 year ago

@abhin-dynamify did you have time?

Sung-Heon commented 2 months ago

@JohannesLichtenberger Can I try this?

JohannesLichtenberger commented 1 month ago

In the KeyValueLeafPage, we store the slots as an array of byte arrays (byte[][]). Maybe we could instead use a single MemorySegment, which may have to grow.

JohannesLichtenberger commented 1 month ago

@Sung-Heon still interested?

Sung-Heon commented 1 month ago

Yes~!

JohannesLichtenberger commented 1 month ago

I think our allocation rate is currently way too high (2.7 GB/s with a single read-only trx). Can you try to replace the slots byte[][] array in KeyValueLeafPage with a single MemorySegment?

JohannesLichtenberger commented 1 month ago

We probably need an indirection array at the start of the page, though, with offsets/lengths. Furthermore, the pages are variable-sized, so the MemorySegment might have to be replaced with a bigger one, copying all the data. Also, it gets a bit tricky when, for instance, variable-sized data such as Strings is reassigned and is bigger than before.
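The layout described here could be sketched roughly as follows, assuming Java 22+ and a fixed number of slots per page. `SlottedPageSketch` and all of its members are hypothetical and do not reflect the actual KeyValueLeafPage; the header holds one (offset, length) pair per slot, data is appended behind it, and the segment is replaced by a bigger one when it runs out of space.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch (assumption); not SirixDB's KeyValueLeafPage.
final class SlottedPageSketch implements AutoCloseable {
  // One indirection entry per slot: int offset followed by int length.
  private static final int ENTRY_BYTES = 2 * Integer.BYTES;

  private final int slotCount;
  private Arena arena;
  private MemorySegment segment;
  private int used; // first free byte behind the header and the stored data

  SlottedPageSketch(int slotCount, int initialDataBytes) {
    this.slotCount = slotCount;
    this.used = slotCount * ENTRY_BYTES; // data area starts after the header
    this.arena = Arena.ofConfined();
    // Arena allocations are zero-initialized, so offset 0 marks an empty slot.
    this.segment = arena.allocate((long) used + initialDataBytes, Integer.BYTES);
  }

  void setSlot(int slot, byte[] data) {
    long entry = entryOffset(slot);
    int oldOffset = segment.get(ValueLayout.JAVA_INT, entry);
    int oldLength = segment.get(ValueLayout.JAVA_INT, entry + Integer.BYTES);
    int offset;
    if (oldOffset != 0 && data.length <= oldLength) {
      offset = oldOffset; // new value fits: overwrite in place
    } else {
      // Bigger than before: append at the end. The old bytes become garbage,
      // which is the tricky part; a real page would need compaction.
      ensureCapacity(data.length);
      offset = used;
      used += data.length;
    }
    MemorySegment.copy(data, 0, segment, ValueLayout.JAVA_BYTE, offset, data.length);
    segment.set(ValueLayout.JAVA_INT, entry, offset);
    segment.set(ValueLayout.JAVA_INT, entry + Integer.BYTES, data.length);
  }

  // Returns a heap copy of the slot data, or null if the slot is empty.
  byte[] getSlot(int slot) {
    long entry = entryOffset(slot);
    int offset = segment.get(ValueLayout.JAVA_INT, entry);
    int length = segment.get(ValueLayout.JAVA_INT, entry + Integer.BYTES);
    if (offset == 0) {
      return null;
    }
    return segment.asSlice(offset, length).toArray(ValueLayout.JAVA_BYTE);
  }

  long capacityBytes() {
    return segment.byteSize();
  }

  private long entryOffset(int slot) {
    if (slot < 0 || slot >= slotCount) {
      throw new IndexOutOfBoundsException("slot " + slot);
    }
    return (long) slot * ENTRY_BYTES;
  }

  // Replaces the segment with a bigger one and copies header plus data.
  private void ensureCapacity(int extraBytes) {
    long needed = (long) used + extraBytes;
    if (needed <= segment.byteSize()) {
      return;
    }
    long newSize = Math.max(needed, segment.byteSize() * 2);
    Arena newArena = Arena.ofConfined();
    MemorySegment bigger = newArena.allocate(newSize, Integer.BYTES);
    MemorySegment.copy(segment, 0, bigger, 0, used);
    arena.close(); // frees the old native memory immediately
    arena = newArena;
    segment = bigger;
  }

  @Override
  public void close() {
    arena.close();
  }
}
```

Because the indirection entries store offsets relative to the start of the segment, they stay valid after the copy into a bigger segment. The sketch ignores thread safety (a confined arena is bound to one thread) and pages larger than `Integer.MAX_VALUE` bytes.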

JohannesLichtenberger commented 1 month ago

You can probably join the Discord channel.

JohannesLichtenberger commented 1 month ago

@Sung-Heon do you have experience with this or with database architecture in general? It's of utmost importance to change this to reduce allocation pressure; otherwise I'd assign it to myself.

The records should perhaps be backed by MemorySegments, too. Once a new record/node is created, it should be serialized to its backing MemorySegment, which in turn should be set as the slot data within the large page MemorySegment. We can probably read from the MemorySegment directly, too, at least for most things.
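Reading directly from the MemorySegment could look something like this sketch, assuming Java 22+. The record layout (node key, parent key, kind) and the class `NodeRecordView` are made up for illustration and are not SirixDB's actual node format; the view is a slice of the page segment, so fields are read and written in place without deserializing into a heap object graph.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical flyweight view over one slot of a page segment (assumption).
final class NodeRecordView {
  // Made-up fixed layout: long nodeKey | long parentKey | int kind
  static final long NODE_KEY = 0;
  static final long PARENT_KEY = 8;
  static final long KIND = 16;
  static final long BYTES = 20;

  private final MemorySegment slot;

  private NodeRecordView(MemorySegment slot) {
    this.slot = slot;
  }

  // Serializes a new record straight into the page at the given offset.
  static NodeRecordView create(MemorySegment page, long offset,
                               long nodeKey, long parentKey, int kind) {
    NodeRecordView view = at(page, offset);
    // Unaligned layouts, since slot offsets inside a page are arbitrary.
    view.slot.set(ValueLayout.JAVA_LONG_UNALIGNED, NODE_KEY, nodeKey);
    view.slot.set(ValueLayout.JAVA_LONG_UNALIGNED, PARENT_KEY, parentKey);
    view.slot.set(ValueLayout.JAVA_INT_UNALIGNED, KIND, kind);
    return view;
  }

  // Wraps an existing record; asSlice creates a view and copies no data.
  static NodeRecordView at(MemorySegment page, long offset) {
    return new NodeRecordView(page.asSlice(offset, BYTES));
  }

  long nodeKey() {
    return slot.get(ValueLayout.JAVA_LONG_UNALIGNED, NODE_KEY);
  }

  long parentKey() {
    return slot.get(ValueLayout.JAVA_LONG_UNALIGNED, PARENT_KEY);
  }

  int kind() {
    return slot.get(ValueLayout.JAVA_INT_UNALIGNED, KIND);
  }

  // Writes through to the page segment.
  void setParentKey(long parentKey) {
    slot.set(ValueLayout.JAVA_LONG_UNALIGNED, PARENT_KEY, parentKey);
  }

  public static void main(String[] args) {
    try (Arena arena = Arena.ofConfined()) {
      MemorySegment page = arena.allocate(64);
      NodeRecordView node = create(page, 3, 42L, 7L, 1);
      node.setParentKey(8L);
      System.out.println(node.nodeKey() + " -> parent " + node.parentKey());
    }
  }
}
```

One caveat with this approach: a view holds a slice of a specific page segment, so it would become stale if the page is later copied into a bigger segment and the old one is freed. Variable-sized fields such as Strings would also need their own length handling, which this fixed layout leaves out.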