sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License
1.12k stars 252 forks

ObjectPool for pages and a SlicingArena #743

Open JohannesLichtenberger opened 2 weeks ago

JohannesLichtenberger commented 2 weeks ago

Page instances should be recycled when the BufferManager evicts pages, as we currently always allocate new instances. Instead, we should preallocate "empty" instances, clear them once they are evicted from the cache, and put them back into one or two ObjectPools (for IndirectPages and KeyValueLeafPages). In the off-heap branch we should also allocate a big memory chunk upfront and use a slicing allocator to slice it into smaller chunks for the KeyValueLeafPages.

XiangyuTan-learning commented 1 week ago

Hi @JohannesLichtenberger, I am brand new to making PRs for open source projects. Do you mind if I take on this issue? I saw that it has a "good first issue" label. Thanks!

JohannesLichtenberger commented 1 week ago

@XiangyuTan-learning do you have software engineering experience? I think it may be OK for developers who are new to the project, but you will probably need some experience...

XiangyuTan-learning commented 1 week ago

@JohannesLichtenberger, I am a second-year master's student specialising in Software Engineering. The reason for bothering you is that one of my course assignments requires me to make a PR to an open source project. I am only familiar with Java, though. Could you please give me some advice on whether this issue is a good choice for me, or whether I should pick another one? Thanks

JohannesLichtenberger commented 1 week ago

You can try... the main issue is that we must reuse KeyValueLeafPages, as we're allocating too much garbage. Thus, once a page is evicted from a RecordPageCache, it could potentially be reused for a new key-value leaf page that is read from disk, instead of always creating new objects (in PageKind, the KeyValueLeafPages are deserialized, and currently a new instance is created each time). We should thus use an object pool instead (for instance StormPot: in libraries.gradle add stormpot : 'com.github.chrisvest:stormpot:3.2'), and we have to include it in sirix-core.

Then, my main idea is to release the page to the ObjectPool once it's evicted from the RecordPageCache (it's not pinned anymore, and the TinyLFU algorithm decides to evict the page...). Thus, it can be returned to the pool, and once a new KeyValueLeafPage is needed, it should be fetched from the ObjectPool instead of creating a new instance. Furthermore, before releasing the page, the MemorySegment(s) should be cleared (maybe filled with zero bytes) and all fields should be reset as if it were a new instance.
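
A minimal sketch of the claim/reset/release cycle described above, assuming a hand-rolled pool as a stand-in for StormPot. `PooledPage` and `PagePool` are hypothetical names, not SirixDB classes, and a plain byte array stands in for the page's MemorySegment; hooking `release` into the cache's eviction path is left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical stand-in for a KeyValueLeafPage; NOT the actual SirixDB class. */
final class PooledPage {
    private final byte[] slots;   // stand-in for the page's MemorySegment
    private long pageKey = -1;    // -1 marks an unused ("empty") instance
    private int used;

    PooledPage(int capacity) {
        this.slots = new byte[capacity];
    }

    /** Fills the recycled instance with data, e.g. freshly read from disk. */
    void load(long pageKey, byte[] data) {
        if (data.length > slots.length) {
            throw new IllegalArgumentException("data does not fit into the page");
        }
        System.arraycopy(data, 0, slots, 0, data.length);
        this.pageKey = pageKey;
        this.used = data.length;
    }

    /** Zeroes the slot memory and resets all fields as if the instance were new. */
    void reset() {
        Arrays.fill(slots, (byte) 0);
        pageKey = -1;
        used = 0;
    }

    long pageKey() { return pageKey; }
    int used() { return used; }
    byte slotAt(int index) { return slots[index]; }
}

/** Hypothetical pool: preallocates empty pages and recycles released ones. */
final class PagePool {
    private final ArrayDeque<PooledPage> free = new ArrayDeque<>();
    private final int pageCapacity;

    PagePool(int preallocated, int pageCapacity) {
        this.pageCapacity = pageCapacity;
        for (int i = 0; i < preallocated; i++) {
            free.push(new PooledPage(pageCapacity));
        }
    }

    /** Returns a pooled instance, or allocates a new one if the pool is empty. */
    synchronized PooledPage claim() {
        PooledPage page = free.poll();
        return page != null ? page : new PooledPage(pageCapacity);
    }

    /** To be called once a page is unpinned and evicted from the cache. */
    synchronized void release(PooledPage page) {
        page.reset();
        free.push(page);
    }

    synchronized int available() {
        return free.size();
    }
}
```

In this sketch the deserialization code would call `pool.claim()` followed by `load(...)` instead of `new`, and the cache's eviction callback would call `pool.release(page)`. Falling back to a fresh allocation when the pool is empty is one possible policy; a real pool such as StormPot may block or time out instead.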

Another thing to note is that the KeyValueLeafPage in the branch that uses off-heap memory to store the page data in a slotted page has an Arena to create a MemorySegment for the slotted page (for the slots/the data). Instead of using one Arena per page and closing it, we should use a global arena, then create all KeyValueLeafPages in the ObjectPool upfront with a SlicingAllocator (which uses chunks from a big MemorySegment).
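
To illustrate the slicing idea, here is a rough sketch that allocates one big off-heap chunk upfront and hands out fixed-size, non-overlapping slices of it. It deliberately uses a direct `ByteBuffer` as a stand-in for a MemorySegment so that it also runs on older JDKs; `SlicingChunk` is a hypothetical name. On a JDK with the Foreign Function & Memory API, `SegmentAllocator.slicingAllocator(MemorySegment)` should offer comparable behaviour out of the box.

```java
import java.nio.ByteBuffer;

/** Hypothetical slicing allocator over one preallocated off-heap chunk. */
final class SlicingChunk {
    private final ByteBuffer chunk;  // the big chunk, allocated once
    private final int sliceSize;     // bytes per page slice
    private int nextOffset;          // start of the next unused slice

    SlicingChunk(int sliceSize, int sliceCount) {
        this.sliceSize = sliceSize;
        this.chunk = ByteBuffer.allocateDirect(Math.multiplyExact(sliceSize, sliceCount));
    }

    /** Returns the next slice; slices share the chunk's memory but never overlap. */
    synchronized ByteBuffer nextSlice() {
        if (nextOffset + sliceSize > chunk.capacity()) {
            throw new IllegalStateException("chunk exhausted");
        }
        ByteBuffer view = chunk.duplicate();  // independent position/limit, same memory
        view.position(nextOffset);
        view.limit(nextOffset + sliceSize);
        nextOffset += sliceSize;
        return view.slice();
    }

    synchronized int remainingSlices() {
        return (chunk.capacity() - nextOffset) / sliceSize;
    }
}
```

Each pooled page would receive one slice when the pool is filled upfront and keep it for its whole lifetime, so recycling a page only requires zeroing its slice rather than opening and closing an Arena.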

Hope this makes some sense to you...

JohannesLichtenberger commented 1 week ago

@XiangyuTan-learning do you think you can work on this? It's the most pressing issue right now.

It's the update-slots-to-memorysegments branch you should work on...

XiangyuTan-learning commented 1 week ago

@JohannesLichtenberger, thanks, I would like to work on it. But my assignment only gives me about two weeks, and I am not sure whether I can implement it in that time. Besides, as you said, it's the most pressing issue for the project. So I am wondering whether I could just start working on it without the issue being assigned to me; then, if someone capable of implementing it shows up, you can assign it to them directly. I don't want my inexperience to have a negative effect on the project. What do you think?

JohannesLichtenberger commented 6 days ago

I'll work on it myself, I guess ;)