skyzh opened this issue 2 years ago
For problem (1), would `Weak` be helpful?
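To make the question concrete, here is a minimal sketch (my illustration, assuming problem (1) is about the cache holding strong `Arc`s to blocks): the cache stores only `Weak<Block>` entries, so eviction never invalidates an in-flight reader, and a block's memory is freed exactly when the last query drops its `Arc`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

type BlockID = u64;

struct Block(Vec<u8>);

// The cache holds only weak references, so it never keeps a block
// alive by itself; queries hold the strong `Arc`s.
struct WeakBlockCache {
    blocks: HashMap<BlockID, Weak<Block>>,
}

impl WeakBlockCache {
    // Returns the block if some query still holds it; `upgrade` fails
    // once the last strong reference is gone.
    fn get(&self, id: BlockID) -> Option<Arc<Block>> {
        self.blocks.get(&id).and_then(Weak::upgrade)
    }

    fn insert(&mut self, id: BlockID, block: &Arc<Block>) {
        self.blocks.insert(id, Arc::downgrade(block));
    }
}
```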
I think the cache layer is similar to the concept of a buffer pool from the traditional database perspective. The techniques used to build a buffer pool can be applied here, too: acquire, pin, and replace.

* Acquire: a thread must ask the buffer pool for the block, and the buffer pool takes care of how to read it from disk.
* Pin: the buffer pool always increases the pin count on a block before returning it to the thread, and the thread must unpin it after the read finishes (see the sketch below).
* Replace: the buffer pool can use a clock or LRU strategy to find a proper block to evict when memory is insufficient. Any block with a pin count greater than 0 cannot be evicted. In read-only mode eviction is very simple; otherwise, dirty blocks must be persisted to disk before eviction.

The CMU database group has an excellent online course about this topic on YouTube.
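As a minimal sketch of the pin/unpin discipline described above (hypothetical `Frame` type, not code from this thread): a RAII guard bumps the pin count when a block is handed out and releases it on drop, so the replacer can simply skip any frame with a nonzero pin count.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct Frame {
    data: Vec<u8>,
    pin_count: AtomicU32,
    dirty: bool,
}

// RAII guard: pinning happens when the pool hands the frame out,
// unpinning happens on drop, so a reader can never forget to unpin.
struct PinGuard<'a> {
    frame: &'a Frame,
}

impl<'a> PinGuard<'a> {
    fn new(frame: &'a Frame) -> Self {
        frame.pin_count.fetch_add(1, Ordering::SeqCst);
        PinGuard { frame }
    }

    fn data(&self) -> &[u8] {
        &self.frame.data
    }
}

impl Drop for PinGuard<'_> {
    fn drop(&mut self) {
        // The frame becomes evictable again once its pin count returns to 0.
        self.frame.pin_count.fetch_sub(1, Ordering::SeqCst);
    }
}
```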
Exactly! The block cache in LSM systems is the same as a buffer pool. But it's hard to implement a buffer pool in Rust in the same way as in C++ (like bustub). The two challenges are:
* How to make an async buffer pool? `pin` and `unpin` might wait for a buffer slot to become available, and it seems nontrivial to implement a high-performance one in Rust.
* How to represent the ownership of one slot in the buffer pool? If we take a `&[u8]` from the buffer pool, everything using the pool needs a lifetime in Rust (see the sketch below).
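To illustrate the second challenge (my example, not from the thread): if the pool hands out borrowed slices, every consumer is infected by the pool's lifetime, which is why the design below reaches for `Arc` instead.

```rust
use std::sync::Arc;

type BlockID = u64;

struct Page {
    data: Vec<u8>,
}

struct BufferPool;

impl BufferPool {
    // Borrowing API: the returned slice is tied to `&self`, so every
    // struct that stores it must grow a lifetime parameter, and the
    // slot cannot be evicted while any borrow is alive.
    fn fetch_borrowed(&self, _id: BlockID) -> &[u8] {
        unimplemented!()
    }

    // Owning API: a reference-counted handle decouples callers from the
    // pool's lifetime and can be held across `.await` points.
    fn fetch_owned(&self, _id: BlockID) -> Arc<Page> {
        unimplemented!()
    }
}
```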
To represent a block/page, we can use `Arc<RwLock<PageImpl>>`:

```rust
struct Page(Arc<RwLock<PageImpl>>);

struct PageImpl {
    data: [u8; PAGE_SIZE as usize],
    pin_count: u32,
    dirty: bool,
}
```
To design an async buffer pool, my thought is to maintain an in-flight map from `BlockID` to `Vec<Sender<Page>>`. A caller that requests a page which is already being fetched gets back a `Receiver<Page>` and can `recv().await` it:

```rust
enum PageResp {
    Ready(Page),
    Wait(Receiver<Page>),
}
```
```rust
impl BufferPool {
    fn fetch_page(&self, page_id: PageID) -> PageResp {
        // First check if the page is in memory; if yes, just return it.
        if let Some(page) = self.page_in_mem(page_id) {
            return PageResp::Ready(page);
        }
        // Create a one-time channel for the async operation.
        // The caller can use `rx.recv().await` to wait for the IO to finish.
        let (mut tx, rx) = onetime_channel();
        loop {
            match self.merge_sender_to_inflight(page_id, tx) {
                Ok(()) => {
                    // The sender was added to the in-flight list, and the
                    // background task will send the result, so just return rx.
                    return PageResp::Wait(rx);
                }
                Err(tx_fail_merge) => {
                    // No in-flight request for this page_id yet.
                    match self.insert_to_inflight(page_id, vec![tx_fail_merge]) {
                        Ok(()) => {
                            // Start a background task that awaits the disk
                            // manager IO and sends the result to all callers.
                            runtime::spawn(async {
                                let page = self.disk_manager.fetch_page(page_id).await;
                                let senders = self.remove_inflight(page_id);
                                for sender in senders {
                                    sender.send(page.clone()).await;
                                }
                            })
                            .detach();
                            return PageResp::Wait(rx);
                        }
                        Err(tx_fail_insert) => {
                            // Cannot insert because another thread inserted the
                            // same block concurrently; just retry.
                            tx = tx_fail_insert;
                        }
                    }
                }
            }
        }
    }
}
```
I think `PageResp` is actually annoying and could be removed. We can make the `fetch_page` method async, await inside it, and change the return type to `Result<Page, Error>`, which is more convenient to use:

```rust
async fn fetch_page(&self, page_id: PageID) -> Result<Page, Error>;
```
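For illustration, a sketch of that change, assuming the synchronous logic above is kept behind a hypothetical `fetch_page_inner` and that `recv()` yields `None` when the background task drops the sender (`Error::ChannelClosed` is likewise a made-up variant):

```rust
impl BufferPool {
    async fn fetch_page(&self, page_id: PageID) -> Result<Page, Error> {
        match self.fetch_page_inner(page_id) {
            // Fast path: the page is already in memory.
            PageResp::Ready(page) => Ok(page),
            // Slow path: await the in-flight IO here instead of handing
            // the receiver back to the caller.
            PageResp::Wait(mut rx) => rx.recv().await.ok_or(Error::ChannelClosed),
        }
    }
}
```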
- (2) Doesn't support fast-path concurrent read. When there are multiple queries accessing the same block, and the block is not present in memory, the `Column` struct will fetch the same block multiple times.
... is already solved with https://github.com/risinglightdb/risinglight/pull/556
So the current problem is to control the memory usage of our system... Maybe we need to implement a real BufferPool now 🤣
Currently, we are using `moka` as our block cache. But using such a library poses several problems:

(1) … `Arc` to a block.
(2) Doesn't support fast-path concurrent read. When there are multiple queries accessing the same block, and the block is not present in memory, the `Column` struct will fetch the same block multiple times.

Therefore, we should build our own block cache layer to avoid these problems.
Recently, I came across https://boats.gitlab.io/blog/post/waitmap/, which seems to be able to solve (2). With some modifications to the block struct, we might also be able to solve (1).
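The core trick in waitmap is letting later readers await a key that an earlier reader is still filling in. As a rough illustration of that request-coalescing idea, here is a sketch using tokio's `OnceCell` rather than waitmap's actual API (`BlockCache`, `Block`, and the `fetch` parameter are made up for the example):

```rust
use std::collections::HashMap;
use std::future::Future;
use std::sync::{Arc, Mutex};
use tokio::sync::OnceCell;

type BlockID = u64;

#[derive(Clone)]
struct Block(Arc<[u8]>);

#[derive(Default)]
struct BlockCache {
    // One cell per block: concurrent readers of the same missing block
    // all await the same in-flight fetch instead of each hitting disk.
    cells: Mutex<HashMap<BlockID, Arc<OnceCell<Block>>>>,
}

impl BlockCache {
    async fn get(&self, id: BlockID, fetch: impl Future<Output = Block>) -> Block {
        let cell = self
            .cells
            .lock()
            .unwrap()
            .entry(id)
            .or_insert_with(|| Arc::new(OnceCell::new()))
            .clone();
        // Only the first caller actually runs `fetch`; everyone else
        // awaits the same cell and receives a clone of the cached block.
        cell.get_or_init(|| fetch).await.clone()
    }
}
```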