risinglightdb / risinglight

An educational OLAP database system.
Apache License 2.0

storage: abstract disk I/O #660

Open skyzh opened 2 years ago

skyzh commented 2 years ago

Currently, calls like create_dir are scattered everywhere. We'd better have a single interface for operating on files on disk, so that we can support in-memory / disk / object store backends.

wangrunji0408 commented 2 years ago

Isn't it only the disk backend that depends on file operations?

skyzh commented 2 years ago

Yes. But even the disk backend has two modes -- pure in-memory mode (for testing, where files are stored in a hash map), and real on-disk mode.

skyzh commented 2 years ago

If we can abstract all disk operations behind a trait like ObjectStore, we can prevent unwanted writes to disk when the secondary storage runs in in-memory mode.
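
To make the idea concrete, here is a minimal sketch of what such an interface could look like. The names `FileStore`, `DiskStore`, and `MemStore` are hypothetical and not taken from the RisingLight codebase; the sketch only assumes that paths are relative to a storage root and that the in-memory mode keeps files in a hash map, as described above.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Hypothetical interface: the single place through which storage touches files.
pub trait FileStore {
    fn create_dir(&self, path: &Path) -> io::Result<()>;
    fn write(&self, path: &Path, data: &[u8]) -> io::Result<()>;
    fn read(&self, path: &Path) -> io::Result<Vec<u8>>;
}

/// Real on-disk mode: every call is forwarded to `std::fs`, rooted at `root`.
pub struct DiskStore {
    root: PathBuf,
}

impl DiskStore {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }
}

impl FileStore for DiskStore {
    fn create_dir(&self, path: &Path) -> io::Result<()> {
        fs::create_dir_all(self.root.join(path))
    }
    fn write(&self, path: &Path, data: &[u8]) -> io::Result<()> {
        fs::write(self.root.join(path), data)
    }
    fn read(&self, path: &Path) -> io::Result<Vec<u8>> {
        fs::read(self.root.join(path))
    }
}

/// Pure in-memory mode: files live in a hash map, so nothing reaches the disk.
#[derive(Default)]
pub struct MemStore {
    files: Mutex<HashMap<PathBuf, Vec<u8>>>,
}

impl FileStore for MemStore {
    fn create_dir(&self, _path: &Path) -> io::Result<()> {
        // Directories are implicit in the in-memory map.
        Ok(())
    }
    fn write(&self, path: &Path, data: &[u8]) -> io::Result<()> {
        self.files
            .lock()
            .unwrap()
            .insert(path.to_path_buf(), data.to_vec());
        Ok(())
    }
    fn read(&self, path: &Path) -> io::Result<Vec<u8>> {
        self.files
            .lock()
            .unwrap()
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such file"))
    }
}
```

With something like this, storage code would take a `&dyn FileStore` (or a generic parameter) instead of calling `std::fs` directly, so a test that uses `MemStore` cannot write to disk by accident.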

wangrunji0408 commented 2 years ago

I think we should use the real on-disk mode in testing, to make sure we are using the system fs API correctly. The data path can be redirected to ramfs/tmpfs to speed things up.

wangrunji0408 commented 2 years ago

Later we can also introduce simulation testing, where all fs APIs are mocked by an in-memory simulator.

skyzh commented 2 years ago

I switched to pure in-memory mode because tmpfs is slow 🤣 fsync and manifest writes add 1s of latency to every disk test case, which looks weird.

Maybe I can try switching off fsync for all writes (even the manifest), and maybe things will work.
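
As a rough illustration of switching fsync off, the write path could take an option that decides whether to sync. `WriteOptions`, `enable_fsync`, and `write_file` are hypothetical names for this sketch, not existing RisingLight settings; only `File::create`, `write_all`, and `sync_all` are real standard library calls.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical option controlling durability of writes.
#[derive(Clone, Copy)]
pub struct WriteOptions {
    /// When false, data is written but never fsync-ed (tests only).
    pub enable_fsync: bool,
}

/// Writes `data` to `path`, syncing to stable storage only if requested.
pub fn write_file(path: &Path, data: &[u8], opts: WriteOptions) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(data)?;
    if opts.enable_fsync {
        // `sync_all` flushes both file contents and metadata to disk.
        file.sync_all()?;
    }
    Ok(())
}
```

Disk test cases would pass `enable_fsync: false` for both data and manifest writes, while production keeps it on.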

wangrunji0408 commented 2 years ago

🤣 Okay, let's stay in in-memory mode for efficiency.