pkoutoupis / rapiddisk

An Advanced Linux RAM Drive and Caching kernel modules. Dynamically allocate RAM as block devices. Use them as stand alone drives or even map them as caching nodes to slower local disk drives. Access those volumes locally or export them across an NVMe Target network. Manage it all from a web API.
http://www.rapiddisk.org
GNU General Public License v2.0

Avoid write #181

Closed tsukasagenesis closed 1 year ago

tsukasagenesis commented 1 year ago

For short-lived write data (data that may be deleted soon after and doesn't need to be written to the FS), is there a mode where we can keep as much data as possible in RAM and only push the slow files to the underlying FS?

I'm not sure if rapiddisk can do it.

pkoutoupis commented 1 year ago

@tsukasagenesis Hello. And thank you for your interest in rapiddisk. Currently, such a mode or feature does not exist. Once upon a time and nearly a decade ago, I did experiment with a similar implementation for temporary virtual machines. But I have since deleted that branch and decided not to pursue it. The biggest concern for such an implementation is for the instance when a volume mapping runs out of memory in the cache (RAM) volume. This could potentially lead to data corruption of the underlying device.

pkoutoupis commented 1 year ago

I have given this more thought since yesterday's response and am convinced that there is no clean path forward with such a solution. When the cache volume fills up, problems will occur and even if I send a simple EIO or ENOMEM message to the calling process, it will further complicate things.

It would be nice but only if the cache volume can dynamically grow (with a background monitoring process) while mapped as a cache and unfortunately, RapidDisk-Cache does not support that. Such a function could also result in data corruption.

tsukasagenesis commented 1 year ago

Isn't it possible to slow down or block the buffer until it's flushed to the underlying FS? In my case I have 10x2TB SSDs and flushing is fast; I'm just trying to avoid writing short-lived data to them to preserve disk durability (with 400GB of RAM).

pkoutoupis commented 1 year ago

Sure, it may be possible to slow down the flush to the underlying device and do it less frequently (with a modification in the code), but I thought the original question was to avoid flushing altogether and keep the updated data only in the cache volume (not changing anything underneath). Did I misunderstand?

tsukasagenesis commented 1 year ago

The idea is that fast files stay in RAM, and slow files are copied to the underlying FS (as soon as RAM is about to be overloaded). Slow files would occupy space in RAM for a long time and would have to be written anyway, but many quickly written files could avoid being written at all by being deleted soon after.
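
To make the idea concrete, here is a toy Python model of such a policy. It is only a sketch: the class name, the byte sizes, and the 600-second age limit are all hypothetical, and it models whole files rather than blocks. A real block-level cache would only learn about a deletion if the filesystem issued a discard for the freed blocks.

```python
from collections import OrderedDict


class DelayedWritebackCache:
    """Toy model: hold writes in RAM, flush only old entries or when full."""

    def __init__(self, capacity_bytes, max_age_s):
        self.capacity_bytes = capacity_bytes
        self.max_age_s = max_age_s
        self.dirty = OrderedDict()  # name -> (size, time written), oldest first
        self.used = 0
        self.flushed = []           # names that reached the backing store

    def _flush_oldest(self):
        # Write the oldest dirty entry to the backing store and free its RAM.
        name, (size, _) = self.dirty.popitem(last=False)
        self.used -= size
        self.flushed.append(name)

    def delete(self, name):
        # A file deleted while still dirty never touches the backing store.
        entry = self.dirty.pop(name, None)
        if entry is not None:
            self.used -= entry[0]

    def write(self, name, size, now):
        self.delete(name)  # an overwrite replaces the cached copy
        if size > self.capacity_bytes:
            self.flushed.append(name)  # too large to cache: write through
            return
        # Under memory pressure, push the oldest data out instead of failing.
        while self.used + size > self.capacity_bytes:
            self._flush_oldest()
        self.dirty[name] = (size, now)
        self.used += size

    def tick(self, now):
        # Flush every entry that has stayed dirty for max_age_s or longer.
        while self.dirty:
            name, (size, written) = next(iter(self.dirty.items()))
            if now - written < self.max_age_s:
                break
            self._flush_oldest()


cache = DelayedWritebackCache(capacity_bytes=100, max_age_s=600)
cache.write("short.tmp", 40, now=0)
cache.delete("short.tmp")    # deleted before aging out: never flushed
cache.write("long.bin", 40, now=10)
cache.tick(now=700)          # long.bin is 690 s old, so it is flushed
print(cache.flushed)         # ['long.bin']
```

The model flushes the oldest data when the cache fills rather than returning an error, which is the behavior being asked for here.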

pkoutoupis commented 1 year ago

@tsukasagenesis I want to be clear: are you speaking of a read cache or a writeback cache? If the latter, the RapidDisk suite leverages the dm-writecache module to do the heavy lifting on all writeback caching. With that in mind, dm-writecache does something which I truly appreciate: it looks at the incoming workload. If it identifies sequential or large transfers, it flushes them to the backing store. If it instead observes small and random writes, it caches them in the volume designated as the cache. The driver seems to be customizable with regard to when it flushes data to disk, but I have yet to implement that feature in the userspace utilities. This feature request is tracked by https://github.com/pkoutoupis/rapiddisk/issues/62 and maybe it is about time I revisit it.
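
For reference, the flush tuning mentioned above is exposed by dm-writecache as optional arguments in its device-mapper table line (for example `high_watermark`, `low_watermark`, and `max_age`, as described in the kernel documentation). The Python sketch below only builds such a table string. The helper name, device paths, sector count, and option values are hypothetical, and it is an assumption that the `s` (SSD) cache type is the right one for a RAM drive; this is not what the rapiddisk utilities currently generate.

```python
def writecache_table(origin_dev, cache_dev, origin_sectors, block_size=4096, **options):
    """Build a device-mapper table line for the dm-writecache target.

    Layout: <start> <length> writecache <type> <origin> <cache> <block size>
            <number of optional words> [optional words...]
    """
    args = []
    for key, value in options.items():
        if value is True:
            args.append(key)                 # flag without a value, e.g. fua
        else:
            args.extend([key, str(value)])   # option with a value counts as two words
    fields = ["0", str(origin_sectors), "writecache", "s",
              origin_dev, cache_dev, str(block_size), str(len(args))]
    return " ".join(fields + args)


# Hypothetical devices and sizes: start writeback at 90% full, stop at 80%,
# and write back any block that has been cached for 10 minutes (max_age is in ms).
table = writecache_table("/dev/sdb", "/dev/rd0", 1953125,
                         high_watermark=90, low_watermark=80, max_age=600000)
print(table)
# 0 1953125 writecache s /dev/sdb /dev/rd0 4096 6 high_watermark 90 low_watermark 80 max_age 600000
```

A string like this could then be passed to `dmsetup create <name> --table "<table>"`, though the exact mapping would need to be checked against the running kernel's dm-writecache version.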

tsukasagenesis commented 1 year ago

It's a writeback cache I'm looking for! I will look into dm-writecache, but I wasn't sure whether it can do what I want!

So it's indeed interesting, but since my underlying storage is SSD I don't mind random access too much; still, it's a really nice feature. What I'm looking for is to delay fast transfers and write back slow ones. The purpose is a server that downloads a file, sends it somewhere else, and then deletes it; with a lot of RAM, that avoids a write to the underlying SSD FS if it all happens fast (within 10 minutes, for example).

I will look more deeply into dm-writecache to see how it can fit my purpose. Thanks a lot for your help.