wfraser / backfs

BackFS is a FUSE filesystem designed to provide a large local disk cache for a remote network filesystem.
http://www.codewise.org/backfs
GNU General Public License v2.0

Request: Cache whole file #10

Open christianreiss opened 7 years ago

christianreiss commented 7 years ago

Hey!

Great job! :) In the times of CloudDrives this will be very handy -- and already is. Would it be possible to add a switch to make backfs cache entire files (regardless of size) upon first read?

Some people will have this use case, if they don't already. Cheers, -Chris.

wfraser commented 7 years ago

Thanks!

Unfortunately, adding such functionality would be hard to do correctly. Currently, each read blocks the caller while it fetches the block. Because the block size is typically quite small, this delay isn't very noticeable. But if we block the first read on fetching the entire file, which might be very large, then the stall would be very noticeable.

Of course, the proper fix would be to fetch the data asynchronously and complete the first read once the first block has been fetched, but this is complex to implement.

I'm curious what your use case is. When I've needed to ensure entire files are loaded into the cache, I just do something like `cat file > /dev/null` beforehand. Would that not be sufficient for your use case?
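For example, something like this would pre-warm every file under a directory (the path here is made up):

```sh
# Read every file once so BackFS fetches and caches all of its blocks.
# /mnt/backfs/recordings is a hypothetical BackFS mount path.
find /mnt/backfs/recordings -type f -exec cat {} + > /dev/null
```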

christianreiss commented 7 years ago

Hey,

thanks for replying -- I'm just about to head out, so I must be brief. I am using Google Drive and Amazon CloudDrive, which I mount into my home file server. I then use backfs to accelerate access to those mounts, which I then share. So there is the use case of "remote" ISO files that need to be pulled over. This I could work around by simply copying the file to my local PC.

I also have a set-top box that records TV locally and pushes all recordings into the cloud during the night. The STB has the cloud mounted via said server using CIFS. This setup would greatly benefit if a read-ahead / cache-whole-file feature were implemented. The read-ahead would need to take effect upon the first access of a file. A read-ahead would be preferable, as caching the whole file would really be bad: you would open/stream a video file, backfs would start caching the whole (10 GB) file, after 2-3 minutes you'd notice you have already seen this show, quit it, open the next one, and another whole-file cache would start...
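A rough client-side approximation of such a bounded read-ahead (just a sketch; the path and size are made up) would be to prefetch only the start of the file in the background:

```sh
# Prefetch the first 256 MiB in the background so playback starts from
# warm cache, without pulling the entire 10 GB recording.
dd if=/mnt/backfs/recordings/show.ts of=/dev/null bs=1M count=256 &
```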

The current state of backfs would not help at all with this setup.

Thanks for reading and your consideration! -Chris.

wfraser commented 7 years ago

As a quick workaround, you could also experiment with increasing the block size. The default is 128 KiB (0x20000 bytes); you could bump this up to several megabytes and effectively get very large readahead, though it may suffer the periodic stalling I mentioned, depending on how userspace does its reads.

You can specify `-o block_size=$((10*1024*1024))` when mounting backfs to get a 10 MiB block size, for example.

(note that you will have to delete your cache every time you change the block size)
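Putting that together, a mount invocation might look like this (a sketch only: the cache path, source, and mount point are placeholders, and the cache option should be checked against backfs's usage text):

```sh
# Clear the old cache first: blocks cached at a different block size
# are invalid after the change.
rm -rf /var/cache/backfs/*
backfs -o cache=/var/cache/backfs,block_size=$((10*1024*1024)) \
    /mnt/clouddrive /mnt/backfs
```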

christianreiss commented 7 years ago

Hey,

I'll try that when I get home -- would this affect first access as well, or 'only' subsequent accesses to the files?

Cheers, -Chris

wfraser commented 7 years ago

It'll mean the first access of each block will take much longer.

Say you set the block size to 10M. The kernel still issues individual read calls with pretty small buffers -- usually on the order of a few kilobytes. Imagine a program reading a file sequentially from start to end. It'll go like this:

  1. First read of 4K at offset 0: this causes the first 10M of the file to be fetched, and the call blocks for a while until this is done.
  2. Subsequent reads of 4K from offsets 4K to (10M - 4K): these are cache hits and complete basically immediately.
  3. Next read of 4K at offset 10M: this causes the next 10M to be fetched, and blocks for a while. ...and so on.

Of course, this is just the first time you read the file. Afterwards, it'll be in the cache, all these calls will be cache hits, and the file will read very quickly.
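You can see this pattern for yourself with dd and time (the file name is hypothetical, and kernel readahead/page caching may blur the exact numbers):

```sh
# Cold: the first 4K read triggers a full 10 MiB block fetch and stalls.
time dd if=/mnt/backfs/big.iso of=/dev/null bs=4k count=1
# Warm: a read within the now-cached block returns almost immediately.
time dd if=/mnt/backfs/big.iso of=/dev/null bs=4k count=1 skip=1
```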