juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0

Use any faster object store as cache tier #1363

Open davies opened 2 years ago

davies commented 2 years ago

When the underlying object store is not fast enough and the local disk is not big enough, we can set up a faster object store between them as a cache tier (read cache and write cache).

Some candidates: a Redis- or memcached-compatible KV store.

suzaku commented 2 years ago

@davies Is anyone working on this issue now?

suzaku commented 2 years ago

It seems to me the simplest way to add such a cache is to create a new ObjectStorage with one field for the cache and another for the actual storage. When options like cache-bucket are set on the command line along with the bucket options, we can create this new ObjectStorage. But I'm not sure when the cache should be written. Since this issue says the cache should be a faster object store, how can we store the result in the cache after reading from the actual storage with Get(key string, off, limit int64)? Or maybe we just need a KV store API, like any other cache out there.

davies commented 2 years ago

@suzaku There is a PR as a proof of concept: https://github.com/juicedata/juicefs/pull/1364/files

suzaku commented 2 years ago

Thanks.

suzaku commented 2 years ago

The POC PR answers my first question: only Get(key, 0, -1) calls are cached. Will cache utilization be better if you add this remote cache at the CacheManager layer instead of the ObjectStorage layer?
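
To make the behaviour concrete, here is a minimal sketch of a wrapper at the ObjectStorage layer that caches only whole-object reads, i.e. Get(key, 0, -1), and sends ranged reads straight to the backend. It is not the code from the POC PR and it does not use the real JuiceFS interface: Store, memStore, tieredStore, putString and getString are hypothetical names made up for this example, and the in-memory store merely stands in for both the fast tier (for example Redis) and the slow tier (for example S3).

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Store is a simplified, hypothetical subset of an object storage interface.
type Store interface {
	Get(key string, off, limit int64) (io.ReadCloser, error)
	Put(key string, in io.Reader) error
}

// memStore is an in-memory Store standing in for both tiers (not goroutine-safe).
type memStore struct {
	data map[string][]byte
	gets int // number of Get calls, to observe which tier served a read
}

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

func (m *memStore) Get(key string, off, limit int64) (io.ReadCloser, error) {
	m.gets++
	b, ok := m.data[key]
	if !ok {
		return nil, errors.New("object not found")
	}
	if off < 0 || off > int64(len(b)) {
		return nil, errors.New("offset out of range")
	}
	b = b[off:]
	if limit >= 0 && limit < int64(len(b)) {
		b = b[:limit]
	}
	return io.NopCloser(bytes.NewReader(b)), nil
}

func (m *memStore) Put(key string, in io.Reader) error {
	b, err := io.ReadAll(in)
	if err != nil {
		return err
	}
	m.data[key] = b
	return nil
}

// tieredStore puts a faster cache Store in front of a slower backend Store.
type tieredStore struct {
	cache, backend Store
}

func (t *tieredStore) Get(key string, off, limit int64) (io.ReadCloser, error) {
	if off != 0 || limit != -1 {
		return t.backend.Get(key, off, limit) // ranged read: bypass the cache
	}
	if r, err := t.cache.Get(key, 0, -1); err == nil {
		return r, nil // cache hit
	}
	r, err := t.backend.Get(key, 0, -1)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	b, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	_ = t.cache.Put(key, bytes.NewReader(b)) // best-effort cache fill
	return io.NopCloser(bytes.NewReader(b)), nil
}

// Put goes straight to the backend (assumes keys are never overwritten; no write cache here).
func (t *tieredStore) Put(key string, in io.Reader) error { return t.backend.Put(key, in) }

func putString(s Store, key, val string) error { return s.Put(key, strings.NewReader(val)) }

func getString(s Store, key string, off, limit int64) (string, error) {
	r, err := s.Get(key, off, limit)
	if err != nil {
		return "", err
	}
	defer r.Close()
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	backend, cache := newMemStore(), newMemStore()
	ts := &tieredStore{cache: cache, backend: backend}
	_ = putString(ts, "chunk", "hello world") // stored in the backend only
	part, _ := getString(ts, "chunk", 6, 5)   // ranged: backend, not cached
	full, _ := getString(ts, "chunk", 0, -1)  // full: backend, then cached
	again, _ := getString(ts, "chunk", 0, -1) // full: served by the cache
	fmt.Println(part, full, again, backend.gets)
}
```

In this sketch the cache is filled only on a whole-object read miss, so a plain key/value store is enough for the fast tier. Ranged reads never populate it, which is the source of my utilization question.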

davies commented 2 years ago

The CacheManager does not cache partial results from the object store either, so the two approaches are equivalent.

suzaku commented 2 years ago

Makes sense, thanks.