opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Searchable Snapshot] Revisit using other open source LRUCache implementation for File System caching #6225

Open aabukhalil opened 1 year ago

aabukhalil commented 1 year ago

Coming from #4964 and based on concerns raised from https://github.com/opensearch-project/OpenSearch/pull/5641#pullrequestreview-1285817231

More context is in https://github.com/opensearch-project/OpenSearch/issues/4964#issuecomment-1421689586, https://github.com/opensearch-project/OpenSearch/issues/4964#issuecomment-1421722470 and https://github.com/opensearch-project/OpenSearch/pull/5641#discussion_r1099518731

So far we have evaluated using the Guava/Caffeine caches (https://github.com/opensearch-project/OpenSearch/issues/4964#issuecomment-1421722470), but they don't support custom eviction logic (preventing eviction of entries that are in active use) and they are not open for extension. That's why https://github.com/opensearch-project/OpenSearch/pull/5641 contains a modified version of the Guava/Caffeine cache. A new suggestion on how to use the Caffeine cache came up here, so we need to re-evaluate using Guava/Caffeine.

Apache JCS was suggested in https://github.com/opensearch-project/OpenSearch/pull/5641 and it looks promising because it is more open for extension rather than modification, but I still have not been able to find a straightforward path or a perfectly fitting implementation for our cache usage.

aabukhalil commented 1 year ago

When evaluating which cache implementation to use, or whether to stick with the in-house implementation, we need to think about near-future use cases (within one year). I can think of:

aabukhalil commented 1 year ago

This is exactly how we can utilize the Caffeine cache: https://github.com/opensearch-project/OpenSearch/pull/5641#discussion_r1102032394

andrross commented 1 year ago

I put together a quick commit on my fork that shows how we can replace the hand-rolled LinkedDeque with a LinkedHashMap.
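For readers unfamiliar with the trick being referenced, a minimal sketch of replacing a hand-rolled recency list with `LinkedHashMap` might look like the following (this is an illustrative example, not the code from the linked fork; the class name `LruChunkMap` is invented for the sketch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU map built on LinkedHashMap's access-order mode,
// which maintains the least- to most-recently-used ordering that a
// hand-rolled LinkedDeque would otherwise have to provide.
class LruChunkMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruChunkMap(int maxEntries) {
        // accessOrder = true: iteration order becomes least- to
        // most-recently accessed instead of insertion order.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once capacity is exceeded.
        // Note: the file cache's "never evict entries with active
        // references" rule cannot be expressed here alone; that check
        // would live in the cache layer wrapping this map.
        return size() > maxEntries;
    }
}
```

The appeal over a custom deque is that `LinkedHashMap` already keeps the hash table and the recency list in sync on every `get` and `put`, removing a whole class of bookkeeping bugs.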

reta commented 1 year ago

> I put together a quick commit on my fork that shows how we can replace the hand-rolled LinkedDeque with a LinkedHashMap.

Have you run the benchmarks (https://github.com/opensearch-project/OpenSearch/pull/6610) against the LinkedHashMap alternative? Just curious what the diff would be.

andrross commented 1 year ago

> Have you run the benchmarks (#6610) against the LinkedHashMap alternative? Just curious what the diff would be.

@reta I put a PR together here with the benchmark results: #6968

andrross commented 1 year ago

Just want to provide some updates and more context here:

The overall functionality provided by this cache feature is that remote index data is pulled down from the object store in chunks (8MiB by default) and written to local disk, and searches are served using the chunks from the local disk. The total size of the disk available for use as the "cache" is fixed and defined by a node setting. When that total size is exhausted, unused chunks are deleted from the disk to make room for new chunks.
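The eviction rule described above can be sketched in a few lines. This is a simplified illustration of the described behavior only, not the actual OpenSearch implementation; the class and field names are invented for the example:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch: chunks are written to disk until the fixed budget is
// exhausted, then unused chunks are deleted to make room. The 8 MiB chunk
// size matches the default mentioned above; everything else is assumed.
class ChunkBudget {
    static final long CHUNK_BYTES = 8L * 1024 * 1024; // 8 MiB default chunk

    final long capacityBytes; // fixed disk budget from the node setting
    long usedBytes;
    // Chunks with no active readers, oldest first; only these may be deleted.
    final Deque<String> unusedChunks = new ArrayDeque<>();

    ChunkBudget(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Returns true if the new chunk fits after deleting unused chunks
    // as needed; false if every resident chunk is still in active use.
    boolean admit(String chunkId) {
        while (usedBytes + CHUNK_BYTES > capacityBytes && !unusedChunks.isEmpty()) {
            unusedChunks.removeFirst(); // delete an unused chunk from disk
            usedBytes -= CHUNK_BYTES;
        }
        if (usedBytes + CHUNK_BYTES > capacityBytes) {
            return false;
        }
        usedBytes += CHUNK_BYTES;
        return true;
    }
}
```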

This is implemented (after #6968) by using two Maps:

All operations on the cache are guarded by a simple ReentrantLock that prevents concurrent access. To provide additional concurrency, a segmented approach is taken: Runtime.getRuntime().availableProcessors() independent caches are created, and each is given disk cache size / Runtime.getRuntime().availableProcessors() as its capacity.

In my opinion, the segmented approach adds complexity and is still vulnerable to hot keys. I think the main improvement we can make here is to get rid of the segmented cache and go to a single implementation with finer-grained locking. Best-effort semantics on the disk usage and eviction are likely okay, as over- or under-using the cache by a small percentage is probably acceptable. The current implementation is best-effort anyway as the size is enforced at each segment, and not the overall sum of the segments.

reta commented 1 year ago

> In my opinion, the segmented approach adds complexity and is still vulnerable to hot keys. I think the main improvement we can make here is to get rid of the segmented cache and go to a single implementation with finer-grained locking.

The complexity is certainly there, and hot keys could significantly reduce its effectiveness; +1 to looking for a simpler and more efficient way to (re)implement this.