slackhq / astra

Astra is a structured log search and analytics engine developed by Slack and Salesforce
https://slackhq.github.io/astra/
MIT License

Introduce dual layer (disk / heap) S3 LRU cache for cache nodes #1123

Closed by bryanlb 4 weeks ago

bryanlb commented 4 weeks ago

Summary

This PR introduces a configurable, dual-layer LRU cache for the S3IndexInput implementation. It also addresses a previously known issue where slicing the index input caused unnecessary memory consumption.

Data is now downloaded from S3 in 100MB (configurable) chunks to disk, up to a 200GB (configurable) total. These on-disk chunks are evicted in LRU order and feed a much smaller 1GB (configurable) on-heap LRU cache, which reads 2MB (configurable) chunks from disk. All caches are implemented with Caffeine, the successor to Guava's LoadingCache.
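The layering described above can be sketched with two LRU maps. This is a simplified stdlib sketch, not the PR's actual code: the real implementation uses Caffeine, and the class, field, and method names here (`DualLayerCacheSketch`, `readHeapChunk`, `fetchFromS3`) are hypothetical; the chunk sizes mirror the defaults quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of the dual-layer chunk cache. The real implementation
// uses Caffeine caches; this stdlib version only illustrates the layering.
public class DualLayerCacheSketch {
    static final long DISK_CHUNK_BYTES = 100L * 1024 * 1024; // 100MB disk chunks
    static final long HEAP_CHUNK_BYTES = 2L * 1024 * 1024;   // 2MB heap chunks

    // Access-ordered LinkedHashMap that evicts the eldest entry past maxEntries.
    static <K, V> Map<K, V> lru(int maxEntries) {
        return new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Entry counts approximate the quoted budgets: ~200GB / 100MB and ~1GB / 2MB.
    final Map<Long, byte[]> diskLayer = lru(2048);
    final Map<Long, byte[]> heapLayer = lru(512);

    // Read one heap-sized chunk: a heap miss falls through to the disk layer,
    // and a disk miss falls through to S3 (stubbed below).
    byte[] readHeapChunk(long offset) {
        return heapLayer.computeIfAbsent(offset, off -> {
            long diskOffset = (off / DISK_CHUNK_BYTES) * DISK_CHUNK_BYTES;
            byte[] diskChunk = diskLayer.computeIfAbsent(diskOffset, this::fetchFromS3);
            int start = (int) (off - diskOffset);
            int len = (int) Math.min(HEAP_CHUNK_BYTES, diskChunk.length - start);
            byte[] out = new byte[len];
            System.arraycopy(diskChunk, start, out, 0, len);
            return out;
        });
    }

    // Stub: the real code downloads a 100MB byte range from S3 onto disk.
    byte[] fetchFromS3(long diskOffset) {
        return new byte[(int) DISK_CHUNK_BYTES];
    }
}
```

In this sketch each layer is keyed by chunk-aligned offset, so repeated reads within the same 2MB window hit the heap layer and reads within the same 100MB window avoid another S3 download.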

If long-term performance is satisfactory, we should consider moving the cache configs to the global config, as well as using the data directory config instead of the temp directory.
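If the cache settings do move into the global config, one possible shape is sketched below. All key names are hypothetical (the PR does not define this schema); only the values come from the defaults quoted above.

```yaml
# Hypothetical global-config layout for the cache knobs; actual key names
# may differ from what the PR ships.
cacheConfig:
  diskCache:
    chunkSizeBytes: 104857600    # 100MB download chunks
    maxSizeBytes: 214748364800   # 200GB on-disk LRU budget
  heapCache:
    chunkSizeBytes: 2097152      # 2MB read chunks
    maxSizeBytes: 1073741824     # 1GB on-heap LRU budget
```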

This was performance tested with the above cache configs, and the following pod & jvm settings:

instance-types:
  values:
    - m6id.32xlarge
    - m5d.24xlarge
request:
  cpu: 5
  memory: 20Gi
limit:
  memory: 20Gi
-XX:+UseZGC -XX:+ZGenerational -XX:ActiveProcessorCount=20 -Djava.util.concurrent.ForkJoinPool.common.parallelism=20 -Xms15g -Xmx15g -Dastra.concurrent.query=30

More benchmarking should be performed to validate these configs on a variety of cluster configurations, as they are currently optimized for small clusters deployed on a heavily oversubscribed Kubernetes cluster.