moka-rs / moka

A high performance concurrent caching library for Rust
Apache License 2.0

Provide a way to restore a `Cache` from entries with metadata and a `FrequencySketch` snapshot #314

Open tatsuya6502 opened 1 year ago

tatsuya6502 commented 1 year ago

CC: @peter-scholtens

When #312 and #313 are implemented, we will have enough data to restore a cache to a previous state. Provide a way to restore a cache from that data.

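Inputs:
  • Entries with metadata (#312)
  • A FrequencySketch snapshot and its BuildHasher (#313)

Output:
  • A Cache restored to the previous state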

Example

use std::time::{Duration, Instant};
use time::OffsetDateTime;
use moka::sync::Cache;

// Saved entry in my app.
struct MyEntry<K, V> {
    key: K,
    value: V,
    last_modified: OffsetDateTime,
    last_accessed: OffsetDateTime,
}

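// Define a closure to convert time::OffsetDateTime to
// std::time::Instant.
let (now_instant, now_dt) = (Instant::now(), OffsetDateTime::now_utc());
let dt_to_instant = |datetime: OffsetDateTime| -> Instant {
    // `whole_seconds` returns an i64, but `Duration::from_secs` takes a u64,
    // so clamp timestamps in the future to zero before converting.
    let elapsed_secs = (now_dt - datetime).whole_seconds().max(0) as u64;
    now_instant - Duration::from_secs(elapsed_secs)
};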

// Recreate a FrequencySketch snapshot and the BuildHasher.
// See https://github.com/moka-rs/moka/issues/313
let frequency_sketch = ...;
let build_hasher = ...;

// Create a CacheLoader (the API proposed in this issue).
let cache_loader = Cache::builder()
    .max_capacity(MAX_CAPACITY)
    .time_to_live(TTL)
    .time_to_idle(TTI)
    // This calls the validate method of frequency_sketch and, if the
    // validation passes, returns `Ok(CacheLoader)`.
    .loader_with_frequency_sketch(frequency_sketch, build_hasher)
    .unwrap();

// Get the saved entries (Vec<MyEntry<K, V>>) from somewhere
// (e.g. a filesystem or a database).
let entries = get_saved_entries();

// Load the saved entries to the Cache.
for my_entry in entries {
    cache_loader.insert(
        my_entry.key,
        my_entry.value,
        None, // policy_weight
        Some(dt_to_instant(my_entry.last_modified)),
        dt_to_instant(my_entry.last_accessed),
        None, // expiration_time
    );
}

// Get the Cache.
let cache = cache_loader.finish();
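The `dt_to_instant` closure above depends on the `time` crate and truncates to whole seconds. Assuming instead that the saved timestamps are `std::time::SystemTime` values, a std-only conversion could look like the sketch below. `system_time_to_instant` is a hypothetical helper, not a moka API. It keeps sub-second precision, clamps future timestamps (e.g. from clock skew) to now, and returns `None` when the platform's `Instant` cannot represent a point that far in the past.

```rust
use std::time::{Duration, Instant, SystemTime};

/// Hypothetical helper (not a moka API): map a saved wall-clock timestamp onto
/// this process's monotonic clock, relative to one shared reference point.
///
/// Returns `None` if the resulting `Instant` cannot be represented on this
/// platform, e.g. when `saved` is older than the monotonic clock's origin.
fn system_time_to_instant(
    saved: SystemTime,
    now_instant: Instant,
    now_system: SystemTime,
) -> Option<Instant> {
    match now_system.duration_since(saved) {
        // `saved` is in the past: step back by the same span on the monotonic
        // clock, keeping sub-second precision.
        Ok(elapsed) => now_instant.checked_sub(elapsed),
        // `saved` is in the future (e.g. clock skew): clamp it to "now".
        Err(_) => Some(now_instant),
    }
}

fn main() {
    // Capture both clocks once so that every entry uses the same reference point.
    let (now_instant, now_system) = (Instant::now(), SystemTime::now());

    let five_secs_ago = now_system - Duration::from_secs(5);
    let restored = system_time_to_instant(five_secs_ago, now_instant, now_system)
        .expect("representable on this platform");
    assert_eq!(now_instant - restored, Duration::from_secs(5));

    let in_the_future = now_system + Duration::from_secs(60);
    assert_eq!(
        system_time_to_instant(in_the_future, now_instant, now_system),
        Some(now_instant)
    );
}
```

Reading both clocks once, as the example does with `now_instant` and `now_dt`, keeps the relative order of all restored entries intact.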

How it will work

  1. CacheBuilder will have the following methods that return a CacheLoader:
    • methods:
      • loader(self)
      • loader_with_hasher(self, BuildHasher)
      • loader_with_frequency_sketch(self, FrequencySketch, BuildHasher)
        • This will validate the FrequencySketch against the BuildHasher.
    • The CacheLoader owns a Cache, but does not expose it yet.
  2. When insert is called on the CacheLoader, it will do the following:
    • Insert the given entry into the internal concurrent hash table of the Cache.
    • Create an EntryInfo from the given metadata.
    • Add an (Arc<K>, last_accessed) pair to a Vec.
    • If TTL is provided, add an (Arc<K>, last_modified) pair to another Vec.
    • If an expiration time is provided, add the entry to the hierarchical timer wheels.
  3. When finish is called, it will do the following:
    • Sort the Vecs by last_accessed and last_modified respectively.
    • Create the access order queue from the Vec sorted by last_accessed.
    • If TTL is provided, create the write order queue from the Vec sorted by last_modified.
    • Now the cache state has been restored. Invoke run_pending_tasks (moka v0.12.x) several times to evict expired entries and, if the max capacity is exceeded, to evict idle entries.
      • If the eviction listener is set, it will be notified of the evictions.
    • Finally, return the Cache.
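The bookkeeping in steps 2 and 3 can be modeled with standard-library types. This is a toy sketch, not moka's implementation: `ToyLoader` and `ToyCache` are hypothetical, a `HashMap` stands in for the concurrent hash table, `VecDeque`s stand in for the access and write order queues, and each key is assumed to be inserted only once. Expiration times and the timer wheels are left out.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the restored cache (not moka's internal types).
struct ToyCache<K, V> {
    map: HashMap<Arc<K>, V>,
    /// Least recently accessed key at the front.
    access_order: VecDeque<Arc<K>>,
    /// Oldest write at the front; `None` when no TTL is configured.
    write_order: Option<VecDeque<Arc<K>>>,
}

/// Hypothetical stand-in for the proposed `CacheLoader`.
struct ToyLoader<K, V> {
    map: HashMap<Arc<K>, V>,
    by_last_accessed: Vec<(Arc<K>, Instant)>,
    by_last_modified: Option<Vec<(Arc<K>, Instant)>>,
}

/// Sort (key, timestamp) pairs oldest first and keep only the keys.
fn into_queue<T>(mut pairs: Vec<(Arc<T>, Instant)>) -> VecDeque<Arc<T>> {
    // `sort_by_key` is stable: equal timestamps keep their insertion order.
    pairs.sort_by_key(|(_, t)| *t);
    pairs.into_iter().map(|(k, _)| k).collect()
}

impl<K: Hash + Eq, V> ToyLoader<K, V> {
    fn new(has_ttl: bool) -> Self {
        Self {
            map: HashMap::new(),
            by_last_accessed: Vec::new(),
            by_last_modified: if has_ttl { Some(Vec::new()) } else { None },
        }
    }

    /// Step 2: store the entry and record its timestamps (assumes unique keys).
    fn insert(&mut self, key: K, value: V, last_modified: Instant, last_accessed: Instant) {
        let key = Arc::new(key);
        self.map.insert(Arc::clone(&key), value);
        self.by_last_accessed.push((Arc::clone(&key), last_accessed));
        if let Some(writes) = self.by_last_modified.as_mut() {
            writes.push((key, last_modified));
        }
    }

    /// Step 3: build the order queues from the sorted Vecs and hand over the cache.
    fn finish(self) -> ToyCache<K, V> {
        ToyCache {
            map: self.map,
            access_order: into_queue(self.by_last_accessed),
            write_order: self.by_last_modified.map(into_queue),
        }
    }
}

fn keys(queue: &VecDeque<Arc<&'static str>>) -> Vec<&'static str> {
    queue.iter().map(|k| **k).collect()
}

fn main() {
    let base = Instant::now();
    let at = |secs: u64| base + Duration::from_secs(secs);

    let mut loader = ToyLoader::new(true);
    // Arguments: key, value, last_modified, last_accessed.
    loader.insert("a", 1, at(1), at(30));
    loader.insert("b", 2, at(3), at(10));
    loader.insert("c", 3, at(2), at(20));

    let cache = loader.finish();
    assert_eq!(cache.map.len(), 3);
    assert_eq!(keys(&cache.access_order), ["b", "c", "a"]);
    assert_eq!(keys(cache.write_order.as_ref().unwrap()), ["a", "c", "b"]);
}
```

Using a stable sort keeps entries with identical timestamps in insertion order, which gives a deterministic queue order when the saved timestamps only have coarse (e.g. whole-second) precision.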
tatsuya6502 commented 11 months ago
cache_loader.insert(
    my_entry.key,
    my_entry.value,
    None, // policy_weight
    Some(dt_to_instant(my_entry.last_modified)),
    dt_to_instant(my_entry.last_accessed),
    None, // expiration_time
);

Rather than having the CacheLoader::insert method take every metadata value (such as policy_weight) directly, make it take an EntryMetadata. Also, provide an easy way to build an EntryMetadata, e.g. with a builder, by parsing JSON, or by deserializing it from binary (Vec<u8>).

use moka::EntryMetadata;

// Builder
let metadata = EntryMetadata::builder()
    // This takes std::time::Instant. Maybe provide an alternative method that takes
    // `time::OffsetDateTime` for convenience?
    .last_modified(dt_to_instant(my_entry.last_modified))
    .last_accessed(dt_to_instant(my_entry.last_accessed))
    .build();

// Parse JSON. (Will require the `serde`, `serde_json` and `time` crates.)
let metadata = EntryMetadata::from_json(serde_json::json!({
    "lastModified": "2023-12-23T09:55:06.000+08:00",
    ...
}));
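A builder along the lines proposed above could be sketched as follows. `EntryMetadata`, `EntryMetadataBuilder`, and `MissingLastAccessed` are hypothetical types whose fields mirror the `CacheLoader::insert` arguments in the earlier example (`policy_weight`, `last_modified`, `last_accessed`, `expiration_time`); `u32` for the weight is an assumption. Unlike the `build()` in the proposal, this version returns a `Result`, because `last_accessed` was the one non-optional argument in that example.

```rust
use std::time::{Duration, Instant};

/// Hypothetical metadata record; fields follow the `insert` arguments
/// proposed in this issue, not an existing moka type.
#[derive(Debug, Clone, PartialEq)]
pub struct EntryMetadata {
    pub policy_weight: Option<u32>,
    pub last_modified: Option<Instant>,
    pub last_accessed: Instant,
    pub expiration_time: Option<Instant>,
}

#[derive(Debug, Default)]
pub struct EntryMetadataBuilder {
    policy_weight: Option<u32>,
    last_modified: Option<Instant>,
    last_accessed: Option<Instant>,
    expiration_time: Option<Instant>,
}

/// Error returned when `last_accessed` was never set.
#[derive(Debug, PartialEq)]
pub struct MissingLastAccessed;

impl EntryMetadata {
    pub fn builder() -> EntryMetadataBuilder {
        EntryMetadataBuilder::default()
    }
}

impl EntryMetadataBuilder {
    pub fn policy_weight(mut self, weight: u32) -> Self { self.policy_weight = Some(weight); self }
    pub fn last_modified(mut self, t: Instant) -> Self { self.last_modified = Some(t); self }
    pub fn last_accessed(mut self, t: Instant) -> Self { self.last_accessed = Some(t); self }
    pub fn expiration_time(mut self, t: Instant) -> Self { self.expiration_time = Some(t); self }

    /// `last_accessed` is required (the access order queue is built from it);
    /// every other field stays optional.
    pub fn build(self) -> Result<EntryMetadata, MissingLastAccessed> {
        Ok(EntryMetadata {
            policy_weight: self.policy_weight,
            last_modified: self.last_modified,
            last_accessed: self.last_accessed.ok_or(MissingLastAccessed)?,
            expiration_time: self.expiration_time,
        })
    }
}

fn main() {
    let now = Instant::now();
    let metadata = EntryMetadata::builder()
        .last_modified(now)
        .last_accessed(now + Duration::from_secs(1))
        .build()
        .unwrap();
    assert_eq!(metadata.policy_weight, None);
    assert_eq!(metadata.last_modified, Some(now));

    // Forgetting `last_accessed` is reported instead of silently defaulting.
    assert_eq!(EntryMetadata::builder().last_modified(now).build(), Err(MissingLastAccessed));
}
```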
tatsuya6502 commented 11 months ago
  1. When finish is called, it will do the following:
    • ...
    • Now the cache state has been restored. Invoke run_pending_tasks (moka v0.12.x) several times to evict expired entries and, if the max capacity is exceeded, to evict idle entries.
      • If the eviction listener is set, it will be notified of the evictions.
    • Finally, return the Cache.

I will remove the step that invokes run_pending_tasks. It does not seem right to call the eviction listener before the Cache has been handed over to the user code.
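The effect of dropping that step can be shown with a toy model (hypothetical types, std only, not moka's API): the restored cache may start out above its max capacity, and the eviction listener fires only when maintenance runs after the user code holds the cache. In moka v0.12.x that maintenance would be `run_pending_tasks` or housekeeping triggered by later cache operations; the toy models only an explicit call.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical restored cache: keys only, least recently accessed at the front.
struct ToyCache<K> {
    max_capacity: usize,
    access_order: VecDeque<K>,
    eviction_listener: Box<dyn FnMut(&K)>,
}

impl<K> ToyCache<K> {
    /// What `finish` hands back under the revised plan: the restored state as-is,
    /// possibly above `max_capacity`, with no evictions and no listener calls.
    fn restored(
        max_capacity: usize,
        access_order: VecDeque<K>,
        eviction_listener: Box<dyn FnMut(&K)>,
    ) -> Self {
        Self { max_capacity, access_order, eviction_listener }
    }

    /// Evictions (and listener notifications) happen only here, after the
    /// user code already owns the cache.
    fn run_pending_tasks(&mut self) {
        while self.access_order.len() > self.max_capacity {
            if let Some(victim) = self.access_order.pop_front() {
                (self.eviction_listener)(&victim);
            }
        }
    }
}

fn main() {
    let evicted = Rc::new(RefCell::new(Vec::new()));
    let log = Rc::clone(&evicted);
    let mut cache = ToyCache::restored(
        2,
        VecDeque::from(vec!['a', 'b', 'c']),
        Box::new(move |k: &char| log.borrow_mut().push(*k)),
    );

    // Just restored: over capacity, yet the listener has not been called.
    assert!(evicted.borrow().is_empty());
    assert_eq!(cache.access_order.len(), 3);

    // The user code decides when maintenance runs.
    cache.run_pending_tasks();
    assert_eq!(*evicted.borrow(), vec!['a']);
    assert_eq!(cache.access_order.len(), 2);
}
```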