Here is test code that should allow you to reproduce what I assume is a memory leak somewhere:
pub type AsyncResult<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;
use std::error::Error;
use std::fmt;
use actix_web::{App, HttpResponse, HttpServer, Scope, web};
use bincode::config::Configuration;
use moka::future::Cache;
use serde::de::DeserializeOwned;
use serde::Serialize;
use uuid::Uuid;
#[macro_use]
extern crate actix_web;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate serde;
lazy_static! {
    static ref MOKA_CACHE: Cache<Uuid, Vec<u8>> = Cache::builder()
        // Weigh each entry by its serialized value length only; the key and
        // the cache's own per-entry bookkeeping are not counted here.
        .weigher(|_key, value: &Vec<u8>| -> u32 {
            value.len().try_into().unwrap_or(u32::MAX)
        })
        // 256 MiB
        .max_capacity(256 * 1024 * 1024)
        .build();
}
#[derive(Debug)]
struct MyError(String);
impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
impl Error for MyError {}
pub fn api_config_v1(cfg: &mut web::ServiceConfig) {
    cfg.service(web::scope("/v1").service(api_config_v1_members()));
}

pub fn api_config_v1_members() -> Scope {
    web::scope("/test").service(web::scope("/moka").service(leak))
}
#[actix_web::main]
async fn main() -> AsyncResult<()> {
    let app = HttpServer::new(move || App::new().configure(api_config_v1))
        .bind("127.0.0.1:8080")?;
    app.run().await?;
    Ok(())
}
#[get("/leak")]
pub async fn leak() -> HttpResponse {
let cache_item = TestStruct {
name: "Some name".to_owned()
};
let id = Uuid::new_v4();
set_data::<TestStruct>(&id, &cache_item).await;
get_data::<TestStruct>(&id).await;
let response = TestResponse {
complete: true
};
HttpResponse::Ok()
.content_type("application/json")
.body(serde_json::to_string(&response).unwrap())
}
#[derive(Eq, PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct TestStruct {
    name: String,
}

#[derive(Serialize)]
pub struct TestResponse {
    complete: bool,
}
pub async fn set_data<T>(key: &Uuid, cache_data: &T) -> AsyncResult<bool>
where
    T: Send + Sync + Serialize,
{
    let config = Configuration::standard();
    let cache = MOKA_CACHE.clone();
    let encoded = bincode::serde::encode_to_vec(&cache_data, config)?;
    cache.insert(key.clone(), encoded).await;
    Ok(true)
}
pub async fn get_data<T>(key: &Uuid) -> AsyncResult<T>
where
    T: Send + Sync + DeserializeOwned,
{
    let config = Configuration::standard();
    let cache = MOKA_CACHE.clone();
    let cache_item = match cache.get(key) {
        None => return Err(Box::new(MyError("Cache item not found".into()))),
        Some(item) => item,
    };
    let (decoded, _len): (T, usize) =
        bincode::serde::decode_from_slice(&cache_item[..], config)?;
    Ok(decoded)
}
And the Cargo.toml:
[package]
name = "moka_memory_leak_example"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
actix-web = {version = "4.0.0-beta.20", features=["rustls"]}
[dependencies.moka]
version = "0.7.1"
features = ["future"]
[dependencies.lazy_static]
version = "1.4.0"
[dependencies.bincode]
features = ["serde"]
git = "https://github.com/bincode-org/bincode"
[dependencies.serde]
version = "1"
features = ["derive", "serde_derive"]
[dependencies.serde_json]
version = "1"
[dependencies.uuid]
version = "0.8"
features = ["serde", "v4", "v5"]
[dependencies.tokio]
version = "1.15.0"
features = ["full"]
And to generate load, you can use wrk:
wrk -d 100 -t 8 -c 100 http://localhost:8080/v1/test/moka/leak
On my machine this results in 2.53GB memory utilization.
Hi. Thank you for reporting the issue with the test program. I ran the program and generated load by running wrk for 30 minutes. I confirmed the issue and will investigate further.
Here is what I have done so far:

1. Ran wrk for 30 minutes and recorded the memory consumption of the program by running the top command every 15 seconds. Found that the RES value stopped growing after it reached 12.5GiB.

2. Ran wrk again for 30 minutes, but this time changed the cache_item from this:

   let cache_item = TestStruct {
       // name.len() is 9 bytes.
       name: "Some name".to_owned()
   };

   to this:

   let cache_item = TestStruct {
       // name.len() is now 90 bytes.
       name: String::from_utf8_lossy(&[b'a'; 90]).into_owned()
   };

   The RES value (maybe) stopped growing after it reached 5.7GiB.
Please note that the cache stores both the key and the value in the internal concurrent hash map, so it is expected that the cache will consume more memory than 256MiB in your program. (But not 12.5GiB.)
The weigher returns value.len(), which is 10 bytes, so a max_capacity of 256MiB means the cache will store up to ~27 million entries.
$ julia -q
# value.len() is 10 bytes.
julia> num_entries = 256 * 1024^2 / 10
2.68435456e7
julia> using Printf
julia> @printf("%.2f", num_entries)
26843545.60
However, each entry will be 50 bytes (or larger).
# key (UUID):
# 16 bytes
# value (Vec<u8>):
# data: 10 bytes (or more for pre-allocated extra capacity)
# ptr: 8 bytes
# cap: 8 bytes (usize)
# len: 8 bytes (usize)
julia> entry_size = 16 + 10 + 8 * 3
50
So ~27 million entries will consume 1.25GiB or more.
julia> entry_size * num_entries / 1024^3
1.25
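Building on this estimate, a weigher could also charge each entry for the key and the Vec header, so that max_capacity tracks real memory more closely. Here is a hedged sketch; the function name is made up, and Moka's own per-entry bookkeeping is still not counted:

use uuid::Uuid;

// Hypothetical weigher (not from this thread): charge each entry for the key,
// the Vec header (ptr + cap + len), and the allocated payload bytes, roughly
// matching the 16 + 24 + data estimate above. The cache's internal per-entry
// overhead is still not included.
fn entry_weight(_key: &Uuid, value: &Vec<u8>) -> u32 {
    let key_bytes = std::mem::size_of::<Uuid>();     // 16 bytes
    let vec_header = std::mem::size_of::<Vec<u8>>(); // 24 bytes on 64-bit
    let payload = value.capacity();                  // allocated data bytes
    (key_bytes + vec_header + payload)
        .try_into()
        .unwrap_or(u32::MAX)
}

Such a function could be passed to the builder in place of the closure shown earlier, e.g. .weigher(entry_weight).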
Hi Tatsuya,
Thank you for the thorough information, I appreciate it, and your work on Moka and in resolving issues like this.
The approximations you list in the latter half of your response are helpful for estimating the total size of an entry. I wasn't thinking about the key size, and didn't do any further analysis beyond the Vec<u8> len(). I'll use the information you provided in my planning moving forward.
Thanks again! I'll continue to follow the issue.
Hi @chriskyndrid,
I made some progress.
To investigate the issue, I modified the source code of Moka and moka-cht (the concurrent hash table) to collect and expose some statistics of the internal data structures. I also modified your code to print out these statistics every 15 seconds.
I published the entire project here:
(If you are curious, you can try running it by following the README of the gh72 repository.)
Here is a result of running the gh72 program.
To make the table smaller, I edited the source code and reduced the max capacity of the cache to 8MiB.
I ran the following:

1. wrk -d 5 -t 8 -c 10 http://localhost:8080/v1/test/moka/leak (5 seconds)
2. curl http://localhost:8080/v1/test/moka/clear, which invalidates all cached entries (a sketch of such a handler appears below).
3. wrk -d 30 -t 8 -c 10 http://localhost:8080/v1/test/moka/leak (30 seconds)
4. curl http://localhost:8080/v1/test/moka/clear again.

For the descriptions of the columns, please see the README of the gh72 repository.
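For reference, the /clear endpoint is not part of the repro code posted above. A minimal version might look like the following hedged sketch; the handler name and JSON body are assumptions, while invalidate_all() is the Moka method referred to here:

// Hedged sketch (not the actual gh72 code): a /clear handler that calls
// Cache::invalidate_all() to mark every cached entry as invalidated.
// It would also need to be registered next to `leak`, e.g.
// `web::scope("/moka").service(leak).service(clear)`.
#[get("/clear")]
pub async fn clear() -> HttpResponse {
    MOKA_CACHE.invalidate_all();
    HttpResponse::Ok()
        .content_type("application/json")
        .body(r#"{"complete":true}"#)
}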
I found the following: there are some areas to improve. I created a diagram of the internal structures to help us (including myself) understand them:

- The key K (1) and the value V (2) are the data that the user wants to store.
- The entries are stored in the internal concurrent hash map, which provides O(1) time complexity for lookups.
- The numbers with a plus sign (e.g. +8) indicate the memory overhead, the additional space needed for each entry on a 64-bit platform (in bytes).
Here is a non-exhaustive list of areas where we could improve:

- Unused weak_counts: the cache internally uses std::sync::Arc, and every ArcInner carries a weak_count field that takes 8 bytes, but we never use weak references. We could use triomphe::Arc from the triomphe crate as a drop-in replacement for the std Arc (a sketch of the layout difference follows this list).
- Enum tags: the enum used for EntryInfo adds tag bytes to each entry.
- The region fields.
- Modify the moka-cht hash map to drop the Bucket and the Key for a removed entry as soon as possible. Quoting the current behavior:

removing an entry from the hash table results in that bucket pointer having a tombstone bit set. Insertions cannot displace a tombstone bucket unless their key compares equal, so once an entry is inserted into the hash table, the specific index it is assigned to will only ever hold entries whose keys compare equal. Without this restriction, resizing operations could result in the old and new bucket arrays being temporarily inconsistent.

... Tombstone bucket pointers are typically not copied into new bucket arrays.
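To make the weak_count point concrete, here is a hedged, schematic sketch; the struct names are invented for illustration and are not Moka's actual types, but they show why replacing std::sync::Arc with triomphe::Arc saves 8 bytes per heap allocation on a 64-bit target when weak references are never used:

use std::mem::size_of;
use std::sync::atomic::AtomicUsize;

// Schematic stand-ins for the heap allocations behind the two Arc types.
#[allow(dead_code)]
struct StdArcInnerLike<T> {
    strong: AtomicUsize, // 8 bytes
    weak: AtomicUsize,   // 8 bytes, never used by the cache
    data: T,
}

#[allow(dead_code)]
struct TriompheArcInnerLike<T> {
    count: AtomicUsize, // 8 bytes, the only reference count
    data: T,
}

fn main() {
    // With a 16-byte payload there is no padding, so the difference is
    // exactly the 8-byte weak counter.
    println!(
        "std-like: {} bytes, triomphe-like: {} bytes",
        size_of::<StdArcInnerLike<[u8; 16]>>(),     // 32
        size_of::<TriompheArcInnerLike<[u8; 16]>>() // 24
    );
}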
By modifying the test program to track the statistics of the internal data structures, I found some areas to improve.

I will create a plan for improving these areas. Most of them look straightforward; the moka-cht change will need some thinking.
Tatsuya,
This is an excellent synopsis of your investigation. I'm glad the issue turned out to not be a memory leak. It's clear you have a thorough understanding of your design and the intention behind the design. It's also clear you've identified some concrete areas to reduce egregious memory utilization and I'll look forward to monitoring progress on those efforts. Again, thank you for your work on Moka and focus on improvement.
The primary motivation I have for using Moka is to offload common requests that require somewhat expensive network I/O but don't change particularly frequently. Because of the architecture of the system I'm designing, memory is somewhat constrained on various service worker nodes, so controlling the maximum cache memory utilization is important so as not to negatively impact the efficacy of the overall worker and its other tasks. The workers do not have multi-GB amounts of memory to allocate for the purpose of caching, so landing on an appropriate functional max_capacity is important to me. I'm really less concerned about entries, and more concerned about memory footprint.
At any rate, thanks again, and let me know when you have made some of these changes. I'll run a bunch of tests in our codebase.
Chris
Hi @chriskyndrid,
OK. I pushed some changes to the master branch for the following items:

- the unused weak_counts,
- the enum tags,
- dropping the Bucket and the Key for a removed entry as soon as possible.

I ran the gh72 program against two different versions of Moka and recorded the memory utilization.
Moka versions:

- v0.7.1
- the master branch (v0.7.2-devel)

Each run consisted of the following steps:

1. wrk -d 600 -t 8 -c 100 http://localhost:8080/v1/test/moka/leak (10 minutes)
2. invalidate_all, using the following command: curl http://localhost:8080/v1/test/moka/clear
3. invalidate_all again.

Results:

- v0.7.1 does not promptly drop the Buckets and Keys for removed entries, so memory utilization continued growing as they built up.
- v0.7.2-devel drops the Buckets and Keys for removed entries within half a minute, so memory utilization was kept to a minimum: 1 - (1.51 - 0.31) / (1.88 - 0.31) => 23.57% less memory than v0.7.1.
I hope you are happy with v0.7.2-devel's result.
You can try v0.7.2-devel by adding the following to your Cargo.toml:
[dependencies]
moka = { git = "https://github.com/moka-rs/moka", branch = "master", features = ["future"] }
I will start to prepare for the v0.7.2 release. It will take a few more days. (v0.7.2 milestone)
Released v0.7.2 with these improvements. Also updated the diagram of the internal data structures.
Thank you for your work on this!
Thank you for the excellent work on this interesting project. I've implemented Moka in a project I'm working on to utilize as a memory-based cache. I'm passing a struct to bincode, encoding it, and storing it in the cache, then running the reverse as needed. The system is a web service that utilizes Actix. In practice things are working fine.
However, when I ran a load-generating program (wrk) against a sample call, the memory utilization of the cache increases considerably and does not appear to respect the max_capacity() setting. It does not decrease after load generation completes, and subsequent wrk tests continue to increase the memory footprint, likely indefinitely. As an example:
1: At the start of the load generation, top indicates around 15M of memory utilization:
6488 erp_service_acco 0.0 00:01.19 29 0 50 15M
2: After executing a wrk request, top now indicates around 449M of memory utilization:
6488 erp_service_acco 0.0 01:19.42 45 0 66 449M
3: After executing a longer wrk request again, top now indicates around 933M of memory utilization:
6488 erp_service_acco 0.0 15:53.12 46 0 67 933M
Memory utilization continues to grow into the GB+ range. Based on the max_capacity setting above and the documentation in the readme, I would expect the footprint to be around 256MiB + Actix.
In order to remove Actix as a potential component here, I ran these same tests without any calls against my code that interacts with Moka. Actix does not grow beyond a 21MB footprint.
Data is being inserted into the cache with the set_data function and retrieved with the get_data function, both shown above.
I also repeated the tests without the interaction with bincode.

Is there something I'm misunderstanding about how to restrict the memory utilization of Moka via the max_capacity() (or another setting) configuration? Any thoughts on this would be greatly appreciated.