Here is test code that should allow you to reproduce what I assume is a memory leak somewhere:
pub type AsyncResult<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;
use std::error::Error;
use std::fmt;
use actix_web::{App, HttpResponse, HttpServer, Scope, web};
use bincode::config::Configuration;
use moka::future::Cache;
use serde::de::DeserializeOwned;
use serde::Serialize;
use uuid::Uuid;
#[macro_use]
extern crate actix_web;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate serde;
lazy_static! {
    static ref MOKA_CACHE: Cache<Uuid, Vec<u8>> = Cache::builder()
        // Weigh each entry by its serialized value length only; the key and
        // the cache's own per-entry bookkeeping are not counted here.
        .weigher(|_key, value: &Vec<u8>| -> u32 {
            value.len().try_into().unwrap_or(u32::MAX)
        })
        // 256 MiB
        .max_capacity(256 * 1024 * 1024)
        .build();
}
#[derive(Debug)]
struct MyError(String);
impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
impl Error for MyError {}
pub fn api_config_v1(cfg: &mut web::ServiceConfig) {
    cfg.service(web::scope("/v1").service(api_config_v1_members()));
}

pub fn api_config_v1_members() -> Scope {
    web::scope("/test").service(web::scope("/moka").service(leak))
}
#[actix_web::main]
async fn main() -> AsyncResult<()> {
    let app = HttpServer::new(move || App::new().configure(api_config_v1))
        .bind("127.0.0.1:8080")?;
    app.run().await?;
    Ok(())
}
#[get("/leak")]
pub async fn leak() -> HttpResponse {
let cache_item = TestStruct {
name: "Some name".to_owned()
};
let id = Uuid::new_v4();
set_data::<TestStruct>(&id, &cache_item).await;
get_data::<TestStruct>(&id).await;
let response = TestResponse {
complete: true
};
HttpResponse::Ok()
.content_type("application/json")
.body(serde_json::to_string(&response).unwrap())
}
#[derive(Eq, PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct TestStruct {
    name: String,
}

#[derive(Serialize)]
pub struct TestResponse {
    complete: bool,
}
pub async fn set_data<T>(key: &Uuid, cache_data: &T) -> AsyncResult<bool>
where
    T: Send + Sync + Serialize,
{
    let config = Configuration::standard();
    let cache = MOKA_CACHE.clone();
    let encoded = bincode::serde::encode_to_vec(&cache_data, config)?;
    cache.insert(key.clone(), encoded).await;
    Ok(true)
}
pub async fn get_data<T>(key: &Uuid) -> AsyncResult<T>
where
    T: Send + Sync + DeserializeOwned,
{
    let config = Configuration::standard();
    let cache = MOKA_CACHE.clone();
    let cache_item = match cache.get(key) {
        None => return Err(Box::new(MyError("Cache item not found".into()))),
        Some(item) => item,
    };
    let (decoded, _len): (T, usize) =
        bincode::serde::decode_from_slice(&cache_item[..], config)?;
    Ok(decoded)
}
And the Cargo.toml:
[package]
name = "moka_memory_leak_example"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
actix-web = {version = "4.0.0-beta.20", features=["rustls"]}
[dependencies.moka]
version = "0.7.1"
features = ["future"]
[dependencies.lazy_static]
version = "1.4.0"
[dependencies.bincode]
features = ["serde"]
git = "https://github.com/bincode-org/bincode"
[dependencies.serde]
version = "1"
features = ["derive", "serde_derive"]
[dependencies.serde_json]
version = "1"
[dependencies.uuid]
version = "0.8"
features = ["serde", "v4", "v5"]
[dependencies.tokio]
version = "1.15.0"
features = ["full"]
And to generate load, you can use wrk:
wrk -d 100 -t 8 -c 100 http://localhost:8080/v1/test/moka/leak
On my machine this results in 2.53GB memory utilization.
Hi. Thank you for reporting the issue with the test program. I ran the program and generated load by running wrk for 30 minutes. I confirmed the issue and will investigate further.
Here is what I have done so far:

1. Ran wrk for 30 minutes and recorded the memory consumption of the program by running the top command every 15 seconds. Found that the RES value stopped growing after it reached 12.5GiB.

2. Ran wrk again for 30 minutes, but this time changed the cache_item from this:

   let cache_item = TestStruct {
       // name.len() is 9 bytes.
       name: "Some name".to_owned()
   };

   to this:

   let cache_item = TestStruct {
       // name.len() is now 90 bytes.
       name: String::from_utf8_lossy(&[b'a'; 90]).into_owned()
   };

   The RES value (maybe) stopped growing after it reached 5.7GiB.
Please note that the cache stores both the key and the value in the internal concurrent hash map, so it is expected that the cache will consume more memory than 256MiB in your program. (But not 12.5GiB.)
The weigher returns value.len(), which is 10 bytes, so a max_capacity of 256MiB means the cache will store up to ~27 million entries.
$ julia -q
# value.len() is 10 bytes.
julia> num_entries = 256 * 1024^2 / 10
2.68435456e7
julia> using Printf
julia> @printf("%.2f", num_entries)
26843545.60
However, each entry will be 50 bytes (or larger).
# key (UUID):
# 16 bytes
# value (Vec<u8>):
# data: 10 bytes (or more for pre-allocated extra capacity)
# ptr: 8 bytes
# cap: 8 bytes (usize)
# len: 8 bytes (usize)
julia> entry_size = 16 + 10 + 8 * 3
50
So ~27 million entries will consume 1.25GiB or more.
julia> entry_size * num_entries / 1024^3
1.25
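Building on this estimate, a weigher could also charge each entry for the key and the Vec header, so that max_capacity tracks real memory more closely. Here is a hedged sketch; the function name is made up, and Moka's own per-entry bookkeeping is still not counted:

use uuid::Uuid;

// Hypothetical weigher (not from this thread): charge each entry for the key,
// the Vec header (ptr + cap + len), and the allocated payload bytes, roughly
// matching the 16 + 24 + data estimate above. The cache's internal per-entry
// overhead is still not included.
fn entry_weight(_key: &Uuid, value: &Vec<u8>) -> u32 {
    let key_bytes = std::mem::size_of::<Uuid>();     // 16 bytes
    let vec_header = std::mem::size_of::<Vec<u8>>(); // 24 bytes on 64-bit
    let payload = value.capacity();                  // allocated data bytes
    (key_bytes + vec_header + payload)
        .try_into()
        .unwrap_or(u32::MAX)
}

Such a function could be passed to the builder in place of the closure shown earlier, e.g. .weigher(entry_weight).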
Hi Tatsuya,
Thank you for the thorough information, I appreciate it, and your work on Moka and in resolving issues like this.
The approximations you list in the latter half of your response are helpful for estimating the total size of an entry. I wasn't thinking about the key size, and didn't do any further analysis beyond the Vec<u8> len(). I'll use the information you provided in my planning moving forward.
Thanks again! I'll continue to follow the issue.
Hi @chriskyndrid,
I made some progress.
To investigate the issue, I modified the source code of Moka and moka-cht (the concurrent hash table) to collect and expose some statistics of the internal data structures. I also modified your code to print out these statistics every 15 seconds.
I published the entire project here:
(If you are curious, you can try running it by following the README of the gh72 repository.)
Here is a result of running the gh72 program.
To make the table smaller, I edited the source code and reduced the max capacity of the cache to 8MiB.
I ran the following:

1. wrk -d 5 -t 8 -c 10 http://localhost:8080/v1/test/moka/leak (5 seconds)
2. curl http://localhost:8080/v1/test/moka/clear, which invalidates all cached entries (a sketch of such a handler appears below).
3. wrk -d 30 -t 8 -c 10 http://localhost:8080/v1/test/moka/leak (30 seconds)
4. curl http://localhost:8080/v1/test/moka/clear again.

For the descriptions of the columns, please see the README of the gh72 repository.
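For reference, the /clear endpoint is not part of the repro code posted above. A minimal version might look like the following hedged sketch; the handler name and JSON body are assumptions, while invalidate_all() is the Moka method referred to here:

// Hedged sketch (not the actual gh72 code): a /clear handler that calls
// Cache::invalidate_all() to mark every cached entry as invalidated.
// It would also need to be registered next to `leak`, e.g.
// `web::scope("/moka").service(leak).service(clear)`.
#[get("/clear")]
pub async fn clear() -> HttpResponse {
    MOKA_CACHE.invalidate_all();
    HttpResponse::Ok()
        .content_type("application/json")
        .body(r#"{"complete":true}"#)
}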
I found the following: there are some areas to improve. I created a diagram of the internal structures to help us (including myself) understand them:

- The key K (1) and the value V (2) are the data that the user wants to store.
- The entries are stored in the internal concurrent hash map, which provides O(1) time complexity for lookups.
- The numbers with a plus sign (e.g. +8) indicate the memory overhead, the additional space needed for each entry on a 64-bit platform (in bytes).
Here is a non-exhaustive list of areas where we could improve:

- Unused weak_counts: the cache internally uses std::sync::Arc, and every ArcInner carries a weak_count field that takes 8 bytes, but we never use weak references. We could use triomphe::Arc from the triomphe crate as a drop-in replacement for the std Arc (a sketch of the layout difference follows this list).
- Enum tags: the enum used for EntryInfo adds tag bytes to each entry.
- The region fields.
- Modify the moka-cht hash map to drop the Bucket and the Key for a removed entry as soon as possible. Quoting the current behavior:

removing an entry from the hash table results in that bucket pointer having a tombstone bit set. Insertions cannot displace a tombstone bucket unless their key compares equal, so once an entry is inserted into the hash table, the specific index it is assigned to will only ever hold entries whose keys compare equal. Without this restriction, resizing operations could result in the old and new bucket arrays being temporarily inconsistent.

... Tombstone bucket pointers are typically not copied into new bucket arrays.
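To make the weak_count point concrete, here is a hedged, schematic sketch; the struct names are invented for illustration and are not Moka's actual types, but they show why replacing std::sync::Arc with triomphe::Arc saves 8 bytes per heap allocation on a 64-bit target when weak references are never used:

use std::mem::size_of;
use std::sync::atomic::AtomicUsize;

// Schematic stand-ins for the heap allocations behind the two Arc types.
#[allow(dead_code)]
struct StdArcInnerLike<T> {
    strong: AtomicUsize, // 8 bytes
    weak: AtomicUsize,   // 8 bytes, never used by the cache
    data: T,
}

#[allow(dead_code)]
struct TriompheArcInnerLike<T> {
    count: AtomicUsize, // 8 bytes, the only reference count
    data: T,
}

fn main() {
    // With a 16-byte payload there is no padding, so the difference is
    // exactly the 8-byte weak counter.
    println!(
        "std-like: {} bytes, triomphe-like: {} bytes",
        size_of::<StdArcInnerLike<[u8; 16]>>(),     // 32
        size_of::<TriompheArcInnerLike<[u8; 16]>>() // 24
    );
}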
By modifying the test program to track the statistics of the internal data structures, I found some areas to improve.

I will create a plan for improving these areas. Most of them look straightforward; the moka-cht change will need some thinking.
Tatsuya,
This is an excellent synopsis of your investigation. I'm glad the issue turned out to not be a memory leak. It's clear you have a thorough understanding of your design and the intention behind the design. It's also clear you've identified some concrete areas to reduce egregious memory utilization and I'll look forward to monitoring progress on those efforts. Again, thank you for your work on Moka and focus on improvement.
The primary motivation I have for using Moka is to offload common requests that require somewhat expensive network I/O but don't change particularly frequently. Because of the architecture of the system I'm designing, memory is somewhat constrained on various service worker nodes, so controlling the maximum cache memory utilization is important so as not to negatively impact the efficacy of the overall worker and its other tasks. The workers do not have multi-GB amounts of memory to allocate for the purpose of caching, so landing on an appropriate functional max_capacity is important to me. I'm really less concerned about entries, and more concerned about memory footprint.
At any rate, thanks again, and let me know when you have made some of these changes. I'll run a bunch of tests in our codebase.
Chris
Hi @chriskyndrid,
OK. I pushed some changes to the master branch for the following items:

- the unused weak_counts,
- the enum tags,
- dropping the Bucket and the Key for a removed entry as soon as possible.

I ran the gh72 program against two different versions of Moka and recorded the memory utilization.
Moka versions:

- v0.7.1
- the master branch (v0.7.2-devel)

Each run consisted of the following steps:

1. wrk -d 600 -t 8 -c 100 http://localhost:8080/v1/test/moka/leak (10 minutes)
2. invalidate_all, using the following command: curl http://localhost:8080/v1/test/moka/clear
3. invalidate_all again.

Results:

- v0.7.1 does not promptly drop the Buckets and Keys for removed entries, so memory utilization continued growing as they built up.
- v0.7.2-devel drops the Buckets and Keys for removed entries within half a minute, so memory utilization was kept to a minimum: 1 - (1.51 - 0.31) / (1.88 - 0.31) => 23.57% less memory than v0.7.1.
I hope you are happy with v0.7.2-devel's result.
You can try v0.7.2-devel by adding the following to your Cargo.toml:
[dependencies]
moka = { git = "https://github.com/moka-rs/moka", branch = "master", features = ["future"] }
I will start to prepare for the v0.7.2 release. It will take a few more days. (v0.7.2 milestone)
Released v0.7.2 with these improvements. Also updated the diagram of the internal data structures.
Thank you for your work on this!
Thank you for the excellent work on this interesting project. I've implemented Moka in a project I'm working on to utilize as a memory-based cache. I'm passing a struct to bincode, encoding it, and storing it in the cache, then running the reverse as needed. The system is a web service that utilizes Actix. In practice things are working fine.
However, when I ran a load-generating program (wrk) against a sample call, the memory utilization of the cache increases considerably and does not appear to respect the max_capacity() setting. It does not decrease after load generation completes, and subsequent wrk tests continue to increase the memory footprint, likely indefinitely. As an example:
1: At the start of the load generation, top indicates around 15M of memory utilization:
6488 erp_service_acco 0.0 00:01.19 29 0 50 15M
2: After executing a wrk request, top now indicates around 449M of memory utilization:
6488 erp_service_acco 0.0 01:19.42 45 0 66 449M
3: After executing a longer wrk request again, top now indicates around 933M of memory utilization:
6488 erp_service_acco 0.0 15:53.12 46 0 67 933M
Memory utilization continues to grow into the GB+ range. Based on the max_capacity setting above and the documentation in the readme, I would expect the footprint to be around 256MiB + Actix.
In order to remove Actix as a potential component here, I ran these same tests without any calls against my code that interacts with Moka. Actix does not grow beyond a 21MB footprint.
Data is being inserted into the cache with the set_data function and retrieved with the get_data function, both shown above.
I also repeated the tests without the interaction with bincode.

Is there something I'm misunderstanding about how to restrict the memory utilization of Moka via the max_capacity() (or another setting) configuration? Any thoughts on this would be greatly appreciated.