tokio-rs / tokio

A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
https://tokio.rs
MIT License

Tokio Might Need a Safe Method to Refresh Runtime #6760

Open lithbitren opened 3 months ago

lithbitren commented 3 months ago

hi!

I've been following many web frameworks based on tokio, including actix, hyper, axum, warp, salvo, and others. Many of these web frameworks have issues related to memory leaks, but after investigation, I found that most of these frameworks don't actually have memory leaks. A process not immediately returning memory to the system can't be strictly defined as a memory leak in most cases.

For example, during stress testing, I used reqwest on one Ubuntu machine to perform millions of concurrent HTTP/2 requests against another Ubuntu machine running a tokio-based web service. The server-side handlers were simply designed to asynchronously sleep for 20 seconds.

On the server-side, each framework's performance is very similar. They typically accept all requests within about 4 seconds, then sleep for 20 seconds before responding to the client. Clients finish processing all responses in around 30 seconds, with an average response time from sending to receiving of about 22 seconds. All tokio-based web frameworks perform well.

However, after millions of concurrent requests, the process memory for actix and axum remains at 3.5GB to 4.5GB and never decreases on its own. Only when the memory allocator is changed to mi_malloc (the mimalloc crate) does the memory drop, to 0.7GB to 1.5GB, after the concurrency ends. Many of the memory leak reports against Rust web frameworks look like this: memory does not contract after high concurrency unless the memory allocator is changed.

In the issue sections for tokio, actix, axum, and hyper, when encountering memory leak issues, community developers and contributors often say that not returning memory to the system improves performance for future memory usage. However, there is no evidence to support that releasing memory significantly impacts server performance. Based on my multiple rounds of testing, whether starting up or using mi_malloc, regardless of the size of the memory, the response times for clients during the next round of million-level concurrency are similar. This means that the server-side handling performance remains consistent, making it hard to prove the claims that not releasing memory significantly improves the performance of future memory allocations.

For the server-side, if an attacker targets a slow web API and initiates a massive concurrent attack, even if the server process does not crash, the consumed memory will not automatically disappear. Only by restarting the process can the memory be reduced to normal levels. For example, if your business QPS is usually a few hundred to a few thousand, and regular memory usage is only 100 MB, after an attack, the memory might become 4.5GB that cannot contract. This situation is unfavorable for operations management, as monitoring data should be as accurate as possible. The behavior of tokio-based web frameworks in this regard is quite troubling.

However, there are some workarounds. For instance, when using axum or hyper, I tried declaring a global RUNTIME and then spawning coroutines to handle requests using RUNTIME.spawn(async move {...}) during loop-accept. When refreshing is needed, I use std::mem::replace in a lock-free manner to replace the global RUNTIME. The replaced runtime waits asynchronously for a period before shutting down with rt.shutdown_background(). Through this method, the process memory of axum, which was at 4.5GB, can be reduced to 0.5GB after several refreshes. If using mi_malloc, the memory can shrink to as low as 20MB, which is a good result with almost no loss in performance.

However, this approach has its drawbacks. If the service includes long-lived connections like HTTP/2, SSE, or WebSockets, safely shutting down the server becomes challenging. Sometimes, even if rt.metrics().num_alive_tasks() shows zero active tasks, it doesn't necessarily mean the runtime can be safely shut down. It might be due to existing TCP connections where the client is still reading data from the TCP buffer. If the TCP connection is terminated at this point, the client's buffered data would be discarded, causing the client to fail to receive the data. Also, this method only works for axum and hyper, and is not suitable for many other web frameworks.

From these tests, it seems that tokio::runtime::Runtime indeed has an issue with not releasing memory, though this cannot strictly be defined as a memory leak. The behavior is similar to basic data structures like VecDeque, LinkedList, and HashMap: after inserting a large number of elements, even if all elements are removed, the process memory does not immediately return to the system unless the variable holding the data structure is destroyed. In most cases, destroying the container variable results in the process memory being released normally.
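The container comparison can be illustrated with a minimal sketch. It only shows that a drained `VecDeque` keeps its backing capacity until it is shrunk or dropped; whether the allocator then hands the freed pages back to the operating system is a separate question that this example does not measure. The function name `capacity_after_drain` is hypothetical.

```rust
use std::collections::VecDeque;

/// Fill a queue with `n` items, drain it completely, and report the capacity
/// (a) right after draining and (b) after an explicit `shrink_to_fit`.
fn capacity_after_drain(n: usize) -> (usize, usize) {
    let mut queue: VecDeque<u64> = VecDeque::new();
    for i in 0..n as u64 {
        queue.push_back(i);
    }
    // Removing every element does not shrink the backing buffer.
    while queue.pop_front().is_some() {}
    let retained = queue.capacity();
    // Only an explicit shrink (or dropping the queue) releases the buffer
    // back to the allocator.
    queue.shrink_to_fit();
    (retained, queue.capacity())
}

fn main() {
    let (retained, shrunk) = capacity_after_drain(1_000_000);
    println!("capacity after drain: {retained}, after shrink_to_fit: {shrunk}");
}
```

If a long-lived container inside a service behaves this way, its capacity stays at the high-water mark of the largest burst it has ever seen.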

To address the title, I hope that tokio could provide a safe way to refresh the runtime. I am not familiar with all the source code of the runtime, but I suspect that there may be a situation where containers are expanded but the memory is not immediately returned to the system. If there were a method to manually refresh all containers in the runtime – create new containers, transfer elements from old containers to the new ones, and then safely destroy the old containers, it might help mitigate the problem of memory not being released. Ideally, this process should be lock-free so that it doesn't impact the spawning of new tasks during the refresh.

It's important to note that during testing, you shouldn't merely use tokio::spawn + tokio::time::sleep for concurrency testing. For simple memory structures, the system can sometimes recover memory, but when dealing with complex coroutines in hyper or axum, you can't guarantee memory contraction. However, overall destruction and recreation of the runtime always ensures memory contraction. Additionally, the code for million-level concurrency with HTTP/2 is not complex. The only thing to note is to increase the max_concurrent_streams parameter in hyper (I set it to 1,000,000).

In summary, for the issue of memory release, perhaps only tokio can solve it.

Additionally, I wrote an article about testing rust web frameworks in Chinese, which can be translated for reading: rust的web框架单机百万并发的性能与开销

nldxtd commented 3 months ago

I have seen issues in other frameworks reporting high memory usage. I guess it would help a lot if this issue were solved.

nldxtd commented 3 months ago

Also, can you share the scripts you used for testing tokio and the other frameworks?

Darksonn commented 3 months ago

> but I suspect that there may be a situation where containers are expanded but the memory is not immediately returned to the system. If there were a method to manually refresh all containers in the runtime – create new containers, transfer elements from old containers to the new ones, and then safely destroy the old containers, it might help mitigate the problem of memory not being released.

I'm not sure which containers that would be. For example, with the multi-thread runtime, each worker thread has a fixed-size buffer whose size never changes from 256. And for the global queue, tasks are stored in linked lists, so the memory is freed when the task is done.

Please verify that the memory actually stays high when measured using this utility:

use core::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::alloc::{GlobalAlloc, Layout, System};

struct TrackedAlloc;

#[global_allocator]
static ALLOC: TrackedAlloc = TrackedAlloc;

static TOTAL_MEM: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ret = System.alloc(layout);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(layout.size(), Relaxed);
        }
        ret
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        TOTAL_MEM.fetch_sub(layout.size(), Relaxed);
        System.dealloc(ptr, layout);
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        let ret = System.alloc_zeroed(layout);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(layout.size(), Relaxed);
        }
        ret
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let ret = System.realloc(ptr, layout, new_size);
        if !ret.is_null() {
            TOTAL_MEM.fetch_add(new_size.wrapping_sub(layout.size()), Relaxed);
        }
        ret
    }
}

If it doesn't, then this is entirely up to the allocator and not something Tokio really has any power over.
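As a usage illustration, here is a reduced sketch of how such a counter can be read around a large allocation. It is not the utility above: the names `CountingAlloc`, `LIVE_BYTES`, and `live_bytes` are hypothetical, and only `alloc` and `dealloc` are overridden, relying on the default `alloc_zeroed` and `realloc` implementations of `GlobalAlloc`, which are built on those two methods.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

/// Minimal counting wrapper around the system allocator (hypothetical name).
struct CountingAlloc;

/// Bytes currently handed out by the allocator.
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Relaxed);
        }
        ptr
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        LIVE_BYTES.fetch_sub(layout.size(), Relaxed);
        unsafe { System.dealloc(ptr, layout) };
    }
}

/// Read the counter; this is the "internal" view of memory usage.
fn live_bytes() -> usize {
    LIVE_BYTES.load(Relaxed)
}

fn main() {
    const SIZE: usize = 16 * 1024 * 1024;
    let before = live_bytes();
    // `black_box` keeps the optimizer from removing the unused allocation.
    let buf = std::hint::black_box(vec![0u8; SIZE]);
    let during = live_bytes();
    drop(buf);
    let after = live_bytes();
    println!("before: {before} B, during: {during} B, after: {after} B");
}
```

If this counter falls back to its baseline while the resident size reported by the operating system stays high, the remaining memory is held by the allocator rather than by live Rust objects.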

lithbitren commented 3 months ago

@nldxtd @Darksonn


After tidying up the code, I primarily tested memory usage under four conditions. For each state, 4 to 5 seconds after startup, a million concurrent HTTP/2 requests were made, and the chart reflects the status of the first two rounds of testing.

In these charts, the top-left corner shows the default allocator without refreshing, the top-right corner displays mimalloc without refreshing, the bottom-left corner is the default allocator with refreshing, and the bottom-right corner is mimalloc with refreshing.

The horizontal axis represents server startup time in seconds.

The vertical axis on the left side indicates memory size in MB and applies to both inner_mem and outer_mem, which are the process memory as tracked internally and externally, respectively. The vertical axis on the right side shows the number of handler connections and concurrent tasks, with a maximum connection count of one million and a maximum task count of 1,000,008.

Internal memory tracking was implemented by modifying the global memory allocator, while external monitoring of process memory was achieved using the third-party library sysinfo. The values for external monitoring of process memory can be unstable but generally align with those from the top command or the graphical task manager in Linux.

[Chart: axum-test, four panels showing inner_mem, outer_mem, connections, and tasks over server uptime]

From these charts, it's evident that when conducting a million concurrent tests, as real-time tasks increase, memory usage spikes sharply, reaching around 2,500 MB at its peak.

The internal memory monitoring always clears automatically when tasks decrease, indicating there is no memory leak in the code.

However, when using the default allocator, even when tasks decrease, the external memory monitoring continues to rise, only releasing when the number of connections drops to zero. After release, the memory stabilizes at around 2,000 MB.

When using the mi_malloc allocator, the external memory monitoring releases memory as tasks are freed, but it doesn't reduce to a very low figure, instead leveling off at around 600 MB.

After concurrency ends, if the runtime is refreshed, it's noticeable that the external memory monitoring can shrink to a very low level.

With the default allocator, the external memory monitoring will contract once or twice and then stabilize, ending up between 10 and 100 MB.

Using the mi_malloc allocator, the external memory monitoring gradually contracts to around 15 MB, remaining stable.

From these test results, it's clear that refreshing Tokio's runtime does indeed have a meaningful impact on the release of externally monitored memory.


The client is a high-concurrency client implemented with reqwest, initialized with 8 HTTP/2 clients since the CPU has 8 threads. You can adjust this number when copying the code for testing. It can set multiple rounds of a million concurrent requests, with a default rest period of 10 seconds between each round (to observe memory shrinkage on the server side). My client's working memory is about 7-8 GB, and its idle memory is around 4 GB.

Although the client code is presented first, you would typically start the server before launching the client.

Cargo.toml:

[dependencies]
reqwest = "0.12.5"
tokio = { version = "1.39.2", features = ["full"] }

src/main.rs:

use std::time::{Duration, Instant};

/// Get the URL for the request; for multiple target servers, implement address rotation here.
fn get_url(round: usize) -> &'static str {
    "http://0.0.0.0:3000?server=axum&expire=20000"
}

/// Number of rounds
const TOTAL_ROUNDS: usize = 300;
/// Requests per round
const TOTAL_REQUESTS: usize = 1_000_000;
/// Waiting period between rounds, in seconds
const WAITING_SECS: u64 = 10;
/// Number of clients
const NUM_CLIENTS: usize = 8;

/// Countdown output during waiting periods
async fn countdown(seconds: u64) {
    let mut interval = tokio::time::interval(Duration::from_secs(1));
    for i in 0..=seconds {
        interval.tick().await;
        print!("\rwaiting for ({}/{}) seconds...", i, seconds);
        use std::io::Write;
        std::io::stdout().flush().unwrap();
    }
    println!("\r{}\r", " ".repeat(30));
}

/// Calculate percentile times for responses
fn calculate_percentiles(durations: Vec<Duration>) -> [Duration; 5] {
    let mut sorted_durations = durations;
    sorted_durations.sort_by_key(|d| d.as_millis());
    let total_requests = sorted_durations.len();
    [
        sorted_durations[total_requests / 2],
        sorted_durations[total_requests * 9 / 10],
        sorted_durations[total_requests * 99 / 100],
        sorted_durations[total_requests * 999 / 1000],
        sorted_durations.last().cloned().unwrap_or_default(),
    ]
}

#[tokio::main]
async fn main() {
    let start = Instant::now();
    let clients: Vec<reqwest::Client> = (0..NUM_CLIENTS)
        .map(|_| {
            reqwest::ClientBuilder::new()
                .pool_idle_timeout(Duration::from_secs(1))
                .http2_prior_knowledge() // mark as an HTTP/2 client
                .build()
                .unwrap()
        })
        .collect();
    println!("build-clients-time: {:.3?}", start.elapsed());

    for round in 1..=TOTAL_ROUNDS {
        multi_requests(round, &clients).await;
        countdown(WAITING_SECS).await;
    }
}

/// Concurrent request function per round
async fn multi_requests(round: usize, clients: &Vec<reqwest::Client>) {
    let runtime = tokio::runtime::Runtime::new().unwrap();

    let mut results = std::collections::HashMap::new();
    let mut successful_times = Vec::new();
    let mut successful_count = 0u32;
    let clients_len = clients.len();

    let start = Instant::now();
    let mut tasks = Vec::with_capacity(TOTAL_REQUESTS);
    for i in 0..TOTAL_REQUESTS {
        let client = clients[i % clients_len].clone();
        tasks.push(runtime.spawn(async move {
            let url = get_url(round);
            let request_start = Instant::now();
            let response = client
                .get(url)
                .timeout(Duration::from_secs(60))
                .send()
                .await;
            let result = match response {
                Ok(r) => r.text().await,
                Err(e) => Err(e),
            };
            (result, request_start.elapsed())
        }));
    }
    println!("{round} - prepared-time: {:.3?}", start.elapsed());

    // Convert all responses and error types to strings for statistics
    for task in tasks {
        let (result, elapsed) = task.await.unwrap();
        if result.is_ok() {
            successful_times.push(elapsed);
            successful_count += 1;
        }
        let result = result.unwrap_or_else(|e| e.to_string());
        *results.entry(result).or_insert(0) += 1;
    }
    println!("{round} - completed-time: {:.3?}", start.elapsed());

    // Output specific data if there are successful responses
    if successful_count > 0 {
        println!(
            "{round} - successful-count: {successful_count}, average-time: {:.3?}",
            successful_times.iter().sum::<Duration>() / (successful_count as u32)
        );
        let percentiles = calculate_percentiles(successful_times);
        println!(
            "{round} - percentiles: {{ 50%: {:.3?}, 90%: {:.3?}, 99%: {:.3?}, 99.9%: {:.3?}, 100%: {:.3?} }}",
            percentiles[0], percentiles[1], percentiles[2], percentiles[3], percentiles[4]
        );
    }

    // Calculate the success rate of responses
    println!(
        "{round} - results-len: {}, success-rate: {:.2}%",
        results.len(),
        100. * successful_count as f64 / TOTAL_REQUESTS as f64
    );

    // Do not output the results if there are too many.
    if results.len() <= 100 {
        println!("{round} - results: {:#?}", results);
    }

    runtime.shutdown_background();
}

Typical output data:

build-clients-time: 61.264ms
1 - prepared-time: 2.401s
1 - completed-time: 26.181s
1 - successful-count: 1000000, average-time: 18.341s
1 - percentiles: {50%: 18.743s, 90%: 21.764s, 99%: 22.663s, 99.9%: 23.372s, 100%: 24.031s}
1 - results-len: 8, success-rate: 100.00%
1 - results: {
    "axum(20027) HTTP/2.0 192.168.0.123:11582": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11579": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11581": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11578": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11583": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11584": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11580": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11585": 125000,
}

2 - prepared-time: 2.393s
2 - completed-time: 25.683s
2 - successful-count: 1000000, average-time: 18.667s
2 - percentiles: {50%: 18.503s, 90%: 22.145s, 99%: 22.761s, 99.9%: 23.732s, 100%: 24.385s}
2 - results-len: 8, success-rate: 100.00%
2 - results: {
    "axum(20027) HTTP/2.0 192.168.0.123:11591": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11586": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11587": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11593": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11592": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11590": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11589": 125000,
    "axum(20027) HTTP/2.0 192.168.0.123:11588": 125000,
}
waiting for (3/10) seconds...
57 - prepared-time: 2.181s
57 - completed-time: 67.254s
57 - successful-count: 606228, average-time: 17.133s
57 - percentiles: { 50%: 16.718s, 90%: 18.246s, 99%: 19.710s, 99.9%: 59.925s, 100%: 60.001s }
57 - results-len: 10, success-rate: 60.62%
57 - results: {
    "axum(13053) HTTP/2.0 192.168.0.123:7806": 124806,
    "axum(13053) HTTP/2.0 192.168.0.123:7808": 58174,
    "axum(13053) HTTP/2.0 192.168.0.123:7805": 8790,
    "axum(13053) HTTP/2.0 192.168.0.123:7811": 64255,
    "error decoding response body": 86967,
    "axum(13053) HTTP/2.0 192.168.0.123:7807": 116657,
    "axum(13053) HTTP/2.0 192.168.0.123:7810": 122746,
    "axum(13053) HTTP/2.0 192.168.0.123:7812": 107818,
    "error sending request for url (http://192.168.0.127:3000/?server=axum&expire=15000)": 306805,
    "axum(13053) HTTP/2.0 192.168.0.123:7809": 2982,
}

A correct response contains the server type identifier, the server process ID, the HTTP version, and the client's socket address. There are one million requests in total across eight clients, so each client accounts for 125,000 requests in a fully successful round.

If you press Ctrl+C to close the server during the requests, especially when the server connections are starting to release, you can observe error outputs, typically only sending and decoding errors.
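The percentile figures in the client output are picked by index from the sorted response times. The client's `calculate_percentiles` indexes the vector directly and is only called when at least one request succeeded. A slightly more defensive variant of the same index rule might look like the sketch below; the helper name `percentile` and its fraction-style parameters are hypothetical and not part of the client code.

```rust
use std::time::Duration;

/// Index-based percentile: the element at position `len * num / den` of the
/// already sorted samples, clamped to the last element.
/// Returns `None` for empty input or a zero denominator.
fn percentile(sorted: &[Duration], num: usize, den: usize) -> Option<Duration> {
    if sorted.is_empty() || den == 0 {
        return None;
    }
    let idx = (sorted.len() * num / den).min(sorted.len() - 1);
    Some(sorted[idx])
}

fn main() {
    // Ten samples of 1 ms to 10 ms, deliberately unsorted at first.
    let mut samples: Vec<Duration> = (1..=10u64).rev().map(Duration::from_millis).collect();
    samples.sort();
    println!("p50:   {:?}", percentile(&samples, 1, 2));
    println!("p90:   {:?}", percentile(&samples, 9, 10));
    println!("p99.9: {:?}", percentile(&samples, 999, 1000));
}
```

With a million samples the clamping rarely matters, but it avoids a panic on an empty slice and makes the behavior explicit for small runs.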


The server uses axum and hyper in low-level mode. The code is lengthy and includes: memory monitoring (tracking both the allocator's internal memory and the external process memory); global runtime monitoring and replacement (guarded by a parking_lot lock, which does not significantly affect performance because contention is minimal); a handler that sleeps for the duration given by the expire query parameter; and a separate coroutine that prints these states to the command line.

When investigating memory leaks in Rust, we usually monitor the memory state through the internal memory allocator (referred to below as INNER_MEMORY). However, during operations management, we typically monitor the external memory (referred to below as OUTER_MEMORY). These two memories are not the same; stable internal memory usually indicates no memory leaks, while external memory significantly larger than internal memory often means the process memory is not promptly returned to the system.
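For the external side of this comparison, a dependency-free way to read resident memory on Linux is `/proc/self/statm`, whose second field is the resident set size in pages. The sketch below is an alternative to the sysinfo call used in the server, not what the server does. The function names are hypothetical, and the page size is passed in by the caller as an assumption (4096 bytes is common on x86-64 Linux) rather than queried from the system.

```rust
use std::fs;

/// Parse the resident-set size (in pages) from the contents of
/// `/proc/self/statm`; the second whitespace-separated field is `resident`.
fn parse_statm_rss_pages(statm: &str) -> Option<u64> {
    statm.split_whitespace().nth(1)?.parse().ok()
}

/// Resident memory of the current process in bytes, or `None` when
/// `/proc/self/statm` is unavailable (for example, on non-Linux systems).
/// ASSUMPTION: `page_size` is supplied by the caller.
fn resident_bytes(page_size: u64) -> Option<u64> {
    let statm = fs::read_to_string("/proc/self/statm").ok()?;
    parse_statm_rss_pages(&statm).map(|pages| pages * page_size)
}

fn main() {
    match resident_bytes(4096) {
        Some(bytes) => println!("outer memory: {:.3}M", bytes as f64 / 1024.0 / 1024.0),
        None => println!("resident size not available on this platform"),
    }
}
```

Printing this value next to the allocator counter once per second gives the same inner/outer pair that the server log reports.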

mi_malloc is currently mainly compatible with Linux, so you may need to comment out mi_malloc-related code for other systems, but this does not affect observing whether process memory is released.

You can switch between memory allocators and whether to refresh Tokio's runtime by toggling comments. After switching comments, you need to recompile incrementally.

static REAL_ALLOCATOR: (std::alloc::System, &str, u64) = (std::alloc::System, "DefaultAlloc", 20);
// static REAL_ALLOCATOR: (mimalloc::MiMalloc, &str, u64) = (mimalloc::MiMalloc, "MiMalloc", 20);

// const AUTO_REFRESH: bool = false;
const AUTO_REFRESH: bool = true;

Cargo.toml:

[dependencies]
axum =  { version = "0.7.5", features = ["http2"] }
tokio = { version = "1.39.2", features = ["full"] }
mimalloc = "0.1.42"
hyper = "1.4.1" 
hyper-util = { version = "0.1.7", features = ["tokio", "server-auto", "http2"] }
tower = { version = "0.4.13", features = ["full"] }
sysinfo = "0.30.13"
parking_lot = "0.12.3"

src/main.rs:

use std::alloc::{GlobalAlloc, Layout};
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::{Arc, LazyLock};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

use sysinfo::{Pid, System};

use tokio::runtime::{Handle, Runtime};

use hyper::body::Incoming;
use hyper_util::rt::{TokioExecutor, TokioIo};
use hyper_util::server;
use tower::Service;

use axum::extract::{ConnectInfo, Query, Request, State};
use axum::{Extension, Router, routing::get};

/// Define the global runtime
use parking_lot::RwLock;
static GLOBAL_RUNTIME: RwLock<Option<Runtime>> = RwLock::new(None);

/// Process ID
static PID: LazyLock<Pid> = LazyLock::new(|| Pid::from_u32(std::process::id()));

/// Monitor output frequency in the command line, default is once every second
const INTERVAL_SECS: u64 = 1;

/// Custom memory allocator tracker
struct TrackedAlloc;

#[global_allocator]
static ALLOC: TrackedAlloc = TrackedAlloc;

/// Internal memory tracking
static INNER_MEMORY: AtomicUsize = AtomicUsize::new(0);

/// The actual allocator, the first option is the default allocator, the second is MiMalloc. Only one allocator can be used in the same code, so they must be selectively compiled via comments.
/// The tuple contains three elements: the allocator, the allocator name, and the threshold for memory contraction (i.e., the memory value above which the Tokio runtime is refreshed, in MB).
static REAL_ALLOCATOR: (std::alloc::System, &str, u64) = (std::alloc::System, "DefaultAlloc", 20);
// static REAL_ALLOCATOR: (mimalloc::MiMalloc, &str, u64) = (mimalloc::MiMalloc, "MiMalloc", 20);

/// Whether to automatically refresh Tokio's runtime
// const AUTO_REFRESH: bool = false;
const AUTO_REFRESH: bool = true;

/// Implementation of the internal memory tracking
unsafe impl GlobalAlloc for TrackedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ret = REAL_ALLOCATOR.0.alloc(layout);
        if !ret.is_null() {
            INNER_MEMORY.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ret
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        REAL_ALLOCATOR.0.dealloc(ptr, layout);
        INNER_MEMORY.fetch_sub(layout.size(), Ordering::Relaxed);
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        let ret = REAL_ALLOCATOR.0.alloc_zeroed(layout);
        if !ret.is_null() {
            INNER_MEMORY.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ret
    }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let ret = REAL_ALLOCATOR.0.realloc(ptr, layout, new_size);
        if !ret.is_null() {
            INNER_MEMORY.fetch_add(new_size.wrapping_sub(layout.size()), Ordering::Relaxed);
        }
        ret
    }
}

/// Handle state tracking
#[derive(Debug, Default)]
struct MyState {
    counter: AtomicUsize,
    connection: AtomicUsize,
    max_connection: AtomicUsize,
}

#[tokio::main]
async fn main() {
    // Initialize or replace the global runtime with a write lock
    GLOBAL_RUNTIME.write().replace(Runtime::new().unwrap());

    println!("axum-demo pid is {}", *PID);

    // Bind to the port
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    println!("listening on {}", listener.local_addr().unwrap());

    // Initialize the handle state
    let my_status: Arc<MyState> = Default::default();

    // Start the asynchronous monitor using the main runtime
    tokio::spawn(monitor(my_status.clone()));

    // Create the axum service
    let app = Router::new().route("/", get(index)).with_state(my_status);

    // Loop to listen on the port in low-level form
    while let Ok((stream, addr)) = listener.accept().await {
        let app = app.clone();
        // Spawn using a read lock to get the global runtime
        GLOBAL_RUNTIME.read().as_ref().unwrap().spawn(async move {
            let io = TokioIo::new(stream);
            let hyper_service = hyper::service::service_fn(move |req: Request<Incoming>| {
                // Inject the address information of each connection into the handle
                app.clone().layer(Extension(ConnectInfo(addr))).call(req)
            });

            if let Err(err) = server::conn::auto::Builder::new(TokioExecutor::new())
                .http2() // Set the HTTP/2 service flag
                .max_concurrent_streams(1000000) // Maximum concurrent streams per HTTP/2 connection
                .keep_alive_timeout(Duration::from_secs(1))
                .serve_connection_with_upgrades(io, hyper_service)
                .await
            {
                println!("error serving connection from {addr:?}: {err:#}");
            }
        });
    }
}

/// Handle, sleeps based on the 'expire' value extracted from params, and updates the connection and counter state of `my_status`
async fn index(
    version: axum::http::Version,
    ConnectInfo(addr): ConnectInfo<SocketAddr>,
    State(my_status): State<Arc<MyState>>,
    Query(params): Query<HashMap<String, String>>,
) -> String {
    my_status.counter.fetch_add(1, Ordering::Relaxed);
    let conn = 1 + my_status.connection.fetch_add(1, Ordering::Relaxed);
    // Record the new peak atomically; a separate load and store can lose updates under contention
    my_status.max_connection.fetch_max(conn, Ordering::Relaxed);
    if let Some(Ok(expire)) = params.get("expire").map(|s| s.parse::<u64>()) {
        tokio::time::sleep(Duration::from_millis(expire)).await;
    }
    my_status.connection.fetch_sub(1, Ordering::Relaxed);
    format!("axum({}) {version:?} {addr:?}", *PID)
}

/// Obtain process memory from the outside using the sysinfo third-party library
fn get_memory_usage() -> u64 {
    let system_info = System::new_all();
    let current_process = system_info.process(*PID).expect("Process not found");
    current_process.memory()
}

/// Asynchronous monitor function
async fn monitor(my_status: Arc<MyState>) {
    let mut interval = tokio::time::interval(Duration::from_secs(INTERVAL_SECS));
    let mut tick = 0;
    loop {
        interval.tick().await;
        let ct = my_status.counter.load(Ordering::Relaxed);
        let conn = my_status.connection.load(Ordering::Relaxed);
        let max_conn = my_status.max_connection.load(Ordering::Relaxed);
        let outer_memory = get_memory_usage();
        let global_spawn = GLOBAL_RUNTIME
            .read()
            .as_ref()
            .unwrap()
            .handle()
            .metrics()
            .num_alive_tasks();
        let main_spawn = Handle::current().metrics().num_alive_tasks();
        // Refresh the global runtime when there are no tasks in the global runtime and the external memory exceeds the threshold
        let refresh_message =
            if AUTO_REFRESH && global_spawn == 0 && outer_memory > REAL_ALLOCATOR.2 * 1024 * 1024 {
                let start = std::time::Instant::now();
                // Replace the old global runtime
                let old_global_runtime = GLOBAL_RUNTIME.write().replace(Runtime::new().unwrap());
                // Shutdown the old global runtime in the main runtime
                old_global_runtime.unwrap().shutdown_background();
                format!(
                    ", refresh runtime on {} until {}M! cost {:?}",
                    REAL_ALLOCATOR.1,
                    REAL_ALLOCATOR.2,
                    start.elapsed()
                )
            } else {
                "".to_string()
            };
        // Output the handle counter, real-time connection count, maximum connection count, internal memory, external memory, main runtime running tasks, global runtime running tasks, and refresh information
        println!(
            "axum {tick}, ct: {ct}, conn: {conn}, max_conn: {max_conn}, inner_mem: {:.3}M, outer_mem: {:.3}M, main_rt: {}, global_rt: {}{}",
            INNER_MEMORY.load(Ordering::Relaxed) as f64  / 1024. / 1024.,
            outer_memory as f64 / 1024. / 1024.,
            main_spawn,
            global_spawn,
            refresh_message,
        );
        tick += INTERVAL_SECS;
    }
}

This is not a good practice for refreshing runtimes. Refreshing the global runtime requires that the number of active tasks reaches zero, meaning that the refresh must happen outside the runtime itself.

Moreover, reaching zero active tasks requires waiting, such as waiting for all long-lived connections to end before entering the refresh process.

Even when the number of active tasks reaches zero, it does not guarantee that the global runtime can be safely shut down. A zero active task count only indicates that the server's asynchronous state has ended, but TCP connections may not be fully closed. Shutting down the global runtime at this point might close the TCP connections, potentially leading to the loss of data in the client's TCP buffer.
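One common way to reduce this risk at the socket level is to half-close the connection and wait for the peer to close before dropping it. The sketch below uses blocking `std::net` sockets on loopback purely to show the ordering. It is not how hyper or Tokio shut connections down, the function names are hypothetical, and a real server would also need a timeout on the drain loop.

```rust
use std::io::{Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;

/// Server side: send `payload`, half-close the write direction, then wait
/// until the peer closes its side (read returns 0) before dropping the socket.
fn send_then_drain(mut stream: TcpStream, payload: &[u8]) -> std::io::Result<()> {
    stream.write_all(payload)?;
    // Signal end-of-stream to the client while keeping the read side open.
    stream.shutdown(Shutdown::Write)?;
    let mut sink = [0u8; 1024];
    // Discard anything the client still sends; 0 means the client has closed.
    while stream.read(&mut sink)? != 0 {}
    Ok(())
}

/// Loopback demo: returns how many bytes the client received.
fn demo(payload_len: usize) -> std::io::Result<usize> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (stream, _) = listener.accept()?;
        send_then_drain(stream, &vec![7u8; payload_len])
    });
    let mut client = TcpStream::connect(addr)?;
    let mut received = Vec::new();
    // Ends once the server's half-close arrives after all queued data.
    client.read_to_end(&mut received)?;
    // Closing the client lets the server's drain loop observe end-of-stream.
    drop(client);
    server.join().expect("server thread panicked")?;
    Ok(received.len())
}

fn main() -> std::io::Result<()> {
    println!("client received {} bytes", demo(1 << 20)?);
    Ok(())
}
```

The point of the ordering is that the server only releases the socket after the client has finished reading, which a task count of zero alone does not tell you.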

Currently, this refresh method is suitable only for axum/hyper; other frameworks depending on Tokio cannot safely use it.

I am unsure whether refreshing Tokio's runtime thread and its internal containers allows the system to fully reclaim memory. This would require further detailed testing by someone familiar with the Tokio runtime code. However, overall, replacing the runtime does help the system reclaim memory.


The original data of server logs in the chart is as follows:

1. default-without-refresh

axum-demo pid is 2133
listening on 0.0.0.0:3000
axum 0, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.125M, main_rt: 1, global_rt: 0
axum 1, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.508M, main_rt: 1, global_rt: 0
axum 2, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.527M, main_rt: 1, global_rt: 0
axum 3, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.684M, main_rt: 1, global_rt: 0
axum 4, ct: 27760, conn: 27760, max_conn: 27760, inner_mem: 83.410M, outer_mem: 87.824M, main_rt: 1, global_rt: 31690
axum 5, ct: 237193, conn: 237193, max_conn: 237193, inner_mem: 643.367M, outer_mem: 636.449M, main_rt: 1, global_rt: 243295
axum 6, ct: 409585, conn: 409585, max_conn: 409585, inner_mem: 969.766M, outer_mem: 1003.730M, main_rt: 1, global_rt: 413277
axum 7, ct: 588048, conn: 588048, max_conn: 588048, inner_mem: 1609.142M, outer_mem: 1668.973M, main_rt: 1, global_rt: 690160
axum 8, ct: 860580, conn: 860580, max_conn: 860580, inner_mem: 2113.805M, outer_mem: 2263.973M, main_rt: 1, global_rt: 933078
axum 9, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 10, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 11, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 12, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 13, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 14, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 15, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 16, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 17, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 18, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2445.973M, main_rt: 1, global_rt: 1000008
axum 19, ct: 1000000, conn: 983001, max_conn: 1000000, inner_mem: 2221.630M, outer_mem: 2450.598M, main_rt: 1, global_rt: 982367
axum 20, ct: 1000000, conn: 795699, max_conn: 1000000, inner_mem: 1998.540M, outer_mem: 2496.223M, main_rt: 1, global_rt: 795714
axum 21, ct: 1000000, conn: 617085, max_conn: 1000000, inner_mem: 1834.297M, outer_mem: 2547.473M, main_rt: 1, global_rt: 614342
axum 22, ct: 1000000, conn: 490067, max_conn: 1000000, inner_mem: 1542.834M, outer_mem: 2557.598M, main_rt: 1, global_rt: 490082
axum 23, ct: 1000000, conn: 303630, max_conn: 1000000, inner_mem: 1334.821M, outer_mem: 2615.848M, main_rt: 1, global_rt: 303645
axum 24, ct: 1000000, conn: 138866, max_conn: 1000000, inner_mem: 1049.048M, outer_mem: 2634.598M, main_rt: 1, global_rt: 135432
axum 25, ct: 1000000, conn: 16145, max_conn: 1000000, inner_mem: 802.302M, outer_mem: 2649.473M, main_rt: 1, global_rt: 15351
axum 26, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 683.543M, outer_mem: 2638.582M, main_rt: 1, global_rt: 8
axum 27, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 683.543M, outer_mem: 2638.582M, main_rt: 1, global_rt: 8
axum 28, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 683.543M, outer_mem: 2638.582M, main_rt: 1, global_rt: 8
axum 29, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 30, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 31, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 32, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 33, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 34, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 35, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 36, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 37, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 38, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2062.879M, main_rt: 1, global_rt: 0
axum 39, ct: 1146947, conn: 146947, max_conn: 1000000, inner_mem: 375.161M, outer_mem: 2062.879M, main_rt: 1, global_rt: 150038
axum 40, ct: 1330594, conn: 330594, max_conn: 1000000, inner_mem: 1093.399M, outer_mem: 2174.879M, main_rt: 1, global_rt: 430367
axum 41, ct: 1646670, conn: 646670, max_conn: 1000000, inner_mem: 1597.347M, outer_mem: 2236.879M, main_rt: 1, global_rt: 652922
axum 42, ct: 1794766, conn: 794766, max_conn: 1000000, inner_mem: 2163.816M, outer_mem: 2465.754M, main_rt: 1, global_rt: 986139
axum 43, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 44, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 45, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 46, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 47, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 48, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 49, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 50, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 51, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 52, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 53, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.042M, outer_mem: 2532.879M, main_rt: 1, global_rt: 1000008
axum 54, ct: 2000000, conn: 873388, max_conn: 1000000, inner_mem: 2062.612M, outer_mem: 2553.004M, main_rt: 1, global_rt: 871597
axum 55, ct: 2000000, conn: 702935, max_conn: 1000000, inner_mem: 1772.213M, outer_mem: 2580.754M, main_rt: 1, global_rt: 701419
axum 56, ct: 2000000, conn: 481642, max_conn: 1000000, inner_mem: 1506.089M, outer_mem: 2630.254M, main_rt: 1, global_rt: 480889
axum 57, ct: 2000000, conn: 285573, max_conn: 1000000, inner_mem: 1309.204M, outer_mem: 2714.598M, main_rt: 1, global_rt: 282309
axum 58, ct: 2000000, conn: 131773, max_conn: 1000000, inner_mem: 1048.973M, outer_mem: 2729.223M, main_rt: 1, global_rt: 129783
axum 59, ct: 2000000, conn: 3418, max_conn: 1000000, inner_mem: 708.770M, outer_mem: 2736.723M, main_rt: 1, global_rt: 824
axum 60, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 627.293M, outer_mem: 2736.723M, main_rt: 1, global_rt: 8
axum 61, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 627.293M, outer_mem: 2736.723M, main_rt: 1, global_rt: 8
axum 62, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 627.293M, outer_mem: 2736.723M, main_rt: 1, global_rt: 8
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2339.508M, main_rt: 1, global_rt: 0

2. mimalloc-without-refresh

axum-demo pid is 2259
listening on 0.0.0.0:3000
axum 0, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 7.875M, main_rt: 1, global_rt: 0
axum 1, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 8.480M, main_rt: 1, global_rt: 0
axum 2, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 9.355M, main_rt: 1, global_rt: 0
axum 3, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 10.105M, main_rt: 1, global_rt: 0
axum 4, ct: 2358, conn: 2358, max_conn: 2358, inner_mem: 5.898M, outer_mem: 19.230M, main_rt: 1, global_rt: 2366
axum 5, ct: 195623, conn: 195623, max_conn: 195623, inner_mem: 465.840M, outer_mem: 484.844M, main_rt: 1, global_rt: 195631
axum 6, ct: 320341, conn: 320341, max_conn: 320341, inner_mem: 762.931M, outer_mem: 788.125M, main_rt: 1, global_rt: 320349
axum 7, ct: 482341, conn: 482341, max_conn: 482341, inner_mem: 1152.570M, outer_mem: 1188.777M, main_rt: 1, global_rt: 504142
axum 8, ct: 757306, conn: 757306, max_conn: 757306, inner_mem: 1901.621M, outer_mem: 1954.539M, main_rt: 1, global_rt: 782888
axum 9, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.543M, main_rt: 1, global_rt: 1000008
axum 10, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.543M, main_rt: 1, global_rt: 1000008
axum 11, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.668M, main_rt: 1, global_rt: 1000008
axum 12, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.793M, main_rt: 1, global_rt: 1000008
axum 13, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.793M, main_rt: 1, global_rt: 1000008
axum 14, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.793M, main_rt: 1, global_rt: 1000008
axum 15, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.793M, main_rt: 1, global_rt: 1000008
axum 16, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.793M, main_rt: 1, global_rt: 1000008
axum 17, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.793M, main_rt: 1, global_rt: 1000008
axum 18, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2340.793M, main_rt: 1, global_rt: 1000008
axum 19, ct: 1000000, conn: 997642, max_conn: 1000000, inner_mem: 2238.318M, outer_mem: 2336.062M, main_rt: 1, global_rt: 997650
axum 20, ct: 1000000, conn: 835549, max_conn: 1000000, inner_mem: 2064.168M, outer_mem: 2231.395M, main_rt: 1, global_rt: 833324
axum 21, ct: 1000000, conn: 681860, max_conn: 1000000, inner_mem: 1759.974M, outer_mem: 1824.879M, main_rt: 1, global_rt: 680530
axum 22, ct: 1000000, conn: 536398, max_conn: 1000000, inner_mem: 1512.444M, outer_mem: 1585.848M, main_rt: 1, global_rt: 536015
axum 23, ct: 1000000, conn: 298206, max_conn: 1000000, inner_mem: 1164.689M, outer_mem: 1244.891M, main_rt: 1, global_rt: 285720
axum 24, ct: 1000000, conn: 79740, max_conn: 1000000, inner_mem: 912.242M, outer_mem: 966.766M, main_rt: 1, global_rt: 75767
axum 25, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 672.618M, outer_mem: 778.508M, main_rt: 1, global_rt: 8
axum 26, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 664.043M, outer_mem: 778.359M, main_rt: 1, global_rt: 8
axum 27, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 664.043M, outer_mem: 778.359M, main_rt: 1, global_rt: 8
axum 28, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 664.043M, outer_mem: 778.359M, main_rt: 1, global_rt: 8
axum 29, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.422M, main_rt: 1, global_rt: 0
axum 30, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.422M, main_rt: 1, global_rt: 0
axum 31, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.422M, main_rt: 1, global_rt: 0
axum 32, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.422M, main_rt: 1, global_rt: 0
axum 33, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.422M, main_rt: 1, global_rt: 0
axum 34, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.266M, main_rt: 1, global_rt: 0
axum 35, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.391M, main_rt: 1, global_rt: 0
axum 36, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.391M, main_rt: 1, global_rt: 0
axum 37, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.391M, main_rt: 1, global_rt: 0
axum 38, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 731.391M, main_rt: 1, global_rt: 0
axum 39, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.318M, outer_mem: 617.980M, main_rt: 1, global_rt: 2
axum 40, ct: 1174121, conn: 174121, max_conn: 1000000, inner_mem: 441.919M, outer_mem: 842.766M, main_rt: 1, global_rt: 178585
axum 41, ct: 1376203, conn: 376203, max_conn: 1000000, inner_mem: 943.529M, outer_mem: 945.648M, main_rt: 1, global_rt: 381107
axum 42, ct: 1536470, conn: 536470, max_conn: 1000000, inner_mem: 1250.351M, outer_mem: 1283.625M, main_rt: 1, global_rt: 544540
axum 43, ct: 1808908, conn: 808908, max_conn: 1000000, inner_mem: 1902.623M, outer_mem: 1964.914M, main_rt: 1, global_rt: 813646
axum 44, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2336.215M, main_rt: 1, global_rt: 1000008
axum 45, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.840M, main_rt: 1, global_rt: 1000008
axum 46, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.840M, main_rt: 1, global_rt: 1000008
axum 47, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 48, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 49, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 50, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 51, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 52, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 53, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 54, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2335.660M, main_rt: 1, global_rt: 1000008
axum 55, ct: 2000000, conn: 836494, max_conn: 1000000, inner_mem: 1999.376M, outer_mem: 2119.828M, main_rt: 1, global_rt: 832779
axum 56, ct: 2000000, conn: 651632, max_conn: 1000000, inner_mem: 1695.710M, outer_mem: 1820.117M, main_rt: 1, global_rt: 648517
axum 57, ct: 2000000, conn: 485491, max_conn: 1000000, inner_mem: 1389.168M, outer_mem: 1500.973M, main_rt: 1, global_rt: 484290
axum 58, ct: 2000000, conn: 245103, max_conn: 1000000, inner_mem: 1016.947M, outer_mem: 1136.078M, main_rt: 1, global_rt: 234059
axum 59, ct: 2000000, conn: 46343, max_conn: 1000000, inner_mem: 776.429M, outer_mem: 893.582M, main_rt: 1, global_rt: 44420
axum 60, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 601.793M, outer_mem: 668.934M, main_rt: 1, global_rt: 8
axum 61, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 601.793M, outer_mem: 668.809M, main_rt: 1, global_rt: 8
axum 62, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 601.793M, outer_mem: 668.684M, main_rt: 1, global_rt: 8
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 601.793M, outer_mem: 668.684M, main_rt: 1, global_rt: 8
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 656.016M, main_rt: 1, global_rt: 0
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 656.016M, main_rt: 1, global_rt: 0
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 655.871M, main_rt: 1, global_rt: 0
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 655.871M, main_rt: 1, global_rt: 0
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 655.871M, main_rt: 1, global_rt: 0
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 655.871M, main_rt: 1, global_rt: 0
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 655.871M, main_rt: 1, global_rt: 0

3. default-with-refresh

axum-demo pid is 2849
listening on 0.0.0.0:3000
axum 0, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.250M, main_rt: 1, global_rt: 0
axum 1, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.559M, main_rt: 1, global_rt: 0
axum 2, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.777M, main_rt: 1, global_rt: 0
axum 3, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 6.805M, main_rt: 1, global_rt: 0
axum 4, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.262M, outer_mem: 6.824M, main_rt: 1, global_rt: 2
axum 5, ct: 67033, conn: 67033, max_conn: 67033, inner_mem: 185.561M, outer_mem: 189.320M, main_rt: 1, global_rt: 68360
axum 6, ct: 211195, conn: 211195, max_conn: 211195, inner_mem: 514.785M, outer_mem: 540.012M, main_rt: 1, global_rt: 214611
axum 7, ct: 362115, conn: 362115, max_conn: 362115, inner_mem: 865.917M, outer_mem: 899.109M, main_rt: 1, global_rt: 369238
axum 8, ct: 571405, conn: 571405, max_conn: 571405, inner_mem: 1492.590M, outer_mem: 1567.328M, main_rt: 1, global_rt: 641803
axum 9, ct: 823134, conn: 823134, max_conn: 823134, inner_mem: 2045.258M, outer_mem: 2186.203M, main_rt: 1, global_rt: 884145
axum 10, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.703M, main_rt: 1, global_rt: 1000008
axum 11, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.703M, main_rt: 1, global_rt: 1000008
axum 12, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.703M, main_rt: 1, global_rt: 1000008
axum 13, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.828M, main_rt: 1, global_rt: 1000008
axum 14, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.828M, main_rt: 1, global_rt: 1000008
axum 15, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.828M, main_rt: 1, global_rt: 1000008
axum 16, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.828M, main_rt: 1, global_rt: 1000008
axum 17, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.828M, main_rt: 1, global_rt: 1000008
axum 18, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.828M, main_rt: 1, global_rt: 1000008
axum 19, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2444.828M, main_rt: 1, global_rt: 1000008
axum 20, ct: 1000000, conn: 940579, max_conn: 1000000, inner_mem: 2156.890M, outer_mem: 2456.828M, main_rt: 1, global_rt: 940594
axum 21, ct: 1000000, conn: 805314, max_conn: 1000000, inner_mem: 1923.350M, outer_mem: 2476.078M, main_rt: 1, global_rt: 803973
axum 22, ct: 1000000, conn: 654103, max_conn: 1000000, inner_mem: 1661.931M, outer_mem: 2480.578M, main_rt: 1, global_rt: 650122
axum 23, ct: 1000000, conn: 448379, max_conn: 1000000, inner_mem: 1349.720M, outer_mem: 2480.578M, main_rt: 1, global_rt: 446867
axum 24, ct: 1000000, conn: 264076, max_conn: 1000000, inner_mem: 1095.251M, outer_mem: 2493.828M, main_rt: 1, global_rt: 259210
axum 25, ct: 1000000, conn: 125734, max_conn: 1000000, inner_mem: 873.939M, outer_mem: 2532.328M, main_rt: 1, global_rt: 125749
axum 26, ct: 1000000, conn: 4260, max_conn: 1000000, inner_mem: 629.933M, outer_mem: 2554.203M, main_rt: 1, global_rt: 4275
axum 27, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 586.043M, outer_mem: 2455.984M, main_rt: 1, global_rt: 8
axum 28, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 586.043M, outer_mem: 2455.984M, main_rt: 1, global_rt: 8
axum 29, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 586.043M, outer_mem: 2455.984M, main_rt: 1, global_rt: 8
axum 30, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.347M, outer_mem: 1962.844M, main_rt: 1, global_rt: 0, refresh runtime on DefaultAlloc until 20M! cost 1.022712ms
axum 31, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 32, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 33, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 34, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 35, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 36, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 37, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 38, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 39, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 11.980M, main_rt: 1, global_rt: 0
axum 40, ct: 1121303, conn: 121303, max_conn: 1000000, inner_mem: 319.137M, outer_mem: 336.855M, main_rt: 1, global_rt: 125834
axum 41, ct: 1309984, conn: 309984, max_conn: 1000000, inner_mem: 823.280M, outer_mem: 852.605M, main_rt: 1, global_rt: 342849
axum 42, ct: 1559032, conn: 559032, max_conn: 1000000, inner_mem: 1360.795M, outer_mem: 1436.855M, main_rt: 1, global_rt: 574630
axum 43, ct: 1791651, conn: 791651, max_conn: 1000000, inner_mem: 1900.792M, outer_mem: 2052.355M, main_rt: 1, global_rt: 841903
axum 44, ct: 1977056, conn: 977056, max_conn: 1000000, inner_mem: 2193.794M, outer_mem: 2372.980M, main_rt: 1, global_rt: 978721
axum 45, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 46, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 47, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 48, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 49, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 50, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 51, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 52, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 53, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 54, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.049M, outer_mem: 2417.105M, main_rt: 1, global_rt: 1000008
axum 55, ct: 2000000, conn: 900346, max_conn: 1000000, inner_mem: 2096.700M, outer_mem: 2453.855M, main_rt: 1, global_rt: 897519
axum 56, ct: 2000000, conn: 711036, max_conn: 1000000, inner_mem: 1836.852M, outer_mem: 2502.855M, main_rt: 1, global_rt: 709887
axum 57, ct: 2000000, conn: 538257, max_conn: 1000000, inner_mem: 1587.075M, outer_mem: 2529.605M, main_rt: 1, global_rt: 535972
axum 58, ct: 2000000, conn: 343601, max_conn: 1000000, inner_mem: 1362.405M, outer_mem: 2566.566M, main_rt: 1, global_rt: 340818
axum 59, ct: 2000000, conn: 174574, max_conn: 1000000, inner_mem: 1187.495M, outer_mem: 2650.773M, main_rt: 1, global_rt: 170441
axum 60, ct: 2000000, conn: 16180, max_conn: 1000000, inner_mem: 1035.920M, outer_mem: 2719.148M, main_rt: 1, global_rt: 16195
axum 61, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 776.194M, outer_mem: 2727.398M, main_rt: 1, global_rt: 8
axum 62, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 754.051M, outer_mem: 2727.398M, main_rt: 1, global_rt: 8
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 754.051M, outer_mem: 2727.398M, main_rt: 1, global_rt: 8
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 754.051M, outer_mem: 2727.398M, main_rt: 1, global_rt: 8
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 754.051M, outer_mem: 2727.398M, main_rt: 1, global_rt: 8
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.345M, outer_mem: 2151.961M, main_rt: 1, global_rt: 0, refresh runtime on DefaultAlloc until 20M! cost 721.269µs
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.344M, outer_mem: 62.145M, main_rt: 1, global_rt: 0, refresh runtime on DefaultAlloc until 20M! cost 823.072µs
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 62.145M, main_rt: 1, global_rt: 0, refresh runtime on DefaultAlloc until 20M! cost 870.368µs
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.344M, outer_mem: 62.145M, main_rt: 1, global_rt: 0, refresh runtime on DefaultAlloc until 20M! cost 915.286µs
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.315M, outer_mem: 62.145M, main_rt: 1, global_rt: 0, refresh runtime on DefaultAlloc until 20M! cost 866.946µs

4. mimalloc-with-refresh

axum-demo pid is 3021
listening on 0.0.0.0:3000
axum 0, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 7.500M, main_rt: 1, global_rt: 0
axum 1, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 8.484M, main_rt: 1, global_rt: 0
axum 2, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 9.609M, main_rt: 1, global_rt: 0
axum 3, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 10.359M, main_rt: 1, global_rt: 0
axum 4, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 10.484M, main_rt: 1, global_rt: 0
axum 5, ct: 35355, conn: 35355, max_conn: 35355, inner_mem: 88.450M, outer_mem: 105.109M, main_rt: 1, global_rt: 37147
axum 6, ct: 138968, conn: 138968, max_conn: 138968, inner_mem: 329.454M, outer_mem: 350.250M, main_rt: 1, global_rt: 142594
axum 7, ct: 384539, conn: 384539, max_conn: 384539, inner_mem: 962.115M, outer_mem: 983.426M, main_rt: 1, global_rt: 393629
axum 8, ct: 606163, conn: 606163, max_conn: 606163, inner_mem: 1408.331M, outer_mem: 1450.270M, main_rt: 1, global_rt: 617051
axum 9, ct: 876509, conn: 876509, max_conn: 876509, inner_mem: 2030.256M, outer_mem: 2090.184M, main_rt: 1, global_rt: 881387
axum 10, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.586M, main_rt: 1, global_rt: 1000008
axum 11, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.836M, main_rt: 1, global_rt: 1000008
axum 12, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.836M, main_rt: 1, global_rt: 1000008
axum 13, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.836M, main_rt: 1, global_rt: 1000008
axum 14, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.652M, main_rt: 1, global_rt: 1000008
axum 15, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.652M, main_rt: 1, global_rt: 1000008
axum 16, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.652M, main_rt: 1, global_rt: 1000008
axum 17, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.652M, main_rt: 1, global_rt: 1000008
axum 18, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.652M, main_rt: 1, global_rt: 1000008
axum 19, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2235.041M, outer_mem: 2333.777M, main_rt: 1, global_rt: 1000008
axum 20, ct: 1000000, conn: 985848, max_conn: 1000000, inner_mem: 2217.086M, outer_mem: 2328.094M, main_rt: 1, global_rt: 977496
axum 21, ct: 1000000, conn: 862314, max_conn: 1000000, inner_mem: 2014.693M, outer_mem: 2106.859M, main_rt: 1, global_rt: 860119
axum 22, ct: 1000000, conn: 655363, max_conn: 1000000, inner_mem: 1707.808M, outer_mem: 1875.473M, main_rt: 1, global_rt: 655378
axum 23, ct: 1000000, conn: 459518, max_conn: 1000000, inner_mem: 1400.007M, outer_mem: 1536.480M, main_rt: 1, global_rt: 459533
axum 24, ct: 1000000, conn: 215690, max_conn: 1000000, inner_mem: 1082.380M, outer_mem: 1249.082M, main_rt: 1, global_rt: 214247
axum 25, ct: 1000000, conn: 33991, max_conn: 1000000, inner_mem: 846.364M, outer_mem: 948.512M, main_rt: 1, global_rt: 24484
axum 26, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 705.293M, outer_mem: 712.348M, main_rt: 1, global_rt: 8
axum 27, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 705.293M, outer_mem: 712.086M, main_rt: 1, global_rt: 8
axum 28, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 705.293M, outer_mem: 711.949M, main_rt: 1, global_rt: 8
axum 29, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 705.293M, outer_mem: 711.812M, main_rt: 1, global_rt: 8
axum 30, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.345M, outer_mem: 674.520M, main_rt: 1, global_rt: 0, refresh runtime on MiMalloc until 20M! cost 1.190306ms
axum 31, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.347M, outer_mem: 298.633M, main_rt: 1, global_rt: 0, refresh runtime on MiMalloc until 20M! cost 713.71µs
axum 32, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 13.512M, main_rt: 1, global_rt: 0
axum 33, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 14.387M, main_rt: 1, global_rt: 0
axum 34, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 14.387M, main_rt: 1, global_rt: 0
axum 35, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 14.262M, main_rt: 1, global_rt: 0
axum 36, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 14.387M, main_rt: 1, global_rt: 0
axum 37, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 14.387M, main_rt: 1, global_rt: 0
axum 38, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 14.387M, main_rt: 1, global_rt: 0
axum 39, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 14.387M, main_rt: 1, global_rt: 0
axum 40, ct: 1022144, conn: 22144, max_conn: 1000000, inner_mem: 58.355M, outer_mem: 75.668M, main_rt: 1, global_rt: 24549
axum 41, ct: 1243949, conn: 243949, max_conn: 1000000, inner_mem: 572.814M, outer_mem: 586.391M, main_rt: 1, global_rt: 245193
axum 42, ct: 1459547, conn: 459547, max_conn: 1000000, inner_mem: 1170.676M, outer_mem: 1156.102M, main_rt: 1, global_rt: 492212
axum 43, ct: 1688070, conn: 688070, max_conn: 1000000, inner_mem: 1612.515M, outer_mem: 1604.105M, main_rt: 1, global_rt: 688105
axum 44, ct: 1910744, conn: 910744, max_conn: 1000000, inner_mem: 2099.287M, outer_mem: 2171.328M, main_rt: 1, global_rt: 940210
axum 45, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2312.785M, main_rt: 1, global_rt: 1000008
axum 46, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2312.910M, main_rt: 1, global_rt: 1000008
axum 47, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2313.035M, main_rt: 1, global_rt: 1000008
axum 48, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2313.035M, main_rt: 1, global_rt: 1000008
axum 49, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2313.035M, main_rt: 1, global_rt: 1000008
axum 50, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2313.035M, main_rt: 1, global_rt: 1000008
axum 51, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2313.035M, main_rt: 1, global_rt: 1000008
axum 52, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2313.035M, main_rt: 1, global_rt: 1000008
axum 53, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2312.906M, main_rt: 1, global_rt: 1000008
axum 54, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2210.674M, outer_mem: 2312.906M, main_rt: 1, global_rt: 1000008
axum 55, ct: 2000000, conn: 986024, max_conn: 1000000, inner_mem: 2193.908M, outer_mem: 2309.086M, main_rt: 1, global_rt: 985245
axum 56, ct: 2000000, conn: 767038, max_conn: 1000000, inner_mem: 1835.938M, outer_mem: 1934.230M, main_rt: 1, global_rt: 764543
axum 57, ct: 2000000, conn: 563257, max_conn: 1000000, inner_mem: 1486.607M, outer_mem: 1598.703M, main_rt: 1, global_rt: 558215
axum 58, ct: 2000000, conn: 339905, max_conn: 1000000, inner_mem: 1156.538M, outer_mem: 1261.723M, main_rt: 1, global_rt: 338950
axum 59, ct: 2000000, conn: 132751, max_conn: 1000000, inner_mem: 773.141M, outer_mem: 852.016M, main_rt: 1, global_rt: 129134
axum 60, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 524.926M, outer_mem: 576.590M, main_rt: 1, global_rt: 8
axum 61, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 524.926M, outer_mem: 576.465M, main_rt: 1, global_rt: 8
axum 62, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 524.926M, outer_mem: 576.324M, main_rt: 1, global_rt: 8
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 524.926M, outer_mem: 576.324M, main_rt: 1, global_rt: 8
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 524.926M, outer_mem: 576.324M, main_rt: 1, global_rt: 8
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.345M, outer_mem: 549.789M, main_rt: 1, global_rt: 0, refresh runtime on MiMalloc until 20M! cost 543.071µs
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.347M, outer_mem: 432.996M, main_rt: 1, global_rt: 0, refresh runtime on MiMalloc until 20M! cost 980.94µs
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.347M, outer_mem: 27.895M, main_rt: 1, global_rt: 0, refresh runtime on MiMalloc until 20M! cost 930.906µs
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 13.184M, main_rt: 1, global_rt: 0
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 13.809M, main_rt: 1, global_rt: 0
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 13.934M, main_rt: 1, global_rt: 0
TinusgragLin commented 3 months ago

Hi @lithbitren, not sure if this helps, but I came across libc's malloc_trim a few months ago while trying to reduce the idle memory usage of my server (which uses the default/system allocator). It gives free pages at the top of the heap back to the system, and it worked pretty well for my use case.

lixiang365 commented 3 months ago

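I saw this issue this morning and ran a quick test. After free, the freed memory is coalesced into larger blocks, but it is not necessarily returned to the operating system right away (see the free source code for the details). malloc_trim simply hands the free blocks back.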

unsafe {
    libc::malloc_trim(0);
}

I don't think refreshing the runtime or calling malloc_trim by hand is a good solution; switching the memory allocator is more appropriate.

lithbitren commented 3 months ago

@TinusgragLin @lixiang365


Apart from solving practice problems in C during my school days, I haven't used C since, so I'm not very familiar with system-level programming in C. I've read some articles about malloc_trim, and quite a few of them advise against using this function in practice. Perhaps that advice exists because C/C++ isn't as safe as Rust, but I don't know what the risks would be of calling it from Rust.

I tried replacing the runtime-refresh part of the server-side code with malloc_trim. After testing, it appears that unsafe { libc::malloc_trim(0); } only works with the default allocator, and its overhead (about 10-150 ms) is much higher than refreshing Tokio's runtime alone (about 1 ms). The benefit is that it reclaims memory from the whole process, including Tokio's runtime. The call has no effect with mi_malloc (outer_mem stays at about 727 M in the mi_malloc log below). After multiple tests, though, mi_malloc seems to offer better performance and lower overhead than the default allocator, so it is a pity that the two are not compatible.
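For reference, a minimal sketch of timing malloc_trim(0) from Rust. It declares the glibc symbol directly instead of going through the libc crate so that it builds with the standard library alone. trim_heap is a hypothetical helper name, and the cfg gate reflects the assumption that malloc_trim is only available on Linux with glibc. Because malloc_trim only acts on glibc's own malloc heap, it would be expected to do nothing useful when another global allocator such as mi_malloc owns the memory.

```rust
use std::time::{Duration, Instant};

// glibc prototype: int malloc_trim(size_t pad);
#[cfg(all(target_os = "linux", target_env = "gnu"))]
unsafe extern "C" {
    fn malloc_trim(pad: usize) -> std::os::raw::c_int;
}

/// Hypothetical helper: asks glibc to return free heap pages to the OS.
/// Returns `Some((released, elapsed))` on Linux/glibc, where `released`
/// is true if glibc reports that memory was given back, and `None` on
/// targets where malloc_trim is not assumed to exist.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
fn trim_heap() -> Option<(bool, Duration)> {
    let start = Instant::now();
    // SAFETY: malloc_trim takes no pointers and is documented as MT-Safe.
    let released = unsafe { malloc_trim(0) } == 1;
    Some((released, start.elapsed()))
}

#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
fn trim_heap() -> Option<(bool, Duration)> {
    None
}

fn main() {
    // Allocate and free many small blocks so the heap has something to trim.
    let blocks: Vec<Vec<u8>> = (0..10_000).map(|_| vec![1u8; 4096]).collect();
    drop(blocks);

    match trim_heap() {
        Some((released, cost)) => println!("released: {released}, cost: {cost:?}"),
        None => println!("malloc_trim is not available on this target"),
    }
}
```

The elapsed time depends on how fragmented the heap is, so a single call on an idle process says little about the cost after a large burst of connections.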

In summary, the best practice for now is to switch the memory allocator to mi_malloc. However, mi_malloc typically releases only about 70% of the peak memory, so it is still hard to shrink the externally monitored process memory down to something close to the internal memory.
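The gap between the two numbers may be easier to see with a sketch. The logs do not show how inner_mem is measured, so the following assumes it is a count of live heap bytes kept by a wrapper around the global allocator; Counting, LIVE_BYTES and live_bytes are hypothetical names. Such a counter drops as soon as memory is freed, while the resident size reported by the OS (outer_mem) only drops when the allocator actually returns pages.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Bytes currently allocated through the global allocator (assumed
// analogue of inner_mem; the real measurement is not shown in the logs).
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper that forwards to the system allocator and
/// keeps a running total of live bytes.
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

fn main() {
    let before = live_bytes();
    let buf = vec![0u8; 1 << 20]; // 1 MiB
    std::hint::black_box(&buf); // keep the allocation from being optimized out
    let during = live_bytes();
    drop(buf);
    let after = live_bytes();
    println!("before: {before}, during: {during}, after: {after}");
}
```

The counter returns to its earlier level right after the drop, regardless of whether the underlying allocator has handed the pages back to the OS.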

default-with-malloc_trim:

axum-demo pid is 9130
listening on 0.0.0.0:3000
axum 0, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 7.250M, main_rt: 1, global_rt: 0
axum 1, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 7.516M, main_rt: 1, global_rt: 0
axum 2, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 7.871M, main_rt: 1, global_rt: 0
axum 3, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 8.027M, main_rt: 1, global_rt: 0
axum 4, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 8.180M, main_rt: 1, global_rt: 0
axum 5, ct: 16271, conn: 16271, max_conn: 16271, inner_mem: 42.165M, outer_mem: 52.824M, main_rt: 1, global_rt: 17657
axum 6, ct: 170434, conn: 170434, max_conn: 170434, inner_mem: 498.168M, outer_mem: 494.062M, main_rt: 1, global_rt: 183049
axum 7, ct: 271712, conn: 271712, max_conn: 271712, inner_mem: 706.860M, outer_mem: 699.445M, main_rt: 1, global_rt: 273211
axum 8, ct: 387074, conn: 387074, max_conn: 387074, inner_mem: 1207.527M, outer_mem: 1237.660M, main_rt: 1, global_rt: 500337
axum 9, ct: 576981, conn: 576981, max_conn: 576981, inner_mem: 1812.442M, outer_mem: 1898.215M, main_rt: 1, global_rt: 783730
axum 10, ct: 905187, conn: 905186, max_conn: 905186, inner_mem: 2199.289M, outer_mem: 2410.465M, main_rt: 1, global_rt: 980451
axum 11, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 12, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 13, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 14, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 15, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 16, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 17, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 18, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 19, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2242.541M, outer_mem: 2488.965M, main_rt: 1, global_rt: 1000008
axum 20, ct: 1000000, conn: 985301, max_conn: 1000000, inner_mem: 2221.098M, outer_mem: 2495.590M, main_rt: 1, global_rt: 984720
axum 21, ct: 1000000, conn: 858282, max_conn: 1000000, inner_mem: 2038.596M, outer_mem: 2518.465M, main_rt: 1, global_rt: 848208
axum 22, ct: 1000000, conn: 730042, max_conn: 1000000, inner_mem: 1835.695M, outer_mem: 2542.840M, main_rt: 1, global_rt: 728558
axum 23, ct: 1000000, conn: 612963, max_conn: 1000000, inner_mem: 1613.202M, outer_mem: 2542.840M, main_rt: 1, global_rt: 612573
axum 24, ct: 1000000, conn: 479217, max_conn: 1000000, inner_mem: 1383.951M, outer_mem: 2545.840M, main_rt: 1, global_rt: 469962
axum 25, ct: 1000000, conn: 272536, max_conn: 1000000, inner_mem: 1177.145M, outer_mem: 2587.465M, main_rt: 1, global_rt: 272551
axum 26, ct: 1000000, conn: 69432, max_conn: 1000000, inner_mem: 989.988M, outer_mem: 2651.965M, main_rt: 1, global_rt: 65095
axum 27, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 765.746M, outer_mem: 2650.969M, main_rt: 1, global_rt: 8
axum 28, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 721.817M, outer_mem: 2650.969M, main_rt: 1, global_rt: 8
axum 29, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 712.793M, outer_mem: 2648.633M, main_rt: 1, global_rt: 8
axum 30, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 712.793M, outer_mem: 2648.633M, main_rt: 1, global_rt: 8
axum 31, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 1949.277M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 149.609525ms
axum 32, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.504M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 3.636196ms
axum 33, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.504M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 8.711115ms
axum 34, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.621M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 8.114365ms
axum 35, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.742M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 9.988906ms
axum 36, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.738M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 9.937632ms
axum 37, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.574M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 10.070491ms
axum 38, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.809M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 8.968818ms
axum 39, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.434M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 9.703463ms
axum 40, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 47.680M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 9.89667ms
axum 41, ct: 1067334, conn: 67334, max_conn: 1000000, inner_mem: 180.037M, outer_mem: 258.148M, main_rt: 1, global_rt: 69257
axum 42, ct: 1190705, conn: 190705, max_conn: 1000000, inner_mem: 437.411M, outer_mem: 561.773M, main_rt: 1, global_rt: 190844
axum 43, ct: 1232686, conn: 232686, max_conn: 1000000, inner_mem: 563.877M, outer_mem: 686.023M, main_rt: 1, global_rt: 238335
axum 44, ct: 1432301, conn: 432301, max_conn: 1000000, inner_mem: 1105.969M, outer_mem: 1227.898M, main_rt: 1, global_rt: 437168
axum 45, ct: 1588981, conn: 588981, max_conn: 1000000, inner_mem: 1683.528M, outer_mem: 1826.148M, main_rt: 1, global_rt: 740774
axum 46, ct: 1892102, conn: 892102, max_conn: 1000000, inner_mem: 2051.242M, outer_mem: 2260.148M, main_rt: 1, global_rt: 896365
axum 47, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 48, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 49, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 50, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 51, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 52, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 53, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 54, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 55, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2444.898M, main_rt: 1, global_rt: 1000008
axum 56, ct: 2000000, conn: 933179, max_conn: 1000000, inner_mem: 2118.660M, outer_mem: 2456.398M, main_rt: 1, global_rt: 931752
axum 57, ct: 2000000, conn: 809527, max_conn: 1000000, inner_mem: 1896.305M, outer_mem: 2457.148M, main_rt: 1, global_rt: 809172
axum 58, ct: 2000000, conn: 780102, max_conn: 1000000, inner_mem: 1864.724M, outer_mem: 2465.273M, main_rt: 1, global_rt: 778506
axum 59, ct: 2000000, conn: 605753, max_conn: 1000000, inner_mem: 1646.963M, outer_mem: 2524.523M, main_rt: 1, global_rt: 602492
axum 60, ct: 2000000, conn: 469382, max_conn: 1000000, inner_mem: 1418.536M, outer_mem: 2573.648M, main_rt: 1, global_rt: 469397
axum 61, ct: 2000000, conn: 281814, max_conn: 1000000, inner_mem: 1207.039M, outer_mem: 2606.648M, main_rt: 1, global_rt: 277193
axum 62, ct: 2000000, conn: 47436, max_conn: 1000000, inner_mem: 1090.692M, outer_mem: 2743.148M, main_rt: 1, global_rt: 46909
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 741.058M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 691.793M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 691.793M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 691.793M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2201.770M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 158.194198ms
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 71.871M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 3.618341ms
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 71.996M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 9.589179ms
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 71.855M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 8.751279ms

mi_malloc-with-malloc_trim:

axum-demo pid is 9269
listening on 0.0.0.0:3000
axum 0, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 8.500M, main_rt: 1, global_rt: 0
axum 1, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 9.246M, main_rt: 1, global_rt: 0
axum 2, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 10.121M, main_rt: 1, global_rt: 0
axum 3, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 10.996M, main_rt: 1, global_rt: 0
axum 4, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 11.246M, main_rt: 1, global_rt: 0
axum 5, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 11.371M, main_rt: 1, global_rt: 0
axum 6, ct: 17435, conn: 17435, max_conn: 17435, inner_mem: 60.169M, outer_mem: 78.621M, main_rt: 1, global_rt: 22172
axum 7, ct: 242203, conn: 242203, max_conn: 242203, inner_mem: 587.203M, outer_mem: 595.898M, main_rt: 1, global_rt: 242211
axum 8, ct: 359916, conn: 359916, max_conn: 359916, inner_mem: 848.749M, outer_mem: 852.164M, main_rt: 1, global_rt: 359966
axum 9, ct: 599406, conn: 599406, max_conn: 599406, inner_mem: 1553.196M, outer_mem: 1589.867M, main_rt: 1, global_rt: 657818
axum 10, ct: 918649, conn: 918649, max_conn: 918649, inner_mem: 2096.274M, outer_mem: 2173.543M, main_rt: 1, global_rt: 921678
axum 11, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.633M, main_rt: 1, global_rt: 1000008
axum 12, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.008M, main_rt: 1, global_rt: 1000008
axum 13, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.258M, main_rt: 1, global_rt: 1000008
axum 14, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.258M, main_rt: 1, global_rt: 1000008
axum 15, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.488M, main_rt: 1, global_rt: 1000008
axum 16, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.488M, main_rt: 1, global_rt: 1000008
axum 17, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.488M, main_rt: 1, global_rt: 1000008
axum 18, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.457M, main_rt: 1, global_rt: 1000008
axum 19, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.707M, main_rt: 1, global_rt: 1000008
axum 20, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.531M, main_rt: 1, global_rt: 1000008
axum 21, ct: 1000000, conn: 991933, max_conn: 1000000, inner_mem: 2230.664M, outer_mem: 2334.121M, main_rt: 1, global_rt: 988973
axum 22, ct: 1000000, conn: 782622, max_conn: 1000000, inner_mem: 1947.606M, outer_mem: 2121.934M, main_rt: 1, global_rt: 776002
axum 23, ct: 1000000, conn: 651409, max_conn: 1000000, inner_mem: 1703.946M, outer_mem: 1798.531M, main_rt: 1, global_rt: 647738
axum 24, ct: 1000000, conn: 418207, max_conn: 1000000, inner_mem: 1287.050M, outer_mem: 1338.312M, main_rt: 1, global_rt: 416387
axum 25, ct: 1000000, conn: 187447, max_conn: 1000000, inner_mem: 928.006M, outer_mem: 1042.410M, main_rt: 1, global_rt: 182534
axum 26, ct: 1000000, conn: 1509, max_conn: 1000000, inner_mem: 702.928M, outer_mem: 788.777M, main_rt: 1, global_rt: 44
axum 27, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 740.062M, main_rt: 1, global_rt: 8
axum 28, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.590M, main_rt: 1, global_rt: 8
axum 29, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.590M, main_rt: 1, global_rt: 8
axum 30, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.590M, main_rt: 1, global_rt: 8
axum 31, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.715M, main_rt: 1, global_rt: 8
axum 32, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.250M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 37.747µs
axum 33, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.375M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 25.185µs
axum 34, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.625M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 26.65µs
axum 35, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.492M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 26.685µs
axum 36, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.492M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 18.455µs
axum 37, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.492M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 27.96µs
axum 38, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.363M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 18.53µs
axum 39, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.488M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 28.691µs
axum 40, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.488M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 24.641µs
axum 41, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.488M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 27.643µs
axum 42, ct: 1088219, conn: 88219, max_conn: 1000000, inner_mem: 210.882M, outer_mem: 290.008M, main_rt: 1, global_rt: 88233
axum 43, ct: 1100424, conn: 100424, max_conn: 1000000, inner_mem: 235.734M, outer_mem: 310.152M, main_rt: 1, global_rt: 101198
axum 44, ct: 1277073, conn: 277073, max_conn: 1000000, inner_mem: 694.419M, outer_mem: 742.621M, main_rt: 1, global_rt: 289016
axum 45, ct: 1522640, conn: 522640, max_conn: 1000000, inner_mem: 1402.240M, outer_mem: 1448.352M, main_rt: 1, global_rt: 617107
axum 46, ct: 1842830, conn: 842830, max_conn: 1000000, inner_mem: 2053.293M, outer_mem: 2134.516M, main_rt: 1, global_rt: 896214
axum 47, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.816M, main_rt: 1, global_rt: 1000008
axum 48, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.582M, main_rt: 1, global_rt: 1000008
axum 49, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.418M, main_rt: 1, global_rt: 1000008
axum 50, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.543M, main_rt: 1, global_rt: 1000008
axum 51, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 52, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 53, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 54, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 55, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.496M, main_rt: 1, global_rt: 1000008
axum 56, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.340M, main_rt: 1, global_rt: 1000008
axum 57, ct: 2000000, conn: 931466, max_conn: 1000000, inner_mem: 2169.154M, outer_mem: 2311.867M, main_rt: 1, global_rt: 929274
axum 58, ct: 2000000, conn: 899900, max_conn: 1000000, inner_mem: 2085.752M, outer_mem: 2207.188M, main_rt: 1, global_rt: 898818
axum 59, ct: 2000000, conn: 746219, max_conn: 1000000, inner_mem: 1851.457M, outer_mem: 1964.664M, main_rt: 1, global_rt: 741926
axum 60, ct: 2000000, conn: 499928, max_conn: 1000000, inner_mem: 1431.778M, outer_mem: 1526.262M, main_rt: 1, global_rt: 499943
axum 61, ct: 2000000, conn: 257698, max_conn: 1000000, inner_mem: 1066.313M, outer_mem: 1196.879M, main_rt: 1, global_rt: 253226
axum 62, ct: 2000000, conn: 69470, max_conn: 1000000, inner_mem: 831.906M, outer_mem: 1107.055M, main_rt: 1, global_rt: 69485
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 852.238M, main_rt: 1, global_rt: 8
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 852.238M, main_rt: 1, global_rt: 8
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 852.238M, main_rt: 1, global_rt: 8
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 851.945M, main_rt: 1, global_rt: 8
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.176M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 26.701µs
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.160M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 22.268µs
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.160M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 24.917µs
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.285M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 27.244µs
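
As a rough check of the "around 70%" figure, the released share can be computed from two lines of the mi_malloc log above: one at the 1,000,000-connection plateau (axum 19) and one after all connections have closed (axum 28). This is only a sketch; mem_field and released_fraction are hypothetical helper names, and the parser assumes the exact log format shown here.

```rust
/// Extracts the number that follows `key` and precedes the `M` unit
/// suffix, e.g. "outer_mem: 739.590M" gives 739.59.
fn mem_field(line: &str, key: &str) -> Option<f64> {
    let start = line.find(key)? + key.len();
    let rest = line[start..].trim_start();
    let end = rest.find('M')?;
    rest[..end].trim().parse().ok()
}

/// Share of `busy` memory that is no longer held once the process is idle.
fn released_fraction(busy: f64, idle: f64) -> f64 {
    (busy - idle) / busy
}

fn main() {
    // Copied from the mi_malloc-with-malloc_trim log above.
    let busy_line = "axum 19, ct: 1000000, conn: 1000000, max_conn: 1000000, \
                     inner_mem: 2238.791M, outer_mem: 2330.707M, main_rt: 1, global_rt: 1000008";
    let idle_line = "axum 28, ct: 1000000, conn: 0, max_conn: 1000000, \
                     inner_mem: 592.043M, outer_mem: 739.590M, main_rt: 1, global_rt: 8";

    let busy = mem_field(busy_line, "outer_mem:").expect("busy outer_mem");
    let idle = mem_field(idle_line, "outer_mem:").expect("idle outer_mem");

    println!(
        "outer_mem {busy}M -> {idle}M, released {:.1}%",
        released_fraction(busy, idle) * 100.0
    );
}
```

For this pair of lines the released share comes out at roughly 68%, which is in line with the figure quoted above. The second round in the same log (about 2343 M down to about 852 M) gives a somewhat lower share.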
lixiang365 commented 3 months ago

axum 60, ct: 2000000, conn: 469382, max_conn: 1000000, inner_mem: 1418.536M, outer_mem: 2573.648M, main_rt: 1, global_rt: 469397
axum 61, ct: 2000000, conn: 281814, max_conn: 1000000, inner_mem: 1207.039M, outer_mem: 2606.648M, main_rt: 1, global_rt: 277193
axum 62, ct: 2000000, conn: 47436, max_conn: 1000000, inner_mem: 1090.692M, outer_mem: 2743.148M, main_rt: 1, global_rt: 46909
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 741.058M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 691.793M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 691.793M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 691.793M, outer_mem: 2764.898M, main_rt: 1, global_rt: 8
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2201.770M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 158.194198ms
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 71.871M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 3.618341ms
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 71.996M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 9.589179ms
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 71.855M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc until 20M! cost 8.751279ms

mi_malloc-with-malloc_trim:

axum-demo pid is 9269
listening on 0.0.0.0:3000
axum 0, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 8.500M, main_rt: 1, global_rt: 0
axum 1, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 9.246M, main_rt: 1, global_rt: 0
axum 2, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 10.121M, main_rt: 1, global_rt: 0
axum 3, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 10.996M, main_rt: 1, global_rt: 0
axum 4, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 11.246M, main_rt: 1, global_rt: 0
axum 5, ct: 0, conn: 0, max_conn: 0, inner_mem: 0.258M, outer_mem: 11.371M, main_rt: 1, global_rt: 0
axum 6, ct: 17435, conn: 17435, max_conn: 17435, inner_mem: 60.169M, outer_mem: 78.621M, main_rt: 1, global_rt: 22172
axum 7, ct: 242203, conn: 242203, max_conn: 242203, inner_mem: 587.203M, outer_mem: 595.898M, main_rt: 1, global_rt: 242211
axum 8, ct: 359916, conn: 359916, max_conn: 359916, inner_mem: 848.749M, outer_mem: 852.164M, main_rt: 1, global_rt: 359966
axum 9, ct: 599406, conn: 599406, max_conn: 599406, inner_mem: 1553.196M, outer_mem: 1589.867M, main_rt: 1, global_rt: 657818
axum 10, ct: 918649, conn: 918649, max_conn: 918649, inner_mem: 2096.274M, outer_mem: 2173.543M, main_rt: 1, global_rt: 921678
axum 11, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.633M, main_rt: 1, global_rt: 1000008
axum 12, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.008M, main_rt: 1, global_rt: 1000008
axum 13, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.258M, main_rt: 1, global_rt: 1000008
axum 14, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.258M, main_rt: 1, global_rt: 1000008
axum 15, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.488M, main_rt: 1, global_rt: 1000008
axum 16, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.488M, main_rt: 1, global_rt: 1000008
axum 17, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.488M, main_rt: 1, global_rt: 1000008
axum 18, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.457M, main_rt: 1, global_rt: 1000008
axum 19, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.707M, main_rt: 1, global_rt: 1000008
axum 20, ct: 1000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.791M, outer_mem: 2330.531M, main_rt: 1, global_rt: 1000008
axum 21, ct: 1000000, conn: 991933, max_conn: 1000000, inner_mem: 2230.664M, outer_mem: 2334.121M, main_rt: 1, global_rt: 988973
axum 22, ct: 1000000, conn: 782622, max_conn: 1000000, inner_mem: 1947.606M, outer_mem: 2121.934M, main_rt: 1, global_rt: 776002
axum 23, ct: 1000000, conn: 651409, max_conn: 1000000, inner_mem: 1703.946M, outer_mem: 1798.531M, main_rt: 1, global_rt: 647738
axum 24, ct: 1000000, conn: 418207, max_conn: 1000000, inner_mem: 1287.050M, outer_mem: 1338.312M, main_rt: 1, global_rt: 416387
axum 25, ct: 1000000, conn: 187447, max_conn: 1000000, inner_mem: 928.006M, outer_mem: 1042.410M, main_rt: 1, global_rt: 182534
axum 26, ct: 1000000, conn: 1509, max_conn: 1000000, inner_mem: 702.928M, outer_mem: 788.777M, main_rt: 1, global_rt: 44
axum 27, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 740.062M, main_rt: 1, global_rt: 8
axum 28, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.590M, main_rt: 1, global_rt: 8
axum 29, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.590M, main_rt: 1, global_rt: 8
axum 30, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.590M, main_rt: 1, global_rt: 8
axum 31, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 592.043M, outer_mem: 739.715M, main_rt: 1, global_rt: 8
axum 32, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.250M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 37.747µs
axum 33, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.375M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 25.185µs
axum 34, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.625M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 26.65µs
axum 35, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.492M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 26.685µs
axum 36, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.492M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 18.455µs
axum 37, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.492M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 27.96µs
axum 38, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.363M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 18.53µs
axum 39, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.488M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 28.691µs
axum 40, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.488M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 24.641µs
axum 41, ct: 1000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 727.488M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 27.643µs
axum 42, ct: 1088219, conn: 88219, max_conn: 1000000, inner_mem: 210.882M, outer_mem: 290.008M, main_rt: 1, global_rt: 88233
axum 43, ct: 1100424, conn: 100424, max_conn: 1000000, inner_mem: 235.734M, outer_mem: 310.152M, main_rt: 1, global_rt: 101198
axum 44, ct: 1277073, conn: 277073, max_conn: 1000000, inner_mem: 694.419M, outer_mem: 742.621M, main_rt: 1, global_rt: 289016
axum 45, ct: 1522640, conn: 522640, max_conn: 1000000, inner_mem: 1402.240M, outer_mem: 1448.352M, main_rt: 1, global_rt: 617107
axum 46, ct: 1842830, conn: 842830, max_conn: 1000000, inner_mem: 2053.293M, outer_mem: 2134.516M, main_rt: 1, global_rt: 896214
axum 47, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.816M, main_rt: 1, global_rt: 1000008
axum 48, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.582M, main_rt: 1, global_rt: 1000008
axum 49, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.418M, main_rt: 1, global_rt: 1000008
axum 50, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.543M, main_rt: 1, global_rt: 1000008
axum 51, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 52, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 53, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 54, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.371M, main_rt: 1, global_rt: 1000008
axum 55, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.496M, main_rt: 1, global_rt: 1000008
axum 56, ct: 2000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.291M, outer_mem: 2343.340M, main_rt: 1, global_rt: 1000008
axum 57, ct: 2000000, conn: 931466, max_conn: 1000000, inner_mem: 2169.154M, outer_mem: 2311.867M, main_rt: 1, global_rt: 929274
axum 58, ct: 2000000, conn: 899900, max_conn: 1000000, inner_mem: 2085.752M, outer_mem: 2207.188M, main_rt: 1, global_rt: 898818
axum 59, ct: 2000000, conn: 746219, max_conn: 1000000, inner_mem: 1851.457M, outer_mem: 1964.664M, main_rt: 1, global_rt: 741926
axum 60, ct: 2000000, conn: 499928, max_conn: 1000000, inner_mem: 1431.778M, outer_mem: 1526.262M, main_rt: 1, global_rt: 499943
axum 61, ct: 2000000, conn: 257698, max_conn: 1000000, inner_mem: 1066.313M, outer_mem: 1196.879M, main_rt: 1, global_rt: 253226
axum 62, ct: 2000000, conn: 69470, max_conn: 1000000, inner_mem: 831.906M, outer_mem: 1107.055M, main_rt: 1, global_rt: 69485
axum 63, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 852.238M, main_rt: 1, global_rt: 8
axum 64, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 852.238M, main_rt: 1, global_rt: 8
axum 65, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 852.238M, main_rt: 1, global_rt: 8
axum 66, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 623.543M, outer_mem: 851.945M, main_rt: 1, global_rt: 8
axum 67, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.176M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 26.701µs
axum 68, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.160M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 22.268µs
axum 69, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.160M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 24.917µs
axum 70, ct: 2000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 852.285M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on MiMalloc until 20M! cost 27.244µs
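  1. This question comes down to memory management in systems programming. Put simply: how an application requests memory from the system and releases it, and what actually happens when malloc and free are called.
  2. The key question is why memory is not returned to the system after libc::free. The answer is in the source of libc's free. libc's memory allocator is ptmalloc (that is the name as I remember it, but I'm not certain). Roughly: on free, small chunks are coalesced into larger free chunks, and when the top chunk at the top of the heap grows beyond a configured threshold (128 KB by default), the heap is shrunk and that memory is returned to the system. Shrinking starts from the top chunk, so if the top chunk cannot be released, none of the chunks below it can be released either.
  3. Why does malloc_trim work with the default allocator but have no effect with mi_malloc? If you read the implementation of Rust's default allocator, you'll find that it only wraps libc's malloc, free, and a few other functions, so it uses libc's allocator, ptmalloc. As for mi_malloc, I haven't read its source yet. An allocator generally manages some free blocks itself so they can be handed out quickly on reuse without a system call. So mi_malloc has its own policy too: it keeps what it needs and returns the rest to the system as part of free. That is why malloc_trim has no effect with mi_malloc: it has already returned the part it doesn't need, and what remains is the memory pool it manages itself.
  4. As for solutions: as I said before, I don't recommend refreshing the runtime or calling malloc_trim. Every program has its own workload. Under high concurrency, factors such as tokio's pattern of allocating and freeing memory, the block sizes involved, and the shrink policy of the default allocator ptmalloc mean that the heap does not shrink back to a reasonable size. So I'd recommend switching to a more suitable allocator, or implementing your own allocator tailored to your workload, or having library developers do some memory optimization specifically for ptmalloc. (The memory pools, object pools, and so on that you may see in other projects all exist to manage and use memory more efficiently.)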
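
For readers who want to try the explicit trim discussed in point 3, here is a minimal sketch of calling glibc's malloc_trim from Rust without the libc crate. The function name trim_heap is made up for this illustration, and the sketch assumes the process uses the default (system) allocator on a glibc-based Linux target; on any other target it simply returns None.

```rust
/// Ask glibc to return free heap memory to the OS.
/// Returns Some(true) if some memory was released, Some(false) if nothing
/// could be released. `trim_heap` is a hypothetical helper name.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
pub fn trim_heap() -> Option<bool> {
    use std::ffi::c_int;

    unsafe extern "C" {
        // glibc extension from <malloc.h>: int malloc_trim(size_t pad);
        fn malloc_trim(pad: usize) -> c_int;
    }

    // SAFETY: malloc_trim takes no pointers; pad = 0 keeps no extra slack
    // at the top of the heap.
    let released = unsafe { malloc_trim(0) };
    Some(released != 0)
}

/// Fallback for targets without glibc: there is nothing to call.
#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
pub fn trim_heap() -> Option<bool> {
    None
}
```

This only affects memory held by glibc's allocator. If a different global allocator such as mi_malloc is installed, the call is still valid but has little to release, which matches the microsecond-level malloc_trim costs in the MiMalloc logs above.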
TinusgragLin commented 3 months ago

it has a higher overhead (10-150ms) compared to refreshing Tokio's runtime (1ms) alone

Although it's reasonable for malloc_trim to take longer, I believe the actual difference is smaller than that: in the refreshing-runtime test, your measurement did not include the time to fully drop the runtime, because old_global_runtime is dropped after the time measurement ends.
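
To illustrate the measurement point, here is a small sketch using only the standard library. The function names are made up, and a generic value stands in for the old runtime; this is not the code from the test above. The first function stops the clock before the old value is destroyed, the second drops it inside the timed window.

```rust
use std::time::{Duration, Instant};

/// Swap `fresh` into `slot` and stop the clock before the old value is
/// dropped. The old value is handed back to the caller, so the cost of
/// destroying it is NOT part of the returned duration.
pub fn swap_timed<T>(slot: &mut T, fresh: T) -> (Duration, T) {
    let start = Instant::now();
    let old = std::mem::replace(slot, fresh);
    (start.elapsed(), old)
}

/// Swap `fresh` into `slot` and drop the old value inside the timed
/// window, so the returned duration includes the teardown cost.
pub fn swap_and_drop_timed<T>(slot: &mut T, fresh: T) -> Duration {
    let start = Instant::now();
    let old = std::mem::replace(slot, fresh);
    drop(old);
    start.elapsed()
}
```

If the old value owns a lot of heap memory, the second number can be much larger than the first, and it is the fairer one to compare against the cost of malloc_trim.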

lithbitren commented 3 months ago

@lixiang365 @TinusgragLin @Darksonn


Thanks for your replies! Over the weekend I retested the server with malloc_trim called directly to release memory, using roughly the same strategy as for refreshing the runtime. I found that malloc_trim and refreshing the runtime do not conflict and can be combined. I ran one round of one million concurrent requests every half minute; each test lasted about 4 hours and covered hundreds of rounds. The final memory results:

| hyper-axum-server | max | final |
| --- | --- | --- |
| default_alloctor | 3000 MB | 2500-3000 MB |
| default_alloctor-malloc_trim | 3000 MB | 320-420 MB |
| default_alloctor-refresh_runtime | 2700 MB | 15-100 MB |
| default_alloctor-refresh_runtime-malloc_trim | 2700 MB | 15 MB |
| mi_malloc | 2300 MB | 600-900 MB |
| mi_malloc-refresh_runtime | 2300 MB | 15 MB |

@TinusgragLin You're right, that way of timing does not capture the time it takes to destroy the old values and release their memory. In the long-running test, malloc_trim alone took up to 160-200 ms per call, but when the runtime was refreshed before calling malloc_trim, the cost settled at around 100 ms, which suggests that refreshing the runtime reduced malloc_trim's execution time by roughly 100 ms.

malloc_trim feels like a supplement to the allocator's own handling. If you refresh the runtime first and then call malloc_trim, memory ends up almost completely released. Using malloc_trim alone has the advantage of being cross-platform and releasing about 85% of the memory; the downside is that you have to design the trimming strategy by hand.

mi_malloc only works on specific platforms, but it is the least intrusive option: a single global declaration is enough, and it releases about 70% of the memory.

My approach of refreshing the runtime is clearly not good practice: it is too intrusive and not suitable for existing projects. If Tokio offered a convenient way to refresh the runtime, it could be combined with malloc_trim to release as much memory as possible.
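
As a rough idea of what a hand-designed trimming strategy could look like, here is a sketch of the decision logic only. It is not the code used in these tests: the names are invented, and reading "when delta 300M" in the logs below as "resident memory exceeds live allocations by at least 300 MB while no connections are open" is an assumption. The resident size is parsed from the text of /proc/self/statm on Linux, whose second field is the resident page count; the page size is passed in by the caller.

```rust
/// Resident set size in bytes, parsed from the contents of
/// /proc/self/statm (fields are page counts: size, resident, shared, ...).
pub fn rss_bytes_from_statm(statm: &str, page_size: u64) -> Option<u64> {
    let resident_pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    resident_pages.checked_mul(page_size)
}

/// Hypothetical policy: trim only when the server is idle and the gap
/// between resident memory and live allocations is large enough.
pub struct TrimPolicy {
    /// Minimum gap (resident minus live bytes) that justifies a trim.
    pub min_gap_bytes: u64,
}

impl TrimPolicy {
    /// `live_bytes` is what the application still has allocated (the
    /// "inner_mem" idea); `rss_bytes` is what the OS reports as resident
    /// (the "outer_mem" idea).
    pub fn should_trim(&self, active_conns: u64, rss_bytes: u64, live_bytes: u64) -> bool {
        active_conns == 0 && rss_bytes.saturating_sub(live_bytes) >= self.min_gap_bytes
    }
}
```

Keeping the decision as a pure function makes it easy to test without generating load; the actual trim call and the periodic timer would sit around it.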

default_alloctor-malloc_trim (running for 3 hours):

axum 11459, ct: 317000000, conn: 408958, max_conn: 1000000, inner_mem: 1215.523M, outer_mem: 2907.281M, main_rt: 408025, global_rt: 0
axum 11460, ct: 317000000, conn: 219237, max_conn: 1000000, inner_mem: 986.669M, outer_mem: 2936.781M, main_rt: 215626, global_rt: 0
axum 11461, ct: 317000000, conn: 64606, max_conn: 1000000, inner_mem: 753.117M, outer_mem: 3026.781M, main_rt: 63242, global_rt: 0
axum 11462, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 582.299M, outer_mem: 3030.656M, main_rt: 9, global_rt: 0
axum 11463, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 582.299M, outer_mem: 3030.656M, main_rt: 9, global_rt: 0
axum 11464, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 582.299M, outer_mem: 3030.656M, main_rt: 9, global_rt: 0
axum 11465, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 582.299M, outer_mem: 3030.656M, main_rt: 9, global_rt: 0
axum 11466, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2864.730M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc when delta 300M! cost 176.808835ms
axum 11467, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 369.176M, main_rt: 1, global_rt: 0
axum 11468, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11469, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11470, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11471, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11472, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11473, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11474, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11475, ct: 317000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.301M, main_rt: 1, global_rt: 0
axum 11476, ct: 317073763, conn: 73763, max_conn: 1000000, inner_mem: 179.791M, outer_mem: 562.426M, main_rt: 75031, global_rt: 0
axum 11477, ct: 317252578, conn: 252578, max_conn: 1000000, inner_mem: 631.512M, outer_mem: 1045.176M, main_rt: 258999, global_rt: 0
axum 11478, ct: 317409303, conn: 409303, max_conn: 1000000, inner_mem: 945.071M, outer_mem: 1395.676M, main_rt: 409312, global_rt: 0
axum 11479, ct: 317544481, conn: 544481, max_conn: 1000000, inner_mem: 1315.807M, outer_mem: 1781.176M, main_rt: 567308, global_rt: 0
axum 11480, ct: 317738419, conn: 738419, max_conn: 1000000, inner_mem: 1886.399M, outer_mem: 2395.676M, main_rt: 824162, global_rt: 0
axum 11481, ct: 317982242, conn: 982242, max_conn: 1000000, inner_mem: 2211.048M, outer_mem: 2799.801M, main_rt: 983928, global_rt: 0
axum 11482, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2811.301M, main_rt: 1000009, global_rt: 0
axum 11483, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11484, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11485, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11486, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11487, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11488, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11489, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11490, ct: 318000000, conn: 1000000, max_conn: 1000000, inner_mem: 2231.292M, outer_mem: 2812.051M, main_rt: 1000009, global_rt: 0
axum 11491, ct: 318000000, conn: 926287, max_conn: 1000000, inner_mem: 2113.354M, outer_mem: 2823.551M, main_rt: 924883, global_rt: 0
axum 11492, ct: 318000000, conn: 734592, max_conn: 1000000, inner_mem: 1827.976M, outer_mem: 2848.426M, main_rt: 728708, global_rt: 0
axum 11493, ct: 318000000, conn: 587175, max_conn: 1000000, inner_mem: 1545.352M, outer_mem: 2859.801M, main_rt: 584492, global_rt: 0
axum 11494, ct: 318000000, conn: 445828, max_conn: 1000000, inner_mem: 1306.744M, outer_mem: 2863.051M, main_rt: 440324, global_rt: 0
axum 11495, ct: 318000000, conn: 149202, max_conn: 1000000, inner_mem: 866.818M, outer_mem: 2927.801M, main_rt: 147711, global_rt: 0
axum 11496, ct: 318000000, conn: 128884, max_conn: 1000000, inner_mem: 857.337M, outer_mem: 2927.801M, main_rt: 124946, global_rt: 0
axum 11497, ct: 318000000, conn: 5934, max_conn: 1000000, inner_mem: 629.446M, outer_mem: 2973.551M, main_rt: 4541, global_rt: 0
axum 11498, ct: 318000000, conn: 0, max_conn: 1000000, inner_mem: 594.293M, outer_mem: 2973.551M, main_rt: 9, global_rt: 0
axum 11499, ct: 318000000, conn: 0, max_conn: 1000000, inner_mem: 594.293M, outer_mem: 2973.551M, main_rt: 9, global_rt: 0
axum 11500, ct: 318000000, conn: 0, max_conn: 1000000, inner_mem: 594.293M, outer_mem: 2973.551M, main_rt: 9, global_rt: 0
axum 11501, ct: 318000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 2816.656M, main_rt: 1, global_rt: 0, libc::malloc_trim(0) on DefaultAlloc when delta 300M! cost 162.254986ms
axum 11502, ct: 318000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 370.715M, main_rt: 1, global_rt: 0
axum 11503, ct: 318000000, conn: 0, max_conn: 1000000, inner_mem: 0.258M, outer_mem: 371.715M, main_rt: 1, global_rt: 0

default_alloctor-refresh_runtime-malloc_trim (running for 3 hours):

axum 11011, ct: 298000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 15.820M, main_rt: 1, global_rt: 0
axum 11012, ct: 298000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 15.820M, main_rt: 1, global_rt: 0
axum 11013, ct: 298000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 15.820M, main_rt: 1, global_rt: 0
axum 11014, ct: 298000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 15.820M, main_rt: 1, global_rt: 0
axum 11015, ct: 298068202, conn: 68202, max_conn: 1000000, inner_mem: 166.016M, outer_mem: 186.820M, main_rt: 1, global_rt: 69248
axum 11016, ct: 298245183, conn: 245183, max_conn: 1000000, inner_mem: 661.654M, outer_mem: 691.570M, main_rt: 1, global_rt: 252440
axum 11017, ct: 298467852, conn: 467852, max_conn: 1000000, inner_mem: 1163.433M, outer_mem: 1188.445M, main_rt: 1, global_rt: 472102
axum 11018, ct: 298659699, conn: 659699, max_conn: 1000000, inner_mem: 1569.837M, outer_mem: 1635.070M, main_rt: 1, global_rt: 664500
axum 11019, ct: 298749977, conn: 749977, max_conn: 1000000, inner_mem: 1890.490M, outer_mem: 1980.195M, main_rt: 1, global_rt: 810234
axum 11020, ct: 298991986, conn: 991986, max_conn: 1000000, inner_mem: 2226.706M, outer_mem: 2400.820M, main_rt: 1, global_rt: 992491
axum 11021, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11022, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11023, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11024, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11025, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11026, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11027, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11028, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11029, ct: 299000000, conn: 1000000, max_conn: 1000000, inner_mem: 2238.799M, outer_mem: 2414.320M, main_rt: 1, global_rt: 1000008
axum 11030, ct: 299000000, conn: 941703, max_conn: 1000000, inner_mem: 2155.591M, outer_mem: 2431.695M, main_rt: 1, global_rt: 941718
axum 11031, ct: 299000000, conn: 779484, max_conn: 1000000, inner_mem: 1923.766M, outer_mem: 2455.195M, main_rt: 1, global_rt: 777482
axum 11032, ct: 299000000, conn: 594865, max_conn: 1000000, inner_mem: 1691.533M, outer_mem: 2502.445M, main_rt: 1, global_rt: 583849
axum 11033, ct: 299000000, conn: 423470, max_conn: 1000000, inner_mem: 1431.546M, outer_mem: 2550.445M, main_rt: 1, global_rt: 417647
axum 11034, ct: 299000000, conn: 304943, max_conn: 1000000, inner_mem: 1207.898M, outer_mem: 2595.570M, main_rt: 1, global_rt: 300133
axum 11035, ct: 299000000, conn: 89830, max_conn: 1000000, inner_mem: 993.853M, outer_mem: 2626.445M, main_rt: 1, global_rt: 84386
axum 11036, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 712.665M, outer_mem: 2635.820M, main_rt: 1, global_rt: 8
axum 11037, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 670.051M, outer_mem: 2635.820M, main_rt: 1, global_rt: 8
axum 11038, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 670.051M, outer_mem: 2635.820M, main_rt: 1, global_rt: 8
axum 11039, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 670.051M, outer_mem: 2635.820M, main_rt: 1, global_rt: 8
axum 11040, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 670.051M, outer_mem: 2635.820M, main_rt: 1, global_rt: 8
axum 11041, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 670.051M, outer_mem: 2635.820M, main_rt: 1, global_rt: 8
axum 11042, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 2179.012M, main_rt: 1, global_rt: 0, refresh runtime and libc::malloc_trim(0) on DefaultAlloc when delta 300M! cost 111.044121ms
axum 11043, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 16.055M, main_rt: 1, global_rt: 0
axum 11044, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 16.055M, main_rt: 1, global_rt: 0
axum 11045, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 16.055M, main_rt: 1, global_rt: 0
axum 11046, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 16.055M, main_rt: 1, global_rt: 0
axum 11047, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 16.055M, main_rt: 1, global_rt: 0
axum 11048, ct: 299000000, conn: 0, max_conn: 1000000, inner_mem: 0.266M, outer_mem: 16.055M, main_rt: 1, global_rt: 0
zqlpaopao commented 3 weeks ago

Tokio does need a memory recycling strategy or a way to refresh the runtime. Rust is well suited to long-running, stable services, and this is needed in many scenarios. For example, if my program is a monitoring agent running on a server, its memory and CPU usage are limited. Without proper recycling, slow memory growth could lead to the system killing my program.