tokio-rs / axum

Ergonomic and modular web framework built with Tokio, Tower, and Hyper
19.2k stars · 1.06k forks

Memory leak #1724

Closed lcmgh closed 1 year ago

lcmgh commented 1 year ago

I spotted a memory leak in my Kubernetes-hosted application. Note that Kubernetes reports the memory actually occupied by a program (total memory minus available memory, not minus free memory).

  1. Changing the memory allocator to Microsoft's Mimalloc made the app release memory occasionally, but usage still increases over time
  2. Base image: Google Distroless (Debian)

So I tried to create a minimal reproducible example on my local machine, and at least the macOS Activity Monitor shows an increase in memory even after all requests have stopped.

Reproduce

Prerequisites:

  1. Run the program and measure memory usage

I am currently measuring with the macOS Activity Monitor, as my valgrind is not working. With valgrind you might want to try:

valgrind --tool=massif --time-unit=ms ./target/release/axum-leak
ms_print massif.out.765412 >> massif_pretty

or simply start the program and measure with other tools.
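One lightweight alternative is to sample the process's resident set size with `ps`. A minimal sketch (the `sample_rss` helper is hypothetical, and the shell's own PID is used only for illustration — point it at the `axum-leak` process instead):

```shell
# Hypothetical helper: print the RSS (in KiB) of a process a few times,
# one second apart, so you can watch whether it keeps climbing.
sample_rss() {
  pid="$1"
  for _ in 1 2 3; do
    ps -o rss= -p "$pid"
    sleep 1
  done
}

# Example: sample this shell's own RSS; replace "$$" with axum-leak's PID.
sample_rss "$$"
```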

Initial memory usage before any requests were executed:

[screenshot: initial memory usage]

  2. Create load

oha -z 10m -c 500 "http://127.0.0.1:3000/users?username=test"

I ran it for just a few seconds. After stopping, memory usage remains at the higher level:

[screenshot: memory usage after load]

Code

Cargo.toml

[package]
name = "axum-leak"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
axum = "0.6.4"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0.68"
tokio = { version = "1.0", features = ["full"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

main.rs

use axum::{
    routing::{get, post},
    http::StatusCode,
    response::IntoResponse,
    Json, Router, extract::Query,
};
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;

#[tokio::main]
async fn main() {
    // initialize tracing
    tracing_subscriber::fmt::init();

    // build our application with a route
    let app = Router::new()
        // `GET /` goes to `root`
        .route("/", get(root))
        // `GET /users` goes to `create_user`
        .route("/users", get(create_user));

    // run our app with hyper
    // `axum::Server` is a re-export of `hyper::Server`
    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    tracing::debug!("listening on {}", addr);
    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await
        .unwrap();
}

// basic handler that responds with a static string
async fn root() -> &'static str {
    "Hello, World!"
}

async fn create_user(
    // this argument tells axum to parse the query string
    // into a `CreateUser` type
    payload: Query<CreateUser>,
) -> impl IntoResponse {
    // insert your application logic here
    let user = User {
        id: 1337,
        username: payload.username.clone(),
    };

    // this will be converted into a JSON response
    // with a status code of `201 Created`
    (StatusCode::CREATED, Json(user))
}

// the input to our `create_user` handler
#[derive(Deserialize)]
struct CreateUser {
    username: String,
}

// the output to our `create_user` handler
#[derive(Serialize)]
struct User {
    id: u64,
    username: String,
}
lcmgh commented 1 year ago

The following memory measurement command also confirms the leak.

Before load test:

% ps x -o rss,vsz,command | awk 'NR>1 {$1=int($1/1024)"M"; $2=int($2/1024)"M";}{ print ;}' | grep axum-leak
2M 398459M ./target/release/axum-leak

After a 5s load test (after the load test stopped):

% ps x -o rss,vsz,command | awk 'NR>1 {$1=int($1/1024)"M"; $2=int($2/1024)"M";}{ print ;}' | grep axum-leak
20M 398622M ./target/release/axum-leak
davidpdrsn commented 1 year ago

Are you able to reproduce it using just hyper and not axum? If there is a leak I doubt it's axum's fault.

jplatte commented 1 year ago

Higher memory usage than before does not mean there is a leak. If memory usage doesn't continually rise when repeating the load test, it seems likely there are just a few more / larger buffers afterwards.

lcmgh commented 1 year ago

Phew, @jplatte, when are the buffers expected to be cleared under normal circumstances? After my 1h lunch break the higher memory level still remains.

> If memory usage doesn't continually rise when repeating the load test, it seems likely there's just a few more / larger buffers afterwards.

After running the load test one more time for about 15s, memory usage climbed from 22 MB to 37 MB.

This chart is from my production workload (which has some more dependencies, such as axum-sessions and sqlx). From the 25th on we started using Mimalloc.

[screenshot: production memory usage chart]

jplatte commented 1 year ago

Okay, that does look like a memory leak. But I agree with david that it's much more likely to be a lower-level crate that leaks.

lcmgh commented 1 year ago

> Are you able to reproduce it using just hyper and not axum? If there is a leak I doubt it's axum's fault.

What would be an equivalent hyper setup for the Axum code I posted above? I can then try to reproduce it there.

davidpdrsn commented 1 year ago

That is up to the allocator to decide. It is not efficient to always give memory back to the OS immediately.

lcmgh commented 1 year ago

Added Mimalloc to the game. Memory immediately climbs to around 38m and stays around that level.

38M 398725M ./target/release/axum-leak

Stopped the load test and let the program run for 20s.

Memory remains at around 38M.

Waited 1 min. Memory still around that level (37M).
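For anyone following along, swapping in Mimalloc is typically just a matter of declaring it as the global allocator. A minimal sketch, assuming the `mimalloc` crate is added to Cargo.toml (the exact crate and version used by the commenter are not stated):

```rust
// Assumes `mimalloc = "0.1"` in [dependencies]; this replaces Rust's
// default system allocator for the whole binary.
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
```

With this in place, all heap allocations in the program go through Mimalloc, which has different strategies for returning freed memory to the OS than the default allocator.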

lcmgh commented 1 year ago

I just tried to reproduce the issue by spawning Axum with http2 only, but it seems I am doing something wrong.

axum = { version = "0.6.4", features = ["http2", "query", "json", "tokio"], default-features = false }

In that case every request fails; the browser shows ERR_INVALID_HTTP_RESPONSE.

Edit: Adding "http1" as a feature makes requests work again. Will report back shortly.

Edit 2: Result is the same with http2.

oha -z 10m -c 1000 --http-version 2 "http://127.0.0.1:3000/users?username=test"

Memory remains at 40M.

lcmgh commented 1 year ago

I minimized the example, removed some default dependencies, and removed the handler's input parameter. The problem still occurs.

axum = { version = "0.6.4", features = ["http1", "tokio"], default-features = false }

use axum::{
    routing::get,
    http::StatusCode,
    response::IntoResponse,
    Router,
};
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;

#[tokio::main]
async fn main() {
    // initialize tracing
    tracing_subscriber::fmt::init();

    // build our application with a route
    let app = Router::new()
        // `GET /` goes to `root`
        .route("/", get(root))
        // `GET /users` goes to `create_user`
        .route("/users", get(create_user));

    // run our app with hyper
    // `axum::Server` is a re-export of `hyper::Server`
    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    tracing::debug!("listening on {}", addr);
    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await
        .unwrap();
}

// basic handler that responds with a static string
async fn root() -> &'static str {
    "Hello, World!"
}

async fn create_user() -> impl IntoResponse {
    // insert your application logic here
    let user = User {
        id: 1337,
        username: "username".to_string(),
    };

    // this will be converted into a plain-text response
    // with a status code of `201 Created`
    (StatusCode::CREATED, user.username)
}

// the output to our `create_user` handler
#[derive(Serialize, Deserialize)]
struct User {
    id: u64,
    username: String,
}
lcmgh commented 1 year ago

Looks like a hyper issue: https://github.com/hyperium/hyper/issues/3130

davidpdrsn commented 1 year ago

I'll close this issue for now then.

hermesonbf commented 1 year ago

I'm having a memory leak problem with axum. To make matters worse, I put the server in production with thousands of users accessing it daily, only to find that it freezes and stops responding after about 24 hours of running because of the memory leak.

This is the minimal code with which I was able to reproduce the problem:

use axum::{Router, Server};
use std::net::{Ipv6Addr, SocketAddr};

#[tokio::main]
async fn main() {
    Server::bind(&SocketAddr::from((Ipv6Addr::LOCALHOST, 1024)))
        .serve(Router::new().fallback(example).into_make_service())
        .await
        .unwrap();
}

async fn example() -> String {
    String::from("REQUEST")
}

These are the contents of the Cargo.toml file:

[package]
name = "example"
version = "1.0.0"
edition = "2021"

[dependencies]
axum = "0.6.12"
tokio = { version = "1.27.0", features = ["rt-multi-thread", "macros"] }

After running the code above, the process consumes about 1 MB of memory. Now run the following command:

ab -n 20000 -c 30 http://localhost:1024/

Now the process consumes almost 4 MB of memory, even after you stop the command above (try a couple of times to increase it even more). Also, when the process is using about 4 MB of memory it simply stops responding to new requests; the command `curl localhost:1024` freezes.

How to solve this?

davidpdrsn commented 1 year ago

Please see the suggestions in this discussion. It's unlikely that it's actually axum that's causing the leak. Use a tool to verify there actually is a leak.

hermesonbf commented 1 year ago

I don't care about the possible leak anymore. The main problem I'm facing is that the process is actually freezing after too many requests, and I provided a minimal example to reproduce the problem; it should work fine. That's really weird.

hermesonbf commented 1 year ago

This video shows how the freezing occurs, I used the same code that I provided above.

https://user-images.githubusercontent.com/42178454/230647402-eb0877f6-bc54-4b36-96d5-f434a0aa64e4.mp4

davidpdrsn commented 1 year ago

That's strange indeed. I'm not sure. Can you replicate it without axum, using hyper directly?

lcmgh commented 1 year ago

> This video shows how the freezing occurs, I used the same code that I provided above.

I cannot reproduce it with another load testing tool and rustc 1.68.0 on macOS Ventura.

oha -z 1m -c 500 --ipv6 "http://localhost:1024/"

[screenshot: load test results]

hermesonbf commented 1 year ago

I couldn't figure out what was happening, so I just rewrote my code in actix. For some reason the freezing has disappeared and now my server is working fine. It's strange, but at least it's working now. I will use axum again for future projects and see if the glitch comes back.