hyperium / hyper

An HTTP library for Rust
https://hyper.rs
MIT License

Client hang with hyper 0.14 (tokio, async-std) #2312

Open pandaman64 opened 3 years ago

pandaman64 commented 3 years ago

Context: we are investigating whether upgrading hyper to 0.13 fixes #2306, and it seems it does not.

Steps to reproduce

Prerequisites:

  1. clone https://github.com/pandaman64/hyper-hang-tokio
  2. cargo run

Expected behavior

The program should keep running until system resources are exhausted.

Actual behavior

It hangs after an indeterminate number of iterations.

Log (last several iterations)

```
435
Oct 27 16:48:35.474 DEBUG hyper::client::pool: reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.474 DEBUG hyper::proto::h1::io: flushed 76 bytes
Oct 27 16:48:35.475 DEBUG hyper::proto::h1::io: read 110 bytes
Oct 27 16:48:35.475 DEBUG hyper::proto::h1::io: parsed 3 headers
Oct 27 16:48:35.475 DEBUG hyper::proto::h1::conn: incoming body is empty
Oct 27 16:48:35.476 DEBUG hyper::client::pool: reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.477 DEBUG hyper::proto::h1::io: flushed 75 bytes
Oct 27 16:48:35.477 DEBUG hyper::client::pool: pooling idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.478 DEBUG hyper::proto::h1::io: read 211 bytes
Oct 27 16:48:35.478 DEBUG hyper::proto::h1::io: parsed 7 headers
Oct 27 16:48:35.478 DEBUG hyper::proto::h1::conn: incoming body is chunked encoding
436
Oct 27 16:48:35.478 DEBUG hyper::client::pool: reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.479 DEBUG hyper::proto::h1::io: flushed 76 bytes
Oct 27 16:48:35.479 DEBUG hyper::proto::h1::io: read 110 bytes
Oct 27 16:48:35.479 DEBUG hyper::proto::h1::io: parsed 3 headers
Oct 27 16:48:35.479 DEBUG hyper::proto::h1::conn: incoming body is empty
Oct 27 16:48:35.480 DEBUG hyper::client::pool: reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.480 DEBUG hyper::client::pool: pooling idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.481 DEBUG hyper::proto::h1::io: flushed 75 bytes
Oct 27 16:48:35.481 DEBUG hyper::proto::h1::io: read 211 bytes
Oct 27 16:48:35.481 DEBUG hyper::proto::h1::io: parsed 7 headers
Oct 27 16:48:35.481 DEBUG hyper::proto::h1::conn: incoming body is chunked encoding
437
Oct 27 16:48:35.482 DEBUG hyper::client::pool: reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.482 DEBUG hyper::proto::h1::io: flushed 76 bytes
Oct 27 16:48:35.482 DEBUG hyper::proto::h1::io: read 110 bytes
Oct 27 16:48:35.482 DEBUG hyper::proto::h1::io: parsed 3 headers
Oct 27 16:48:35.483 DEBUG hyper::proto::h1::conn: incoming body is empty
Oct 27 16:48:35.483 DEBUG hyper::client::pool: reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.484 DEBUG hyper::proto::h1::io: flushed 75 bytes
Oct 27 16:48:35.484 DEBUG hyper::client::pool: pooling idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
Oct 27 16:48:35.484 DEBUG hyper::proto::h1::io: read 211 bytes
Oct 27 16:48:35.484 DEBUG hyper::proto::h1::io: parsed 7 headers
Oct 27 16:48:35.484 DEBUG hyper::proto::h1::conn: incoming body is chunked encoding
Oct 27 16:48:35.484 DEBUG hyper::client::pool: pooling idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
438
Oct 27 16:48:35.485 DEBUG hyper::client::pool: reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
```

Reproducer

I'm pasting the reproducer here for ease of reference.

use futures::prelude::*;
fn main() {
    // env_logger::Builder::from_default_env()
    //     .target(env_logger::fmt::Target::Stdout)
    //     .init();
    tracing_subscriber::fmt::init();

    let mut runtime = tokio::runtime::Builder::new()
        .threaded_scheduler()
        .enable_all()
        .build()
        .unwrap();
    let client: hyper::Client<hyperlocal::UnixConnector> =
        hyper::Client::builder().build(hyperlocal::UnixConnector);
    for i in 0.. {
        println!("{}", i);
        let res = runtime
            .block_on(async {
                let _resp = client
                    .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events?").into())
                    .await;
                client
                    .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events?").into())
                    .await
            })
            .unwrap();
        runtime.spawn(res.into_body().into_future());
    }
}

Notes

  1. The double slash in the URI ("//events?") does matter; we need two requests with different URIs.
  2. I suspect this is a hyper issue because we can reproduce the same hang with both tokio and async-std.
seanmonstar commented 3 years ago

Thanks for trying this out with the newer version. I wonder about using block_on so much, since the detection of an unusable connection happens in a spawned task that might not be getting run as much.

Have you seen the same issue if you change this to async/await?

#[tokio::main]
async fn main() {
    // env_logger::Builder::from_default_env()
    //     .target(env_logger::fmt::Target::Stdout)
    //     .init();
    tracing_subscriber::fmt::init();
    let client: hyper::Client<hyperlocal::UnixConnector> =
        hyper::Client::builder().build(hyperlocal::UnixConnector);
    for i in 0.. {
        println!("{}", i);
        let _resp = client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events?").into())
            .await;
        client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events?").into())
            .await;
    }
}
pandaman64 commented 3 years ago

Thank you for the response!

I tried the following version without block_on, and it still hangs:

#[tokio::main]
async fn main() {
    // env_logger::Builder::from_default_env()
    //     .target(env_logger::fmt::Target::Stdout)
    //     .init();
    tracing_subscriber::fmt::init();

    let client: hyper::Client<hyperlocal::UnixConnector> =
        hyper::Client::builder().build(hyperlocal::UnixConnector);
    for i in 0.. {
        println!("{}", i);
        let _resp = client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events?").into())
            .await;
        let res = client
            .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events?").into())
            .await
            .unwrap();

        tokio::spawn(res.into_body().into_future());
    }
}

The last tokio::spawn seems necessary for reproduction.

pandaman64 commented 3 years ago

The spawned future (res.into_body().into_future()) will receive bytes when Docker operations happen (e.g. docker run --rm hello-world). And I find that the program resumes looping after running Docker commands (and eventually hangs again).

Could it be that too many spawned tasks block the .get() futures from resolving?

sfackler commented 3 years ago

I ran your latest example in an Ubuntu WSL2 install and it ran just fine until I eventually killed it at iteration ~15,000. Are you sure the problem is with hyper and not with your Docker driver not responding for some reason?

pandaman64 commented 3 years ago

Hmm. My WSL2 box (Ubuntu 20.04.1 LTS (Focal Fossa)) does reproduce the hang. In my environment, the async-std version tends to hang earlier. Did you try that?

sfackler commented 3 years ago

I do see the hang with the async-std version after a couple hundred iterations, yeah.

pandaman64 commented 3 years ago

My coworker reported that the following version randomly stops working when invoked repeatedly:

use futures::prelude::*;
#[tokio::main]
async fn main() {
    let args: Vec<String> = std::env::args().collect();
    env_logger::init();
    let client: hyper::Client<hyperlocal::UnixConnector> =
        hyper::Client::builder().build(hyperlocal::UnixConnector);
    let _resp = client
        .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events").into()) // this uri can be "//"
        .await;
    let resp = client
        .get(hyperlocal::Uri::new("/var/run/docker.sock", "/events").into())
        .await
        .unwrap();
    tokio::spawn(resp.into_body().into_future());
    let _resp = client
        .get(hyperlocal::Uri::new("/var/run/docker.sock", "//events").into()) // this uri can be "//", too
        .await;
    println!("ok: {}", args[1]);
}

I couldn't reproduce it with RUST_LOG=debug, though RUST_LOG=trace or an empty RUST_LOG= do reproduce it.

trace log of the last invocation

```
[2020-10-28T03:39:24Z TRACE want] signal: Want
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
[2020-10-28T03:39:24Z TRACE want] poll_want: taker wants!
[2020-10-28T03:39:24Z TRACE hyper::client::pool] pool closed, canceling idle interval
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE hyper::client::pool] pool dropped, dropping pooled (("unix", 2f7661722f72756e2f646f636b65722e736f636b:0))
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::dispatch] client tx closed
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] State::close_read()
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] State::close_write()
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Closed, writing: Closed, keep_alive: Disabled }
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] shut down IO complete
[2020-10-28T03:39:24Z TRACE mio::poll] deregistering handle with poller
[2020-10-28T03:39:24Z TRACE want] signal: Closed
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE mio::poll] deregistering handle with poller
[2020-10-28T03:39:24Z TRACE want] signal: Closed
[2020-10-28T03:39:24Z TRACE want] signal found waiting giver, notifying
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE hyper::client::pool] checkout waiting for idle connection: ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE mio::poll] registering with poller
[2020-10-28T03:39:24Z TRACE hyper::client::conn] client handshake HTTP/1
[2020-10-28T03:39:24Z TRACE hyper::client] handshake complete, spawning background dispatcher task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] task; kind=task future=futures_util::future::future::Map, hyper::client::Client::connect_to::{{closure}}::{{closure}}::{{closure}}::{{closure}}>, hyper::client::Client::connect_to::{{closure}}::{{closure}}::{{closure}}::{{closure}}>
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE want] signal: Want
[2020-10-28T03:39:24Z TRACE want] signal found waiting giver, notifying
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Busy }
[2020-10-28T03:39:24Z TRACE want] poll_want: taker wants!
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE hyper::client::pool] checkout dropped for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] encode_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -> encode_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] Client::encode method=GET, body=None
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] <- encode_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -- encode_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::io] detected no usage of vectored write, flattening
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::io] flushed 69 bytes
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Init, writing: KeepAlive, keep_alive: Busy }
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] Conn::read_head
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::io] read 103 bytes
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] parse_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -> parse_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] Response.parse([Header; 100], [u8; 103])
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] Response.parse Complete(103)
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] <- parse_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -- parse_headers
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::io] parsed 3 headers
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::conn] incoming body is empty
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] maybe_notify; read_from_io blocked
[2020-10-28T03:39:24Z TRACE tokio::util::trace] task; kind=task future=futures_util::future::future::Map::send_request::{{closure}}::{{closure}}::{{closure}}>, hyper::client::Client::send_request::{{closure}}::{{closure}}::{{closure}}>
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE want] signal: Want
[2020-10-28T03:39:24Z TRACE hyper::client::pool] checkout waiting for idle connection: ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE want] signal found waiting giver, notifying
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
[2020-10-28T03:39:24Z TRACE want] poll_want: taker wants!
[2020-10-28T03:39:24Z TRACE hyper::client::pool] put; add idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE mio::poll] registering with poller
[2020-10-28T03:39:24Z TRACE hyper::client::pool] put; found waiter for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z DEBUG hyper::client::pool] reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] task; kind=task future=futures_util::future::future::Map::connect_to::{{closure}}, futures_util::future::either::Either, hyper::error::Error::new_connect>, futures_util::future::either::Either::{{closure}}>, futures_util::future::poll_fn::PollFn::when_ready::{{closure}}>, hyper::client::Client::connect_to::{{closure}}::{{closure}}::{{closure}}>, hyper::client::Client::connect_to::{{closure}}::{{closure}}::{{closure}}>>>, futures_util::future::ready::Ready>, hyper::error::Error>>>, hyper::client::Client::connect_to::{{closure}}::{{closure}}>, futures_util::future::ready::Ready>, hyper::error::Error>>>>, hyper::client::Client::connection_for::{{closure}}::{{closure}}>, hyper::client::Client::connection_for::{{closure}}::{{closure}}>
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE want] signal: Want
[2020-10-28T03:39:24Z TRACE hyper::client::conn] client handshake HTTP/1
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE hyper::client] handshake complete, spawning background dispatcher task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] task; kind=task future=futures_util::future::future::Map, hyper::client::Client::connect_to::{{closure}}::{{closure}}::{{closure}}::{{closure}}>, hyper::client::Client::connect_to::{{closure}}::{{closure}}::{{closure}}::{{closure}}>
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE want] signal: Want
[2020-10-28T03:39:24Z TRACE want] signal found waiting giver, notifying
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] encode_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Busy }
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -> encode_headers
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE want] poll_want: taker wants!
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] Client::encode method=GET, body=None
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] <- encode_headers
[2020-10-28T03:39:24Z TRACE hyper::client::pool] put; add idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -- encode_headers
[2020-10-28T03:39:24Z DEBUG hyper::client::pool] pooling idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::io] flushed 74 bytes
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Init, writing: KeepAlive, keep_alive: Busy }
[2020-10-28T03:39:24Z TRACE tokio::util::trace] task; kind=task future=hyper::client::pool::IdleTask>
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] Conn::read_head
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE hyper::client::pool] idle interval checking for expired
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::io] read 211 bytes
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] parse_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -> parse_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] Response.parse([Header; 100], [u8; 211])
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] Response.parse Complete(211)
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] <- parse_headers
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::role] -- parse_headers
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::io] parsed 7 headers
[2020-10-28T03:39:24Z DEBUG hyper::proto::h1::conn] incoming body is chunked encoding
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::decode] decode; state=Chunked(Size, 0)
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::decode] Read chunk hex size
[2020-10-28T03:39:24Z TRACE hyper::client::pool] put; add idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z DEBUG hyper::client::pool] pooling idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE tokio::util::trace] task; kind=task future=futures_util::stream::stream::into_future::StreamFuture
[2020-10-28T03:39:24Z TRACE tokio::util::trace] -> task
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
[2020-10-28T03:39:24Z TRACE hyper::client::pool] take? ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0): expiration = Some(90s)
[2020-10-28T03:39:24Z TRACE hyper::proto::h1::conn] flushed({role=client}): State { reading: Body(Chunked(Size, 0)), writing: KeepAlive, keep_alive: Busy }
[2020-10-28T03:39:24Z DEBUG hyper::client::pool] reuse idle connection for ("unix", 2f7661722f72756e2f646f636b65722e736f636b:0)
[2020-10-28T03:39:24Z TRACE tokio::util::trace] <- task
```
pandaman64 commented 3 years ago

Another coworker reported that adding

    .pool_idle_timeout(std::time::Duration::from_millis(0))
    .pool_max_idle_per_host(0)

to the client builder is a workaround.
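
For reference, a minimal sketch of what that looks like on the builder from the reproducer above (assuming the hyperlocal-based client; with a zero idle timeout and zero idle connections per host, pooling is effectively disabled and every request opens a fresh connection):

```rust
// Hedged sketch of the workaround: same client as the reproducer, but with
// connection pooling effectively disabled, so connections are never reused.
let client: hyper::Client<hyperlocal::UnixConnector> = hyper::Client::builder()
    .pool_idle_timeout(std::time::Duration::from_millis(0))
    .pool_max_idle_per_host(0)
    .build(hyperlocal::UnixConnector);
```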

pandaman64 commented 3 years ago

Update:

  * I suspect there is something wrong with [the pool implementation](https://github.com/hyperium/hyper/blob/42587059e6175735b1a8656c5ddbff0edb19294c/src/client/pool.rs), but I couldn't find it.

univerz commented 2 years ago

There is a `TODO: unhack` inside Pool::reuse, but it's hard for me to say how serious it is.

From my debugging it looks like there is no initial Connection poll (and consequently, through ProtoClient -> Dispatcher -> Conn, the request is never encoded), but I'm having a surprisingly hard time figuring out where that poll should be initiated so I can investigate it further.
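
To illustrate what "the initial Connection poll" means, here is a hedged sketch using hyper 0.14's public hyper::client::conn API (not the pool internals discussed here) with a placeholder address: the Connection returned by the handshake must be driven, typically by spawning it, or the request is never encoded and written.

```rust
use hyper::{Body, Request};

// Illustrative only: the pooled Client drives this internally; the sketch just
// shows that a Connection must be polled for send_request() to make progress.
async fn one_request() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Placeholder address for the sketch.
    let stream = tokio::net::TcpStream::connect("127.0.0.1:8545").await?;
    let (mut sender, connection) = hyper::client::conn::handshake(stream).await?;

    // Drive the connection on its own task; without this, send_request() below stalls.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {}", e);
        }
    });

    let req = Request::get("/").header("Host", "127.0.0.1").body(Body::empty())?;
    let resp = sender.send_request(req).await?;
    println!("status: {}", resp.status());
    Ok(())
}
```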

univerz commented 2 years ago

I have added an id field to Connection and SendRequest, plus a few tracing events. In the log below, there is the end of the previous successful "round" and the last, unsuccessful round, separated by `*****`. Between them there is some spawning, handles, threads, etc.

`send_request_retryable{id=1}: hyper::client::conn: before rx` is emitted after dispatch.try_send and before rx.then, and `hyper::client::conn: after rx=1` inside it if successful. `poll{id=1}: hyper::client::conn: called` indicates Connection::poll. There are several more events inside PoolInner::spawn_idle_interval and Pool::reuse, but it looks like the execution paths are always the same.

When the problem arises, it is always with the last dropped/pooled Connection from the previous round, in this case id=2.

Any pointers on where to look, @seanmonstar?

2021-12-03T10:36:33.319828Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.319833Z  WARN hyper::client::pool: can't share
2021-12-03T10:36:33.319836Z  WARN hyper::client::pool: downgrade
2021-12-03T10:36:33.319845Z  WARN send_request_retryable{id=2}: hyper::client::conn: before rx
2021-12-03T10:36:33.319859Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.319863Z  WARN hyper::client::pool: can't share
2021-12-03T10:36:33.319865Z  WARN hyper::client::pool: downgrade
2021-12-03T10:36:33.319874Z  WARN send_request_retryable{id=3}: hyper::client::conn: before rx
2021-12-03T10:36:33.319883Z  WARN poll{id=2}: hyper::client::conn: called
2021-12-03T10:36:33.319893Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.319898Z  WARN hyper::client::pool: can't share
2021-12-03T10:36:33.319900Z  WARN hyper::client::pool: downgrade
2021-12-03T10:36:33.319904Z DEBUG poll{id=2}: hyper::proto::h1::io: flushed 183 bytes
2021-12-03T10:36:33.319909Z  WARN send_request_retryable{id=0}: hyper::client::conn: before rx
2021-12-03T10:36:33.319913Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T10:36:33.320836Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.320841Z DEBUG poll{id=3}: hyper::proto::h1::io: flushed 182 bytes
2021-12-03T10:36:33.320861Z DEBUG poll{id=0}: hyper::proto::h1::io: flushed 219 bytes
2021-12-03T10:36:33.321024Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T10:36:33.321047Z DEBUG poll{id=3}: hyper::proto::h1::io: parsed 5 headers
2021-12-03T10:36:33.321053Z DEBUG poll{id=3}: hyper::proto::h1::conn: incoming body is content-length (66 bytes)
2021-12-03T10:36:33.321060Z DEBUG poll{id=3}: hyper::proto::h1::conn: incoming body completed
2021-12-03T10:36:33.321076Z  WARN hyper::client::conn: after rx=3
2021-12-03T10:36:33.321083Z DEBUG drop: hyper::client::pool: pooling idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.321088Z  WARN drop:spawn_idle_interval: hyper::client::pool: idle_interval_ref is_some
2021-12-03T10:36:33.321094Z DEBUG reqwest::async_impl::client: response '200 OK' for http://localhost:8545/    
2021-12-03T10:36:33.321148Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T10:36:33.322250Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.322262Z DEBUG poll{id=0}: hyper::proto::h1::io: parsed 5 headers
2021-12-03T10:36:33.322266Z DEBUG poll{id=0}: hyper::proto::h1::conn: incoming body is chunked encoding
2021-12-03T10:36:33.322272Z DEBUG poll{id=0}: hyper::proto::h1::decode: incoming chunked header: 0x800 (2048 bytes)
2021-12-03T10:36:33.322281Z  WARN hyper::client::conn: after rx=0
2021-12-03T10:36:33.322289Z DEBUG reqwest::async_impl::client: response '200 OK' for http://localhost:8545/    
2021-12-03T10:36:33.322357Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.322362Z DEBUG poll{id=0}: hyper::proto::h1::decode: incoming chunked header: 0x800 (2048 bytes)
2021-12-03T10:36:33.322391Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.322415Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.322421Z DEBUG poll{id=0}: hyper::proto::h1::decode: incoming chunked header: 0x723 (1827 bytes)
2021-12-03T10:36:33.322534Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.322541Z DEBUG poll{id=0}: hyper::proto::h1::conn: incoming body completed
2021-12-03T10:36:33.322551Z DEBUG drop: hyper::client::pool: pooling idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.322557Z  WARN drop:spawn_idle_interval: hyper::client::pool: idle_interval_ref is_some
2021-12-03T10:36:33.323183Z  WARN poll{id=2}: hyper::client::conn: called
2021-12-03T10:36:33.323206Z DEBUG poll{id=2}: hyper::proto::h1::io: parsed 5 headers
2021-12-03T10:36:33.323212Z DEBUG poll{id=2}: hyper::proto::h1::conn: incoming body is content-length (75 bytes)
2021-12-03T10:36:33.323218Z DEBUG poll{id=2}: hyper::proto::h1::conn: incoming body completed
2021-12-03T10:36:33.323229Z  WARN hyper::client::conn: after rx=2
2021-12-03T10:36:33.323236Z DEBUG drop: hyper::client::pool: pooling idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.323241Z  WARN drop:spawn_idle_interval: hyper::client::pool: idle_interval_ref is_some
2021-12-03T10:36:33.323247Z DEBUG reqwest::async_impl::client: response '200 OK' for http://localhost:8545/    

************* setup_evm *************

2021-12-03T10:36:33.323408Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.323419Z  WARN hyper::client::pool: can't share
2021-12-03T10:36:33.323422Z  WARN hyper::client::pool: downgrade
2021-12-03T10:36:33.323433Z  WARN send_request_retryable{id=2}: hyper::client::conn: before rx
2021-12-03T10:36:33.323449Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.323454Z  WARN hyper::client::pool: can't share
2021-12-03T10:36:33.323456Z  WARN hyper::client::pool: downgrade
2021-12-03T10:36:33.323465Z  WARN send_request_retryable{id=0}: hyper::client::conn: before rx
2021-12-03T10:36:33.323481Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.323486Z  WARN hyper::client::pool: can't share
2021-12-03T10:36:33.323489Z  WARN hyper::client::pool: downgrade
2021-12-03T10:36:33.323495Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.323509Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T10:36:33.323521Z DEBUG poll{id=0}: hyper::proto::h1::io: flushed 262 bytes
2021-12-03T10:36:33.323499Z  WARN send_request_retryable{id=3}: hyper::client::conn: before rx
2021-12-03T10:36:33.323530Z DEBUG poll{id=3}: hyper::proto::h1::io: flushed 250 bytes
2021-12-03T10:36:33.323728Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.323747Z DEBUG poll{id=0}: hyper::proto::h1::io: parsed 5 headers
2021-12-03T10:36:33.323752Z DEBUG poll{id=0}: hyper::proto::h1::conn: incoming body is content-length (66 bytes)
2021-12-03T10:36:33.323761Z DEBUG poll{id=0}: hyper::proto::h1::conn: incoming body completed
2021-12-03T10:36:33.323766Z  WARN hyper::client::conn: after rx=0
2021-12-03T10:36:33.323776Z DEBUG drop: hyper::client::pool: pooling idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.323783Z  WARN drop:spawn_idle_interval: hyper::client::pool: idle_interval_ref is_some
2021-12-03T10:36:33.323789Z DEBUG reqwest::async_impl::client: response '200 OK' for http://localhost:8545/    
2021-12-03T10:36:33.323839Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:36:33.323867Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T10:36:33.323881Z DEBUG poll{id=3}: hyper::proto::h1::io: parsed 5 headers
2021-12-03T10:36:33.323885Z DEBUG poll{id=3}: hyper::proto::h1::conn: incoming body is content-length (1167 bytes)
2021-12-03T10:36:33.323893Z DEBUG poll{id=3}: hyper::proto::h1::conn: incoming body completed
2021-12-03T10:36:33.323896Z  WARN hyper::client::conn: after rx=3
2021-12-03T10:36:33.323903Z DEBUG drop: hyper::client::pool: pooling idle connection for ("http", localhost:8545)
2021-12-03T10:36:33.323908Z  WARN drop:spawn_idle_interval: hyper::client::pool: idle_interval_ref is_some
2021-12-03T10:36:33.323914Z DEBUG reqwest::async_impl::client: response '200 OK' for http://localhost:8545/    
2021-12-03T10:36:33.323964Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T10:38:02.859635Z  WARN hyper::client::pool: IdleTask::poll
2021-12-03T10:38:02.894832Z  WARN hyper::client::pool: IdleTask::poll
2021-12-03T10:38:33.182905Z  WARN poll{id=1}: hyper::client::conn: called
2021-12-03T10:38:33.182958Z  WARN poll{id=1}: hyper::client::conn: shutdown
2021-12-03T10:38:33.189008Z  WARN poll{id=4}: hyper::client::conn: called
2021-12-03T10:38:33.189042Z  WARN poll{id=4}: hyper::client::conn: shutdown
2021-12-03T10:38:33.324401Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T10:38:33.324443Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T10:38:33.324463Z  WARN poll{id=3}: hyper::client::conn: shutdown
2021-12-03T10:38:33.324475Z  WARN poll{id=0}: hyper::client::conn: shutdown
univerz commented 2 years ago

Ah, there is some difference. The round before every successful one ends with a chunked-encoding response, and that accidentally prevents reusing the connection. Notice that reqwest returns before the chunked body is completed, but on the client side the body is consumed:

let res = self.client.post(self.url.as_ref()).json(&payload).send().await?;
let text = res.text().await?;

Every round is spawned inside a stream, but it was debugged with .buffer_unordered(1), so processing is serial.
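
For context, a hedged sketch of the shape of that driving loop (assumptions: a shared reqwest Client built with the json feature, a local endpoint on port 8545, and serde_json payloads; this is not the actual application code):

```rust
use futures::stream::{self, StreamExt};

// Each "round" POSTs a payload through the shared client; buffer_unordered(1)
// keeps at most one request in flight, so processing is effectively serial.
async fn run_rounds(client: &reqwest::Client, payloads: Vec<serde_json::Value>) {
    stream::iter(payloads)
        .map(|payload| async move {
            let res = client
                .post("http://localhost:8545")
                .json(&payload)
                .send()
                .await?;
            let text = res.text().await?;
            Ok::<String, reqwest::Error>(text)
        })
        .buffer_unordered(1)
        .for_each(|result| async move {
            if let Err(e) = result {
                eprintln!("request failed: {}", e);
            }
        })
        .await;
}
```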

2021-12-03T11:18:55.002783Z DEBUG poll{id=3}: hyper::proto::h1::conn: incoming body is chunked encoding
2021-12-03T11:18:55.002789Z DEBUG reqwest::async_impl::client: returning poll::ready    
2021-12-03T11:18:55.002795Z DEBUG poll{id=3}: hyper::proto::h1::decode: incoming chunked header: 0x800 (2048 bytes)
2021-12-03T11:18:55.002848Z  WARN hyper::client::conn: after rx=3
2021-12-03T11:18:55.002858Z DEBUG reqwest::async_impl::client: response '200 OK' for http://localhost:8545/
2021-12-03T11:18:55.002862Z DEBUG reqwest::async_impl::client: returning poll::ready
2021-12-03T11:18:55.002864Z  WARN poll{id=2}: hyper::client::conn: called
2021-12-03T11:18:55.002904Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T11:18:55.002910Z DEBUG poll{id=3}: hyper::proto::h1::decode: incoming chunked header: 0x800 (2048 bytes)
2021-12-03T11:18:55.002938Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T11:18:55.002943Z DEBUG poll{id=3}: hyper::proto::h1::decode: incoming chunked header: 0x2BC (700 bytes)

************* setup_evm *************

2021-12-03T11:18:55.003152Z  WARN reqwest::async_impl::request: returning from reqwest::send()
2021-12-03T11:18:55.003169Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T11:18:55.003173Z  WARN hyper::client::pool: can't share
2021-12-03T11:18:55.003176Z  WARN hyper::client::pool: downgrade
2021-12-03T11:18:55.003188Z  WARN send_request_retryable{id=2}: hyper::client::conn: before rx
2021-12-03T11:18:55.003209Z  WARN reqwest::async_impl::request: returning from reqwest::send()
2021-12-03T11:18:55.003216Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T11:18:55.003219Z  WARN hyper::client::pool: can't share
2021-12-03T11:18:55.003217Z  WARN poll{id=2}: hyper::client::conn: called
2021-12-03T11:18:55.003223Z  WARN hyper::client::pool: downgrade
2021-12-03T11:18:55.003241Z  WARN send_request_retryable{id=0}: hyper::client::conn: before rx
2021-12-03T11:18:55.003248Z DEBUG poll{id=2}: hyper::proto::h1::io: flushed 253 bytes
2021-12-03T11:18:55.003250Z  WARN poll{id=0}: hyper::client::conn: called
2021-12-03T11:18:55.003257Z  WARN reqwest::async_impl::request: returning from reqwest::send()
2021-12-03T11:18:55.003267Z DEBUG hyper::client::pool: reuse idle connection for ("http", localhost:8545)
2021-12-03T11:18:55.003270Z  WARN hyper::client::pool: can't share
2021-12-03T11:18:55.003273Z  WARN hyper::client::pool: downgrade
2021-12-03T11:18:55.003278Z DEBUG poll{id=0}: hyper::proto::h1::io: flushed 262 bytes
2021-12-03T11:18:55.003281Z  WARN send_request_retryable{id=1}: hyper::client::conn: before rx
...
2021-12-03T11:18:55.008947Z  WARN poll{id=3}: hyper::client::conn: called
2021-12-03T11:18:55.008952Z DEBUG poll{id=3}: hyper::proto::h1::conn: incoming body completed
univerz commented 2 years ago

When comparing the problematic content-length body request at trace log level, the hang occurs if it is the last completed request and the round ends with:

2021-12-03T12:00:52.385383Z  WARN poll{id=1}: hyper::client::conn: called
2021-12-03T12:00:52.385391Z TRACE poll{id=1}: hyper::proto::h1::conn: Conn::read_head
2021-12-03T12:00:52.385406Z TRACE poll{id=1}: hyper::proto::h1::io: received 222 bytes
2021-12-03T12:00:52.385412Z TRACE poll{id=1}:parse_headers: hyper::proto::h1::role: Response.parse bytes=222
2021-12-03T12:00:52.385418Z TRACE poll{id=1}:parse_headers: hyper::proto::h1::role: Response.parse Complete(146)
2021-12-03T12:00:52.385427Z DEBUG poll{id=1}: hyper::proto::h1::io: parsed 5 headers
2021-12-03T12:00:52.385431Z DEBUG poll{id=1}: hyper::proto::h1::conn: incoming body is content-length (76 bytes)
2021-12-03T12:00:52.385437Z TRACE poll{id=1}: hyper::proto::h1::decode: decode; state=Length(76)
2021-12-03T12:00:52.385442Z DEBUG poll{id=1}: hyper::proto::h1::conn: incoming body completed
2021-12-03T12:00:52.385447Z TRACE poll{id=1}: hyper::proto::h1::conn: maybe_notify; read_from_io blocked
2021-12-03T12:00:52.385453Z TRACE poll{id=1}: want: signal: Want    
2021-12-03T12:00:52.385457Z TRACE poll{id=1}: hyper::proto::h1::conn: flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
2021-12-03T12:00:52.385462Z TRACE poll{id=1}: want: signal: Want    
2021-12-03T12:00:52.385465Z TRACE poll{id=1}: hyper::proto::h1::conn: flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }
2021-12-03T12:00:52.385473Z  WARN rpc{method="eth_gasPrice" params="null"}: hyper::client::conn: after rx=1
2021-12-03T12:00:52.385480Z TRACE rpc{method="eth_gasPrice" params="null"}:drop: hyper::client::pool: put; add idle connection for ("http", localhost:8545)
2021-12-03T12:00:52.385487Z DEBUG rpc{method="eth_gasPrice" params="null"}:drop: hyper::client::pool: pooling idle connection for ("http", localhost:8545)
2021-12-03T12:00:52.385492Z  WARN rpc{method="eth_gasPrice" params="null"}:drop:spawn_idle_interval: hyper::client::pool: idle_interval_ref is_some
2021-12-03T12:00:52.385501Z DEBUG rpc{method="eth_gasPrice" params="null"}: reqwest::async_impl::client: response '200 OK' for http://localhost:8545/    
2021-12-03T12:00:52.385507Z DEBUG rpc{method="eth_gasPrice" params="null"}: reqwest::async_impl::client: returning poll::ready    
2021-12-03T12:00:52.385546Z TRACE rpc{method="eth_gasPrice" params="null"}: tokio_util::codec::framed_impl: attempting to decode a frame    
2021-12-03T12:00:52.385551Z TRACE rpc{method="eth_gasPrice" params="null"}: tokio_util::codec::framed_impl: frame decoded from buffer    
2021-12-03T12:00:52.385555Z TRACE rpc{method="eth_gasPrice" params="null"}: tokio_util::codec::framed_impl: attempting to decode a frame    
2021-12-03T12:00:52.385569Z TRACE rpc{method="eth_gasPrice" params="null"}: ethers_providers::provider: rx="\"0x183f284dd0\""

But in the successful case (when it is not the last completed request in the round), there is this last poll later with a third flush (notice that it happened after pooling the idle connection):

2021-12-03T12:00:52.385580Z  WARN poll{id=1}: hyper::client::conn: called
2021-12-03T12:00:52.385585Z TRACE poll{id=1}: want: signal: Want    
2021-12-03T12:00:52.385589Z TRACE poll{id=1}: hyper::proto::h1::conn: flushed({role=client}): State { reading: Init, writing: Init, keep_alive: Idle }

It looks to me like this should be close to enough info for someone with a deep understanding of the internals to fix this. Happy to investigate further.

threema-danilo commented 2 years ago

I'm experiencing a similar issue.

In an application I'm working on, I'm sending multiple requests (from a shared reqwest Client instance) from a future (which is launched through handle.block_on(...) within a worker thread). The first three HTTP requests reproducibly succeed, while the fourth request to the same host reproducibly fails.

According to a Wireshark trace, the fourth request is never sent to the network. I added debug logging deep down into hyper, but didn't really find the source of the issue. I suspected that it's related to connection reuse (because the first three requests work), and just as described above, setting .pool_max_idle_per_host(0) on the client makes it work again.
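
For reference, a minimal sketch of that reqwest-side workaround (method name as in reqwest 0.11; keeping zero idle connections per host means nothing is ever reused, at the cost of a new connection per request):

```rust
// Hedged sketch of the workaround: build the shared client with connection
// pooling effectively disabled.
let client = reqwest::Client::builder()
    .pool_max_idle_per_host(0)
    .build()
    .expect("failed to build reqwest client");
```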

In case it's useful, here are a few screenshots from my debugging session. The reqwest library delegates the HTTP request to hyper:

image

When looking at the program using tokio-console, I noticed that task 72 (with hyper's Exec::execute method) is reported as having lost its waker. I don't know if this could be relevant.

image

image
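
For anyone wanting to repeat that inspection, a hedged sketch of the tokio-console setup that is typically needed (this assumes the console-subscriber crate and a build with RUSTFLAGS="--cfg tokio_unstable"; it is not part of the original report):

```rust
// Hedged sketch: register the console subscriber at startup so tasks and their
// wakers show up in the `tokio-console` CLI; requires the `console-subscriber` crate.
fn main() {
    console_subscriber::init();
    // ... start the tokio runtime and the application as usual ...
}
```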

jobot0 commented 2 years ago

Hello, is there any news about this one?

arthurprs commented 1 year ago

We were hit by https://github.com/hyperium/hyper/issues/2312 yesterday (and I think once before) and tracked it down to this. The example program from https://github.com/hyperium/hyper/issues/2312 can easily show the problem.

We're using pool_max_idle_per_host(0) as long as no better solution exists :sob:

kpark-hrp commented 1 year ago

I just wasted my entire day because of this. This obviously also affects reqwest, because it uses hyper underneath. How I encountered this bug:

    std::thread::spawn(move || {
        // Note: enable_all() is assumed here; reqwest needs the runtime's I/O
        // (and timer) drivers to be enabled.
        let a = tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .unwrap();
        a.block_on(
            async {
                // reqwest call here then `.await`
            }
        );
    }).join().expect("Thread panicked")

But .pool_max_idle_per_host(0) did the trick as a workaround.

numberjuani commented 10 months ago

Is there any timeline for fixing this, or have we given up?

GunnarMorrigan commented 9 months ago

I have a two-step container build process to reduce container size. In the initial build container everything works fine. However, in my second container it breaks with the same behaviour as described here.

Could this be the same issue, or is it the Docker driver issue that @sfackler described above? It does the same under Docker and Podman.

2023-09-20T18:00:00.716743Z TRACE hyper::client::pool: 638: checkout waiting for idle connection: ("https", _URL_HERE_)
2023-09-20T18:00:00.716810Z DEBUG reqwest::connect: 429: starting new connection: https://_URL_HERE_/    
2023-09-20T18:00:00.716878Z TRACE hyper::client::connect::http: 278: Http::connect; scheme=Some("https"), host=Some("_URL_HERE_"), port=None
2023-09-20T18:00:00.717060Z DEBUG hyper::client::connect::dns: 122: resolving host="_URL_HERE_"
2023-09-20T18:00:00.905903Z DEBUG hyper::client::connect::http: 537: connecting to IP:443
2023-09-20T18:00:00.909242Z DEBUG hyper::client::connect::http: 540: connected to IP:443
2023-09-20T18:00:00.993186Z TRACE hyper::client::pool: 680: checkout dropped for ("https", _URL_HERE_)
2023-09-20T18:00:00.993237Z DEBUG azure_core::policies::retry_policies::retry_policy: 114: io error occurred when making request which will be retried: failed to execute `reqwest` request    
rami3l commented 3 weeks ago

@jevolk Just being curious: I saw you pushed a commit in a fork (https://github.com/jevolk/hyper/commit/56a64cf0bae4118d5b276fbb0eeb68dc286db38b) that seems to have "[fixed] the deadlock". Would you mind shedding some light on this issue?

jevolk commented 2 weeks ago

> @jevolk Just being curious: I saw you pushed a commit in a fork (jevolk@56a64cf) that seems to have "[fixed] the deadlock". Would you mind shedding some light on this issue?

The patch appears to prevent the deadlock for the test at https://github.com/pandaman64/hyper-hang-tokio, but it is not the best solution. I've observed some regressions when using it under normal circumstances, which is why I never opened a PR. I left it up for future reference.

I should note my application makes an arguably exotic use of reqwest and hyper (legacy) client pools by communicating with thousands of remote hosts in a distributed system, often with several connections each, in an async+multi-threaded process: we do not set pool_max_idle_per_host=0. I have never hit this deadlock and no users have ever reported it. Thus I have not revisited this issue in some time now...