hyperium / hyper

An HTTP library for Rust
https://hyper.rs
MIT License
14.58k stars 1.6k forks source link

http2 GOAWAY causes client to drop requests #2500

Open Skepfyr opened 3 years ago

Skepfyr commented 3 years ago

It's possible this is an h2 issue but I don't know the crates well enough to know.

If a GOAWAY is received while multiple requests are in flight some of them fail with the error:

http2 error: protocol error: not a result of an error

I can reproduce this by running the following code:

use anyhow::Result;
use hyper::{client::HttpConnector, Body, Client, Uri};
use tracing::{span, warn, Instrument as _, Level, Span};

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt()
        .with_max_level(Level::DEBUG)
        .init();

    let connector = HttpConnector::new();
    let client: Client<_, Body> = Client::builder().http2_only(true).build(connector);
    let uri: Uri = "http://127.0.0.1:8080".parse().unwrap();

    let mut requests = Vec::new();
    for i in 0u32..2 {
        let client = client.clone();
        let uri = uri.clone();
        let span = span!(Level::INFO, "request", i);
        span.follows_from(Span::current());
        requests.push(tokio::spawn(
            async move {
                match client.get(uri).await {
                    Ok(res) => {
                        let _ = hyper::body::to_bytes(res.into_body()).await.unwrap();
                    }
                    Err(e) => warn!("{}", e),
                }
            }
            .instrument(span),
        ));
    }
    for request in requests {
        let _ = request.await;
    }
    Ok(())
}

I ran this against an nginx instance with the following config to force http2 and send a GOAWAY after every single request.

events {}
http {
    server {
        listen 80 http2;
        keepalive_requests 1;

        location / {
            root /data;
        }
    }
}

This program generates the logs (edit: I've added in the trace level logs too if they're useful trace.log):

Apr 12 15:44:42.887 DEBUG request{i=0}: hyper::client::connect::http: connecting to 127.0.0.1:8080
Apr 12 15:44:42.888 DEBUG request{i=0}: hyper::client::connect::http: connected to 127.0.0.1:8080
Apr 12 15:44:42.888 DEBUG request{i=0}: h2::client: binding client connection
Apr 12 15:44:42.889 DEBUG request{i=0}: h2::client: client connection bound
Apr 12 15:44:42.889 DEBUG request{i=0}: h2::codec::framed_write: send frame=Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
Apr 12 15:44:42.889 DEBUG request{i=0}: hyper::client::pool: pooling idle connection for ("http", 127.0.0.1:8080)
Apr 12 15:44:42.889 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_read: received frame=Settings { flags: (0x0), max_concurrent_streams: 128, initial_window_size: 65536, max_frame_size: 16777215 }
Apr 12 15:44:42.890 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_write: send frame=Settings { flags: (0x1: ACK) }
Apr 12 15:44:42.890 DEBUG request{i=1}: hyper::client::pool: reuse idle connection for ("http", 127.0.0.1:8080)
Apr 12 15:44:42.890 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_read: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 2147418112 }
Apr 12 15:44:42.891 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_write: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5177345 }
Apr 12 15:44:42.891 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
Apr 12 15:44:42.891 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(3), flags: (0x5: END_HEADERS | END_STREAM) }
Apr 12 15:44:42.892 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_read: received frame=Settings { flags: (0x1: ACK) }
Apr 12 15:44:42.892 DEBUG request{i=0}:Connection{peer=Client}: h2::proto::settings: received settings ACK; applying Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
Apr 12 15:44:42.893 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_read: received frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(1) }
Apr 12 15:44:42.894 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
Apr 12 15:44:42.894 DEBUG request{i=0}:Connection{peer=Client}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
Apr 12 15:44:42.895 DEBUG hyper::proto::h2::client: client response error: protocol error: not a result of an error
Apr 12 15:44:42.896  WARN request{i=1}: hyper_test: http2 error: protocol error: not a result of an error

Rampant speculation

My suspicion is that any stream after the final stream inidicated in the HTTP2 GOAWAY frame, that were spawned before hyper received the goaway, throw this error. I would expect hyper to retry them on a new connection, as it does for any connection spawned later on (I assume after the client has marked the old connection as closed).

seanmonstar commented 3 years ago

Your speculation is correct. The solution for hyper to automatically retry these is a little harder to do.

It would be possible at a layer above hyper (so, by the user, for instance) to use something like tower::retry, to get it the request to retry if you know it will be safe to do so.

Skepfyr commented 3 years ago

I can see why it might be impossible for hyper to retry the request automatically. However, as you say, the errors don't currently give the user enough information to know whether the request can be safely retried. How hard would it be to improve the error signalling to include that information?

I think the piece of information that the user is interested in is whether the requests may have reached the server, or if it definitely didn't. I don't think you can be more precise than that. A general "it's safe to retry that request" would be great but I'm not sure it's possible to determine between transient and more permanent connection issues.

seanmonstar commented 3 years ago

Yea, I think we'd need to redesign the h2::Error type some, so that it can include if it's a stream error, or a connection error (GOAWAY). Then we'd be better equipped to answer that programmatically.

nox commented 3 years ago

@seanmonstar Could you give pointers on how to redesign the h2::Error? We encounter this behaviour in the wild and would like to fix it.