chinedufn opened 1 day ago
> Have you been able to trace what is happening? One thing that it sort of sounds like is that the TLS stream perhaps hasn't flushed the response before closing?
>
> But do you know for sure all the requests have indeed started? Or could the shutdown be triggered just before hyper has been able to see the request bytes?

> Or could the shutdown be triggered just before hyper has been able to see the request bytes?

Seems to be the problem. Most of this comment explains how I arrived at that conclusion. Skip to the end for some potential solutions.
Did a bit of `hyper=trace,hyper_util=trace` tracing before opening the issue but ~~nothing jumped out at me.~~ (Oh, never mind. Now I remember: I wasn't getting `hyper` traces because I didn't enable the feature. Next time I dive in I can enable it.)
Hmm, so if I sleep for 1ms before dropping the `watch::Receiver` that serves as the shutdown signal, then the test passes. As in, a sleep before this line: https://github.com/chinedufn/hyper-tls-graceful-shutdown-issue/blob/7c3d52f54e839096022a3e9b7b478ad0a635293a/src/lib.rs#L42
Inlining the snippet here for convenience:

```rust
loop {
    tokio::select! {
        _ = &mut shutdown_receiver => {
            // ADDING THIS SLEEP MAKES THE TEST PASS
            tokio::time::sleep(std::time::Duration::from_millis(1)).await;
            drop(shut_down_connections_rx);
            break;
        }
        conn = tcp_listener.accept() => {
            tokio::spawn(
                handle_tcp_conn(
                    conn,
                    wait_for_request_to_complete_rx.clone(),
                    shut_down_connections_tx.clone(),
                    tls_config
                )
            );
        }
    }
}
```
The `drop(shut_down_connections_rx)` shutdown signal causes this `Connection::graceful_shutdown` `tokio::select!` branch to be selected: https://github.com/chinedufn/hyper-tls-graceful-shutdown-issue/blob/7c3d52f54e839096022a3e9b7b478ad0a635293a/src/lib.rs#L145-L159
Inlining the snippet here for convenience:

```rust
tokio::select! {
    result = conn.as_mut() => {
        if let Err(err) = result {
            dbg!(err);
        }
    }
    _ = should_shut_down_connection => {
        // TEST STILL FAILS IF WE SLEEP RIGHT HERE
        conn.as_mut().graceful_shutdown();
        let result = conn.as_mut().await;
        if let Err(err) = result {
            dbg!(err);
        }
    }
};
```
The test passes if we sleep for a millisecond before sending on the channel that leads to `conn.as_mut().graceful_shutdown();` getting called for all open connections, i.e. if we sleep for one millisecond right before this line: https://github.com/chinedufn/hyper-tls-graceful-shutdown-issue/blob/7c3d52f54e839096022a3e9b7b478ad0a635293a/src/lib.rs#L42

If I instead move the sleep to just before the `conn.as_mut().graceful_shutdown();` line, the test fails, even if we sleep for 1 second, 5 seconds, 50 milliseconds, or seemingly any other amount of time. (i.e. if we sleep for five seconds right before this line the test will fail: https://github.com/chinedufn/hyper-tls-graceful-shutdown-issue/blob/7c3d52f54e839096022a3e9b7b478ad0a635293a/src/lib.rs#L152. I also confirmed that sleeping for 50 milliseconds leads the test to fail.)
This suggests that the problem occurs when we call `Connection::graceful_shutdown` before we've started polling the connection and receiving bytes.

It looks like `graceful_shutdown` calls `disable_keepalive`, and `disable_keepalive` closes the connection if no bytes have been received: https://github.com/hyperium/hyper/blob/master/src/proto/h1/dispatch.rs#L90-L100
> Or could the shutdown be triggered just before hyper has been able to see the request bytes?

Yeah, this seems to be the problem.

Currently, if a user opens a TCP connection to a server and the server calls `Connection::graceful_shutdown` before any bytes have been received, the TCP connection will be closed. This means that if the client has just begun transmitting packets, but the server has not received them, the client will get an error. This is not a graceful shutdown, since the client was not made aware that the connection was going to be closed.
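To make the failure mode concrete, here is a small self-contained sketch of this race using plain `std::net` (no hyper and no TLS; the function name is mine, not code from the linked repository). The server closes the socket before reading any request bytes, and the client, which has already started sending its request, gets EOF or a reset instead of a response:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Simulates the race: the server accepts a connection and closes it
// immediately, before reading any request bytes (analogous to hyper
// closing the connection because it has not yet seen bytes). Returns
// the number of response bytes the client manages to read.
fn abrupt_close_race() -> usize {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    let server = thread::spawn(move || {
        // Accept, then drop the stream without reading anything.
        let (stream, _) = listener.accept().unwrap();
        drop(stream);
    });

    let mut client = TcpStream::connect(addr).unwrap();
    // The client has just begun transmitting its request...
    let _ = client.write_all(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n");
    server.join().unwrap();

    // ...and then tries to read the response: it sees EOF (0 bytes) or a
    // connection-reset error, never an HTTP response.
    let mut buf = [0u8; 1024];
    match client.read(&mut buf) {
        Ok(n) => n,
        Err(_) => 0, // "connection reset by peer" also means no response
    }
}

fn main() {
    let n = abrupt_close_race();
    println!("response bytes received by client: {}", n);
    assert_eq!(n, 0);
}
```

The client was never told the connection was going away, which is why this shows up to callers as an error rather than a clean close.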
**How `disable_keepalive` decides whether to close the connection**

I haven't yet poked around to figure out what `has_initial_read_write_state` is checking for.

I'm not yet sure why the tests pass for a non-TLS server but fail for a TLS server. Is it possible that `disable_keepalive` is immediately closing the connection even if the TLS negotiation process has begun? Could a solution be to avoid closing the connection if the TLS negotiation process has begun?
**Wait for a `Duration` before closing an unused connection**

One way to avoid such an error would be to do something like:

- wait for some `Duration` "W" (configurable) to see if the client sends any packets
- if no bytes are received within `Duration` "W", close the connection
- if bytes are received within `Duration` "W", receive the bytes and then close the connection

I'm unfamiliar with the TCP spec, but from some quick searching it seems like the `FIN` segment might be useful here? I can do more research if the above steps seem like a good path forward.
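The steps above can be sketched roughly as follows, using blocking `std::net` rather than hyper's internals (the helper name `drain_before_close` and the `grace` parameter are hypothetical, purely for illustration):

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

// Hypothetical sketch of the proposed behavior: before closing an idle
// connection during shutdown, wait up to `grace` (the "W" above) for the
// client to send bytes. Returns Some(bytes) if the client sent data that
// should be handled before closing, or None if it is safe to close now.
fn drain_before_close(stream: &mut TcpStream, grace: Duration) -> Option<Vec<u8>> {
    stream.set_read_timeout(Some(grace)).ok()?;
    let mut buf = [0u8; 4096];
    match stream.read(&mut buf) {
        // The client sent something within "W": process it before closing.
        Ok(n) if n > 0 => Some(buf[..n].to_vec()),
        // Clean EOF: the client already closed its side; nothing to drain.
        Ok(_) => None,
        // No bytes arrived within "W": close the connection.
        Err(e) if e.kind() == ErrorKind::WouldBlock || e.kind() == ErrorKind::TimedOut => None,
        Err(_) => None,
    }
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // A client that has just begun transmitting when shutdown starts.
    let client = thread::spawn(move || {
        let mut c = TcpStream::connect(addr).unwrap();
        c.write_all(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n").unwrap();
        thread::sleep(Duration::from_millis(200));
    });

    let (mut server_side, _) = listener.accept().unwrap();
    // Shutdown begins: instead of closing immediately, wait up to "W".
    let drained = drain_before_close(&mut server_side, Duration::from_millis(500));
    assert!(drained.is_some());
    client.join().unwrap();
    println!("drained request bytes before closing");
}
```

In hyper itself the equivalent logic would presumably live in the dispatcher's shutdown path rather than in a blocking read, but the decision table (bytes within W, EOF, or timeout) is the same.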
...
I'm observing errors while testing graceful shutdown of a hyper server. When gracefully shutting down a TLS connection the client will sometimes get an `IncompleteMessage` error.

This only happens for TLS connections. The graceful shutdown process is always successful for non-TLS connections.

Given the following testing steps:

When the hyper server is not using TLS, the tests pass. When the hyper server is using TLS, the test fails with an `IncompleteMessage` error.

I've created a repository that reproduces the issue: https://github.com/chinedufn/hyper-tls-graceful-shutdown-issue
Here's a quick snippet of the graceful shutdown code:
Here is the full source code for convenience (also available in the linked repository):
Cargo.toml (click to expand)
```toml
[package]
name = "hyper-graceful-shutdown-issue"
version = "0.1.0"
edition = "2021"
publish = false

# We specify exact versions of the dependencies to ensure that the issue is reproducible.
[dependencies]
hyper = { version = "=1.5.1", features = ["client", "http1"] }
hyper-util = { version = "=0.1.10", features = ["http1", "tokio", "server"] }
http-body-util = "=0.1.2"
futures-util = "=0.3.31"
rand = "=0.8.5"
reqwest = { version = "=0.12.9" }
rustls-pemfile = "2"
tokio = { version = "=1.41.1", features = ["macros", "net", "rt-multi-thread", "sync", "time"] }
tokio-rustls = "0.26"
```

Rust code (click to expand)
```rust
use futures_util::pin_mut;
use http_body_util::Empty;
use hyper::body::Bytes;
use hyper::body::Incoming;
use hyper::{Request, Response, StatusCode};
use hyper_util::rt::{TokioExecutor, TokioIo};
use rand::Rng;
use rustls_pemfile::{certs, pkcs8_private_keys};
use std::io::{BufReader, Cursor};
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;
use tokio::io::{AsyncRead, AsyncWrite};
use tokio::net::{TcpListener, TcpStream};
use tokio::sync::watch::Sender;
use tokio::sync::{oneshot, watch};
use tokio_rustls::rustls::pki_types::PrivateKeyDer;
use tokio_rustls::rustls::server::Acceptor;
use tokio_rustls::rustls::ServerConfig;
use tokio_rustls::LazyConfigAcceptor;

#[derive(Copy, Clone)]
enum TlsConfig {
    Disabled,
    Enabled,
}

async fn run_server(
    tcp_listener: TcpListener,
    mut shutdown_receiver: oneshot::Receiver<()>,
    tls_config: TlsConfig,
) {
    let enable_graceful_shutdown = true;

    let (wait_for_requests_to_complete_tx, wait_for_request_to_complete_rx) =
        watch::channel::<()>(());
    let (shut_down_connections_tx, shut_down_connections_rx) = watch::channel::<()>(());

    loop {
        tokio::select! {
            _ = &mut shutdown_receiver => {
                drop(shut_down_connections_rx);
                break;
            }
            conn = tcp_listener.accept() => {
                tokio::spawn(
                    handle_tcp_conn(
                        conn,
                        wait_for_request_to_complete_rx.clone(),
                        shut_down_connections_tx.clone(),
                        tls_config
                    )
                );
            }
        }
    }

    drop(wait_for_request_to_complete_rx);

    if enable_graceful_shutdown {
        wait_for_requests_to_complete_tx.closed().await;
    }
}

async fn handle_tcp_conn(
    conn: tokio::io::Result<(TcpStream, SocketAddr)>,
    indicate_connection_has_closed: watch::Receiver<()>,
    should_shut_down_connection: watch::Sender<()>,
    tls_config: TlsConfig,
) {
    let tcp_stream = conn.unwrap().0;

    let builder = hyper_util::server::conn::auto::Builder::new(TokioExecutor::new());

    match tls_config {
        TlsConfig::Disabled => {
            let stream = TokioIo::new(tcp_stream);
            handle_tokio_io_conn(builder, stream, should_shut_down_connection).await
        }
        TlsConfig::Enabled => {
            let acceptor = LazyConfigAcceptor::new(Acceptor::default(), tcp_stream);
            tokio::pin!(acceptor);

            let start = acceptor.as_mut().await.unwrap();
            let config = rustls_server_config();
            let stream = start.into_stream(config).await.unwrap();
            let stream = TokioIo::new(stream);
            handle_tokio_io_conn(builder, stream, should_shut_down_connection).await
        }
    };

    drop(indicate_connection_has_closed);
}

fn rustls_server_config() -> Arc
```