This PR addresses an issue we found using hyper with HTTP2 CONNECT. Both the client and server are hyper 1.3.1, with tokio and rustls. In this scenario, we are only sending a single request on the connection.
What we found is that on occasion, despite the `Connection` object being fully `.await`ed and the `SendRequest` being dropped, the underlying connection stays open; no GOAWAY is sent from the client. This hangs forever.
The only way we are able to unblock the connection is if we close it from the server side, or if we use tokio's task dump feature -- this indirectly wakes up every task, allowing things to drive to completion.
Below is my analysis of how this is happening:
When hyper makes a connection, it spawns an `H2ClientFuture` and returns a `ClientTask`. The user awaits the `ClientTask` (in our case, we spawn it).
The `ClientTask` future holds a `conn_eof: ConnEof`. This is a oneshot linked up to the `H2ClientFuture`'s `cancel_tx`.
An `H2ClientFuture` can finish either by `con.poll` being ready (i.e. the connection terminated) or by `this.drop_rx` being ready (i.e. it was dropped). When `drop_rx` fires, we drop `cancel_tx`. This, in turn, should trigger the `ClientTask` to drop and trigger a shutdown -- this is the part that never happens.
`drop_rx` is half of an mpsc. As far as I can tell, it was added in the initial HTTP2 support (6 years ago) to detect this exact scenario. The other end of the mpsc lives in `conn_drop_ref` in the `ClientTask`. `drop_rx` gets `Ready((None, Receiver { closed: false }))` when the final `conn_drop_ref` is closed. `conn_drop_ref` can be cloned in a few places, but for us this never happens (it seems to happen only for non-CONNECT requests).
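The drop-detection half of this can be sketched with std channels (a simplification; hyper uses a futures channel, and the names here come from the analysis above, not hyper's actual code). No message is ever sent on this channel -- only the disconnection of the last sender matters:

```rust
use std::sync::mpsc;

// conn_drop_ref lives in ClientTask; drop_rx lives in the connection future.
fn drop_is_observed() -> bool {
    let (conn_drop_ref, drop_rx) = mpsc::channel::<()>();

    // ClientTask completing drops its (only, in the CONNECT case) sender.
    drop(conn_drop_ref);

    // drop_rx now observes the disconnect: this is the
    // "drop_rx is ready" branch described above.
    drop_rx.recv().is_err()
}

fn main() {
    assert!(drop_is_observed());
}
```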
Instead, we see `ClientTask` complete, which drops `conn_drop_ref`, which sends the message on `drop_rx`, which triggers us to `drop(this.cancel_tx)`.
This is circular: `cancel_tx` is supposed to notify `conn_eof`, which `ClientTask` is watching... but we got here because `ClientTask` dropped!
I got a bit lost there, but I think this ultimately means our `ConnTask` hangs around forever (we returned `Pending` but have no way to be woken), which keeps the `h2::Connection` alive and blocks shutdown. The task dump fixes it because it calls `poll` on the `ConnTask` again, this time hitting the 'conn terminated' branch.
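The hang can be modeled with a minimal, hypothetical poll function (std-only sketch with made-up names from the analysis above; not hyper's actual code). `cancel_tx` is an `Option` standing in for the oneshot sender behind `ClientTask`'s `conn_eof`:

```rust
use std::task::Poll;

fn poll_conn(conn_done: bool, drop_rx_ready: bool, cancel_tx: &mut Option<()>) -> Poll<()> {
    if conn_done {
        // The "conn terminated" branch -- the one a task dump eventually hits.
        return Poll::Ready(());
    }
    if drop_rx_ready {
        // All conn_drop_ref senders are gone, so drop cancel_tx to notify
        // conn_eof -- but ClientTask already completed, so nobody hears it.
        cancel_tx.take();
        // Pre-fix: fall through to Pending. Nothing that can still fire
        // holds our waker, so this future is never polled again, and the
        // h2::Connection it owns is never shut down.
    }
    Poll::Pending
}

fn main() {
    let mut cancel_tx = Some(());
    // drop_rx fires, cancel_tx is dropped, yet the future stays Pending forever.
    assert!(matches!(poll_conn(false, true, &mut cancel_tx), Poll::Pending));
    assert!(cancel_tx.is_none());
}
```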
The fix I have applied here is to immediately return ready if `drop_rx` gets a message. This seems to fix the behavior we see, and I cannot think of cases where this wouldn't be correct, though I only researched the CONNECT flow in depth.
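A minimal sketch of what the fix amounts to (hypothetical names; not hyper's actual code): once `drop_rx` fires, resolve the connection future immediately so the `h2::Connection` it owns is dropped and shutdown can proceed.

```rust
use std::task::Poll;

fn poll_conn_fixed(conn_done: bool, drop_rx_ready: bool, cancel_tx: &mut Option<()>) -> Poll<()> {
    if conn_done {
        return Poll::Ready(());
    }
    if drop_rx_ready {
        cancel_tx.take();
        // Fix: return Ready instead of falling through to Pending.
        return Poll::Ready(());
    }
    Poll::Pending
}

fn main() {
    let mut cancel_tx = Some(());
    // With the fix, the drop_rx branch completes the future.
    assert!(matches!(poll_conn_fixed(false, true, &mut cancel_tx), Poll::Ready(())));
}
```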
As part of the PR I added a simple regression test. This fails 100% of the time on my machine prior to this PR.
This issue is not caught by the existing H2 CONNECT tests, which all close the connection from the server side; the test here closes from the client (which, I expect, is more common in the real world).