hyperium / h2

HTTP 2.0 client & server implementation for Rust.
MIT License
1.34k stars 265 forks source link

Fix race condition in connection termination #772

Closed howardjohn closed 2 months ago

howardjohn commented 2 months ago

See https://github.com/hyperium/hyper/issues/3652.

What I have found is the final reference to a stream being dropped after the maybe_close_connection_if_no_streams but before the inner.poll() completes can lead to the connection dangling forever without any forward progress. No streams/references are alive, but the connection is not complete and never wakes up again. This seems like a classic TOCTOU race condition.

In this fix, I check again at the end of poll and if this state is detected, wake up the task again.

Wth the test in https://github.com/hyperium/hyper/pull/3655, on my machine, it fails about 5% of the time:

1876 runs so far, 100 failures (94.94% pass rate). 95.197349ms avg, 1.097347435s max, 5.398457ms min

With that PR, this test is 100% reliable

64010 runs so far, 0 failures (100.00% pass rate). 44.484057ms avg, 121.454709ms max, 1.872657ms min

Note: we also have reproduced this using h2 directly outside of hyper, which is what gives me confidence this issue lies in h2 and not hyper.