Open krispraws opened 1 month ago
@djc , @tottoto - We are working on a PR to address this and will submit it shortly.
Is this issue solved? If so, it would be lovely to get this fix in a patch release.
Is this issue solved? If so, it would be lovely to get this fix in a patch release.
@eric-seppanen , yes, the fix was merged. I'd also like to get a release.
Bug Report
https://github.com/hyperium/tonic/pull/1882 that was part of tonic v0.12.2 introduced a behavior change where errors during TLS connection establishment terminate the accept loop, and thus shutdown the server.
Version
v0.12.2
Platform
Linux 5.10.210-201.852.amzn2.aarch64
Crates
tonic
Description
We run a Rust GRPC server in production that uses TLS. Since upgrading to tonic v0.12.2, we started seeing the server shutdown randomly. After adding logging, we root caused it to various errors in establishing TLS connections. We have been frequently updating our tonic versions as the TLS accept loop error handling has been refined: https://github.com/hyperium/tonic/pull/1885 https://github.com/hyperium/tonic/pull/1938 https://github.com/hyperium/tonic/pull/1948 https://github.com/hyperium/tonic/pull/1972 Today we had an incident involving a currently unhandled error
Prior to https://github.com/hyperium/tonic/pull/1882, any errors from
tls.accept(stream).await?
would be swallowed, as compared to the current behavior of breaking out of the accept loop unless the errors match a specific list. Based on the description of the PR in question, this behavioral change seems inadvertent, and has been addressed by adding errors piecemeal tohandle_accept_error
, as noted above. We cannot think of a scenario where we'd want to shutdown the server because an individual incoming TLS connection failed to be established. As such, it seems like the correct behavior should be to separate TCP and TLS error handling and return to swallowing TLS errors.