hyperium / tonic

A native gRPC client & server implementation with async/await support.
https://docs.rs/tonic
MIT License
9.84k stars 1k forks source link

TLS connection establishment errors can cause the server to shutdown since tonic v0.12.2 #1989

Open krispraws opened 2 hours ago

krispraws commented 2 hours ago

Bug Report

https://github.com/hyperium/tonic/pull/1882 that was part of tonic v0.12.2 introduced a behavior change where errors during TLS connection establishment terminate the accept loop, and thus shutdown the server.

Version

v0.12.2

Platform

Linux 5.10.210-201.852.amzn2.aarch64

Crates

tonic

Description

We run a Rust GRPC server in production that uses TLS. Since upgrading to tonic v0.12.2, we started seeing the server shutdown randomly. After adding logging, we root caused it to various errors in establishing TLS connections. We have been frequently updating our tonic versions as the TLS accept loop error handling has been refined: https://github.com/hyperium/tonic/pull/1885 https://github.com/hyperium/tonic/pull/1938 https://github.com/hyperium/tonic/pull/1948 https://github.com/hyperium/tonic/pull/1972 Today we had an incident involving a currently unhandled error

2024-10-10T16:06:24.447565645+00:00 - DEBUG tonic::transport::server::incoming - accept loop error error=No route to host (os error 113)

Prior to https://github.com/hyperium/tonic/pull/1882, any errors from tls.accept(stream).await? would be swallowed, as compared to the current behavior of breaking out of the accept loop unless the errors match a specific list. Based on the description of the PR in question, this behavioral change seems inadvertent, and has been addressed by adding errors piecemeal to handle_accept_error, as noted above. We cannot think of a scenario where we'd want to shutdown the server because an individual incoming TLS connection failed to be established. As such, it seems like the correct behavior should be to separate TCP and TLS error handling and return to swallowing TLS errors.

krispraws commented 2 hours ago

@djc , @tottoto - We are working on a PR to address this and will submit it shortly.