vorot93 opened this issue 3 years ago
This issue also affects one of my crates and causes tests to get stuck and fail. Would love to see this get fixed soon. I would like to take a look at this but I'm not sure where to start.
Do either of you know if this only happens with grpc-go, or if it happens with other h2 servers?
Not sure, as our server is only implemented with Go currently. I will try testing it with the Python examples on https://github.com/grpc/grpc/tree/master/examples/python today.
I have made a repo showcasing the bug happening with the Python examples I linked: https://github.com/yusdacra/tonic-bug
@LucioFranco
Thanks, I do not have time this week to dig into it but will get to this ASAP.
I've done some digging but I'm unsure about the best way to fix this. The hang is initiated from Grpc::streaming, which eventually ends up in Reconnect::call, where it tries to set up the connection to the server. So if the server never responds, it hangs.
I suppose we could change Grpc::streaming to not block until the connection has been established, by connecting in a background task instead and sending the connection back over a oneshot channel. I guess that would mean blocking for that task to complete in Streaming::message instead and propagating any connection errors there.
@LucioFranco Do you think that makes sense? If so I would like to take a stab at implementing it 😊
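Roughly the shape I have in mind, as a standalone sketch rather than tonic's actual internals (Connection, ConnectError, connect and LazyStreaming are made-up stand-ins): connect in a spawned task, hand the result back over a oneshot channel, and only await it when the first message is requested, so connection errors surface there.

use tokio::sync::oneshot;

// Made-up stand-ins for a transport connection and a connect error.
struct Connection;
#[derive(Debug)]
struct ConnectError;

async fn connect() -> Result<Connection, ConnectError> {
    // Imagine the TCP + h2 handshake happening here.
    Ok(Connection)
}

// Handed back to the caller immediately; the connection is still being
// established by the background task.
struct LazyStreaming {
    pending: Option<oneshot::Receiver<Result<Connection, ConnectError>>>,
    conn: Option<Connection>,
}

impl LazyStreaming {
    fn new() -> Self {
        let (tx, rx) = oneshot::channel();
        tokio::spawn(async move {
            // The connect result (including errors) is sent back to the caller.
            let _ = tx.send(connect().await);
        });
        Self { pending: Some(rx), conn: None }
    }

    // The first call waits for the connection and surfaces any connect error;
    // later calls would just read the next message off the connection.
    async fn message(&mut self) -> Result<Option<String>, ConnectError> {
        if let Some(rx) = self.pending.take() {
            self.conn = Some(rx.await.map_err(|_| ConnectError)??);
        }
        let _conn = self.conn.as_mut().expect("connected above");
        // ... read and decode the next message from the connection here ...
        Ok(Some("hello".to_string()))
    }
}

#[tokio::main]
async fn main() {
    // Returns without waiting on the server at all.
    let mut stream = LazyStreaming::new();
    if let Ok(Some(msg)) = stream.message().await {
        println!("{msg}");
    }
}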
The Java gRPC server implementation also appears to behave this way.
Is there a workaround for this? What does the tonic server send back to the client that makes this work against a Rust server but not against other implementations?
The .NET server implementation also triggers this issue.
Since I control both ends, I can work around it by issuing an await responseStream.WriteAsync("ping"); before the usual while (await requestStream.MoveNext()) { ... } loop on the server side, and discarding that first response right after stream construction on the client side.
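On the tonic client side, the discard half of that workaround looks roughly like this; EchoClient and the echo proto are just stand-ins borrowed from tonic's streaming example, and the initial "ping" is the convention from my server above:

async fn open_echo_stream() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes the generated EchoClient from tonic's streaming example.
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await?;
    let (tx, rx) = tokio::sync::mpsc::channel(10);

    // The server "speaks first", so this call returns even though nothing
    // has been queued on the outgoing stream yet.
    let response = client
        .bidirectional_streaming_echo(tokio_stream::wrappers::ReceiverStream::new(rx))
        .await?;
    let mut stream = response.into_inner();

    // Drop the server's initial ping before starting normal processing.
    let _ping = stream.message().await?;

    // ... normal send/receive loop over `tx` and `stream` from here on ...
    Ok(())
}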
+1, seeing this issue and unable to use a streaming client, as the server is not in my control (seems to be the same issue as https://github.com/hyperium/tonic/discussions/1233).
I tried comparing the Python/C++ implementations; my hunch was that the other clients send an 'initial metadata' payload which triggers the server side to always respond, but I couldn't find a similar thing in the tonic implementation.
@jordy25519 do you have a reproduction I can look at?
I don't have one on hand. The relevant generated client code hangs here: self.inner.server_streaming(req, path, codec).await
Additionally, I observe that the hang resolves after ~2-5 minutes, I assume due to a close/timeout message from the server which unblocks the client.
Like https://github.com/hyperium/tonic/issues/515#issuecomment-778686311, I traced the hang to the Reconnect code but didn't dig much further.
I had another go at debugging this, and in my particular case the server never responds with an HTTP/2 headers frame if it has no message data to stream.
The HTTP/2 handshake/connection is fine, but the client blocks after sending the streaming request while waiting for the response in h2: https://github.com/hyperium/h2/blob/da38b1c49c39ccf7e2e0dca51ebf8505d6905597/src/proto/streams/recv.rs#L318 (this condition is never satisfied until the first headers frame is sent along with the first data frame; technically they could be sent separately, but it seems gRPC servers are not doing this by default).
I'm not sure what action tonic can take without knowing/receiving the response headers from the server.
Zooming out: as a user I'd expect to get the Streaming handle/instance back whether or not there's immediate data available, and to be able to await on that for messages rather than block on setup, i.e. decouple the setup await from the messages await.
This is possibly better solved on the server side, but it also feels like it should be an 'it just works' thing.
Also facing this issue and not sure how to resolve it. Strangely, it doesn't happen with a Rust server when attempting to create a reproducible example; in my case it only happens with a Python server.
@declark1 as far as I know, you can only solve this by making your server speak first, for example by sending headers server side.
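Since the affected server here is Python, the actual change would live there, but sketched in tonic terms the "speak first" pattern looks roughly like the handler body below (EchoRequest/EchoResponse stand in for whatever your generated types are, and in a real tonic service this would be the trait method with the boxed stream as the associated Stream type; note a tonic server already sends response headers when the handler returns, which is why Rust-to-Rust doesn't hang):

use std::pin::Pin;
use tokio_stream::{wrappers::ReceiverStream, Stream, StreamExt};
use tonic::{Request, Response, Status, Streaming};

type EchoStream = Pin<Box<dyn Stream<Item = Result<EchoResponse, Status>> + Send>>;

async fn bidirectional_streaming_echo(
    request: Request<Streaming<EchoRequest>>,
) -> Result<Response<EchoStream>, Status> {
    let mut inbound = request.into_inner();
    let (tx, rx) = tokio::sync::mpsc::channel(16);

    tokio::spawn(async move {
        // "Speak first": push a message before waiting on the client, so the
        // client receives response headers (and a frame) right away.
        let _ = tx.send(Ok(EchoResponse { message: "ping".into() })).await;

        // Then fall into the usual echo loop over the inbound stream.
        while let Some(Ok(req)) = inbound.next().await {
            let _ = tx.send(Ok(EchoResponse { message: req.message })).await;
        }
    });

    Ok(Response::new(Box::pin(ReceiverStream::new(rx)) as EchoStream))
}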
I'm reliably running into this as well with a Go server. I built a simple Go server with the protos from Tonic's streaming example to test. Eyeballing some Wireshark dumps, it seems jordy25519 is correct and the Go server isn't sending an HTTP/2 headers frame until it has data to respond with, even though an equivalent Tonic server does.
It looks like this is only an issue if the client-side code waits for the RPC method to complete before starting to send messages on the outgoing stream. Changing the streaming example to the following hung on connect and never printed connected.
// Assumes the generated types and imports from tonic's streaming example.
use pb::{echo_client::EchoClient, EchoRequest};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await.unwrap();
    let (tx, rx) = tokio::sync::mpsc::channel(10);
    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::wrappers::ReceiverStream::new(rx))
        .await
        .unwrap();
    println!("connected");
    for i in 0..10 {
        tx.send(EchoRequest {
            message: format!("msg {:02}", i),
        })
        .await
        .unwrap();
    }
    let mut resp_stream = response.into_inner();
    while let Some(received) = resp_stream.next().await {
        let received = received.unwrap();
        println!("\treceived message: `{}`", received.message);
    }
    Ok(())
}
Changing the outgoing stream to tokio_stream::pending also hangs forever and never prints connected. For example, I would expect the following to print connected and exit.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await.unwrap();
    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::pending())
        .await
        .unwrap();
    println!("connected");
    Ok(())
}
I haven't dug deep enough yet to understand if this is an expectation mismatch between Tonic and Hyper, or Hyper and other http/2 implementations. Given the number of folks in this thread reporting issues with Java, Python, etc. servers, it seems like it's at least worth some documentation to point out that you MUST have a ready outgoing message on the request stream before the RPC call returns.
Looking quickly at the Go grpc implementation, it seems like maybe this is a mismatch between Tonic and other gRPC APIs. Generated streaming clients in Go seem to return a BidiStreamingClient which wraps a ClientStream interface and doesn't guarantee Headers are available or that a stream is connected when calling SendMsg. Compare that to Tonic, where generated streaming clients still return a tonic::Response, which guarantees Metadata exists.
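To make that concrete, the workaround implied above (have a message ready on the outgoing stream before the RPC call is awaited) would look like the following against my Go echo server; same imports and generated types as the first example, and I'm assuming one queued message is enough to get the server to respond:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await.unwrap();
    let (tx, rx) = tokio::sync::mpsc::channel(10);

    // Queue a message *before* awaiting the RPC so the request stream has
    // data ready and the server has something to respond to.
    tx.send(EchoRequest { message: "msg 00".to_string() }).await.unwrap();

    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::wrappers::ReceiverStream::new(rx))
        .await
        .unwrap();
    println!("connected");

    // The remaining messages can still be sent after the call returns.
    for i in 1..10 {
        tx.send(EchoRequest {
            message: format!("msg {:02}", i),
        })
        .await
        .unwrap();
    }
    drop(tx); // close the outgoing stream so the response stream can end

    let mut resp_stream = response.into_inner();
    while let Some(received) = resp_stream.next().await {
        let received = received.unwrap();
        println!("\treceived message: `{}`", received.message);
    }
    Ok(())
}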
Faced the same issue with a Rust client & C++ server (bidirectional gRPC streaming). The Rust client used to hang and never get a stream back. I had to apply the hack mentioned by @blinsay: on the client side, have an empty message ready to be sent on the outgoing stream before making the gRPC streaming method call.
I consider this a Tonic bug, as the Tonic-generated gRPC client doesn't work with servers written in other languages. The client should not be expected to have an outgoing message ready while establishing the stream.
Currently it does expect the server to write something back; this causes a hang.
Illustrated example: https://github.com/hyperium/tonic/blob/master/interop/src/client.rs#L155
Move this line after full_duplex_call and observe the test fail with a timeout.