hyperium / tonic

A native gRPC client & server implementation with async/await support.
https://docs.rs/tonic

Client must not expect to hear back from server when establishing bidirectional stream #515

Open vorot93 opened 3 years ago

vorot93 commented 3 years ago

Currently the client does expect the server to write something back when establishing a bidirectional stream. This causes a hang.

Illustrated example: https://github.com/hyperium/tonic/blob/master/interop/src/client.rs#L155

Move this line to after full_duplex_call and observe the test fail with a timeout.

yusdacra commented 3 years ago

This issue also affects one of my crates and causes tests to get stuck and fail. Would love to see this get fixed soon. I would like to take a look at this but I'm not sure where to start.

LucioFranco commented 3 years ago

Do either of you know if this only happens with grpc-go, or if it happens with other h2 servers?

yusdacra commented 3 years ago

> Do either of you know if this only happens with grpc-go, or if it happens with other h2 servers?

Not sure, as our server is only implemented with Go currently. I will try testing it with the Python examples on https://github.com/grpc/grpc/tree/master/examples/python today.

yusdacra commented 3 years ago

I have made a repo showcasing the bug happening with the Python examples I linked: https://github.com/yusdacra/tonic-bug

@LucioFranco

LucioFranco commented 3 years ago

Thanks, I do not have time this week to dig into it but will get to this ASAP.

davidpdrsn commented 3 years ago

I've done some digging but I'm unsure about the best way to fix this. The hang originates in Grpc::streaming, which eventually ends up in Reconnect::call, where it tries to set up the connection to the server. So if the server never responds, it hangs.

I suppose we could change Grpc::streaming so that it doesn't block until the connection has been established, by connecting in a background task instead and sending the connection back over a oneshot channel. I guess that would mean waiting for that task to complete in Streaming::message instead and propagating any connection errors there.
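Very roughly, the shape I'm imagining (a sketch with placeholder types, not tonic's actual internals):

```rust
use tokio::sync::oneshot;

// Placeholder types standing in for tonic's internals (hypothetical sketch).
struct Connection;
#[derive(Debug)]
struct ConnectError;
struct Message;

struct LazyStreaming {
    // `Grpc::streaming` would stash the receiver instead of awaiting the connect.
    pending: Option<oneshot::Receiver<Result<Connection, ConnectError>>>,
    conn: Option<Connection>,
}

impl LazyStreaming {
    fn start(/* endpoint, request, ... */) -> Self {
        let (tx, rx) = oneshot::channel();
        tokio::spawn(async move {
            // Establish the connection in the background; the result is
            // reported lazily through the oneshot channel.
            let _ = tx.send(Ok(Connection));
        });
        Self { pending: Some(rx), conn: None }
    }

    // The first call to `message()` waits for the background connect, which
    // is where any connection error would surface to the caller.
    async fn message(&mut self) -> Result<Option<Message>, ConnectError> {
        if let Some(rx) = self.pending.take() {
            self.conn = Some(rx.await.map_err(|_| ConnectError)??);
        }
        // ...read the next message off `self.conn`...
        Ok(None)
    }
}
```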

@LucioFranco Do you think that makes sense? If so, I would like to take a stab at implementing it 😊

sfackler commented 2 years ago

The Java gRPC server implementation also appears to behave this way.

behos commented 1 year ago

Is there a workaround for this? What does the tonic server send back to the client that makes it work for Rust but not for other implementations?

r-ml commented 1 year ago

.NET server implementation also triggers this issue.

I can work around the issue, as I control both ends, by issuing an `await responseStream.WriteAsync("ping");` before the usual `while (await requestStream.MoveNext()) { ... }` server-side, and discarding the first response right after stream construction client-side.
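On the tonic side, the client half of that workaround looks roughly like this (a sketch; `EchoClient` / `EchoRequest` are hypothetical stand-ins for whatever client tonic generated for your service):

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `EchoClient` is a hypothetical tonic-generated client.
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await?;
    let (tx, rx) = tokio::sync::mpsc::channel(10);

    // The server writes a priming "ping" before reading any requests, so this
    // call returns even though we haven't queued anything to send yet.
    let response = client
        .bidirectional_streaming_echo(tokio_stream::wrappers::ReceiverStream::new(rx))
        .await?;
    let mut inbound = response.into_inner();

    // Discard the priming message the server sent first.
    let _ping = inbound.message().await?;

    // ...send real requests on `tx` and read real replies from `inbound`...
    drop(tx);
    Ok(())
}
```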

jordy25519 commented 1 year ago

+1, seeing this issue and unable to use a streaming client, as the server is not in my control (seems to be the same issue as https://github.com/hyperium/tonic/discussions/1233).

I tried comparing the Python/C++ implementations; my hunch was that the other clients send an 'initial metadata' payload which triggers the server side to always respond, but I couldn't find a similar thing within the tonic implementation.

e.g. https://github.com/grpc/grpc/blob/fc159a690158ed089b19d3eb9f76e8399e3207ca/src/python/grpcio_tests/tests/unit/_cython/cygrpc_test.py#L361-L362

LucioFranco commented 1 year ago

@jordy25519 do you have a reproduction I can look at?

jordy25519 commented 1 year ago

I don't have one on hand. The relevant generated client code is below; it hangs at `self.inner.server_streaming(req, path, codec).await`.

```rust
// Generated by tonic-build; the generic message types below were stripped by
// the issue formatting, so `super::StreamFooRequest` / `super::StreamFooResponse`
// are reconstructed placeholders.
pub async fn stream_foo(
    &mut self,
    request: impl tonic::IntoRequest<super::StreamFooRequest>,
) -> std::result::Result<
    tonic::Response<tonic::codec::Streaming<super::StreamFooResponse>>,
    tonic::Status,
> {
    self.inner
        .ready()
        .await
        .map_err(|e| {
            tonic::Status::new(
                tonic::Code::Unknown,
                format!("Service was not ready: {}", e.into()),
            )
        })?;
    let codec = tonic::codec::ProstCodec::default();
    let path = http::uri::PathAndQuery::from_static(
        "/crate_foo_rpc.CrateFooRpc/StreamFoo",
    );
    let mut req = request.into_request();
    req.extensions_mut()
        .insert(GrpcMethod::new("crate_foo_rpc.CrateFooRpc", "StreamFoo"));
    // ---HANGS HERE---
    self.inner.server_streaming(req, path, codec).await
}
```

Additionally, I observe that the hang resolves after ~2-5 minutes, I assume due to a close/timeout message from the server, which unblocks the client.

Like https://github.com/hyperium/tonic/issues/515#issuecomment-778686311, I traced the hang to the Reconnect code but didn't dig much further.

jordy25519 commented 1 year ago

I had another go at debugging this, and in my particular case the server never responds with an HTTP/2 HEADERS frame if it has no message data to stream.

The HTTP/2 handshake/connection is OK, but the client blocks after sending the streaming request, waiting for a response in h2: https://github.com/hyperium/h2/blob/da38b1c49c39ccf7e2e0dca51ebf8505d6905597/src/proto/streams/recv.rs#L318 (this condition is never satisfied until the first HEADERS frame is sent, which these servers only do along with the first DATA frame; technically they could be sent separately, but it seems gRPC servers are not doing this by default). I'm not sure what action tonic can take without knowing/receiving the response headers from the server.

Zooming out, as a user I'd expect to get the Streaming handle/instance back whether or not there's immediate data available, and to be able to await on that for the messages rather than blocking on setup, i.e. decouple the setup await from the messages await.
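Something like this hypothetical shape (not a real tonic API, just to illustrate the decoupling):

```rust
// Hypothetical decoupled API (`stream_foo_lazy` does not exist in tonic):
// the call hands back a stream handle immediately instead of waiting for the
// response HEADERS frame, and any connect/header error surfaces when awaiting
// the first message.
let mut stream = client.stream_foo_lazy(request); // returns without awaiting setup

match stream.message().await {
    Ok(Some(msg)) => println!("first message: {msg:?}"),
    Ok(None) => println!("server closed the stream without sending anything"),
    Err(status) => eprintln!("setup or stream error: {status}"),
}
```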

It's possibly better to solve this on the server side, but it also feels like this should be an 'it just works' thing.

declark1 commented 4 months ago

Also facing this issue and not sure how to resolve it. Strangely, when attempting to create a reproducible example, it doesn't happen with a Rust server; in my case it only happens with a Python server.

Adphi commented 4 months ago

@declark1 as far as I know, you can only solve this by making your server speak first, for example by sending headers on the server side.

blinsay commented 3 months ago

I'm reliably running into this as well with a Go server. I built a simple Go server with the protos from Tonic's streaming example to test. Eyeballing some Wireshark dumps, it seems jordy25519 is correct: the Go server isn't sending an HTTP/2 HEADERS frame until it has data to respond with, even though an equivalent Tonic server does.

It looks like this is only an issue if the client-side code waits for the RPC method to complete before starting to send messages on the outgoing stream. Changing the streaming example to the following hung on connect and never printed connected:

```rust
// Modified client from tonic's streaming example; `EchoClient` / `EchoRequest`
// come from the example's generated echo protos.
use tokio_stream::StreamExt; // needed for `resp_stream.next()`

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await.unwrap();

    let (tx, rx) = tokio::sync::mpsc::channel(10);

    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::wrappers::ReceiverStream::new(rx))
        .await
        .unwrap();
    println!("connected");

    for i in 0..10 {
        tx.send(EchoRequest {
            message: format!("msg {:02}", i),
        })
        .await
        .unwrap();
    }

    let mut resp_stream = response.into_inner();

    while let Some(received) = resp_stream.next().await {
        let received = received.unwrap();
        println!("\treceived message: `{}`", received.message);
    }

    Ok(())
}
```

Changing the outgoing stream to tokio_stream::pending() also hangs forever and never prints connected. For example, I would expect the following to print connected and exit:

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await.unwrap();

    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::pending())
        .await
        .unwrap();
    println!("connected");

    Ok(())
}
```

I haven't dug deep enough yet to understand whether this is an expectation mismatch between Tonic and Hyper, or between Hyper and other HTTP/2 implementations. Given the number of folks in this thread reporting issues with Java, Python, etc. servers, it seems like it's at least worth some documentation pointing out that you MUST have an outgoing message ready on the request stream in order for the RPC call to return.
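For comparison, a sketch of the pattern that avoids the hang here: queue at least one outgoing message on the channel before awaiting the call, so the server has something to respond to (same Echo types as the examples above):

```rust
use tokio_stream::StreamExt; // for `resp_stream.next()`

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await?;
    let (tx, rx) = tokio::sync::mpsc::channel(10);

    // Queue the first message before making the RPC call.
    tx.send(EchoRequest { message: "msg 00".into() }).await.unwrap();

    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::wrappers::ReceiverStream::new(rx))
        .await?;
    println!("connected");

    // The remaining messages can be sent after the call has returned.
    for i in 1..10 {
        tx.send(EchoRequest { message: format!("msg {:02}", i) }).await.unwrap();
    }
    drop(tx); // close the outgoing stream so the echo server finishes

    let mut resp_stream = response.into_inner();
    while let Some(received) = resp_stream.next().await {
        println!("\treceived message: `{}`", received?.message);
    }
    Ok(())
}
```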

blinsay commented 3 months ago

Looking quickly at the Go gRPC implementation, it seems like this may be a mismatch between Tonic and other gRPC APIs. Generated streaming clients in Go seem to return a BidiStreamingClient, which wraps a ClientStream interface and doesn't guarantee that Headers are available or that a stream is connected when calling SendMsg. Compare that to Tonic, where generated streaming clients still return a tonic::Response, which guarantees Metadata exists.

berwani commented 2 months ago

Faced the same issue with a Rust client & C++ server (bidirectional gRPC streaming). The Rust client would hang and never get a stream back. I had to apply the hack mentioned by @blinsay: on the client side, add an empty message ready to be sent on the outgoing stream before making the gRPC streaming method call.

I consider this a Tonic bug, as the Tonic-generated gRPC client doesn't work with servers implemented in other languages. The client should not be expected to have an outgoing message ready while establishing the stream.