cocreature opened 1 year ago
~The exception looks like expected behaviour. The error needs to be handled in your streamObserver/callListener; a retry sounds fair. A retry policy in gRPC may help to do that automatically: see the retry example.~
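For reference, a retry policy is configured through the channel's service config. Below is a minimal sketch of building such a config in grpc-java; `MyService` is a placeholder service name, and the backoff values are illustrative, not recommendations. Note that grpc-java requires numeric service-config values to be doubles.

```java
import java.util.List;
import java.util.Map;

public class RetryConfig {
    // Builds a service config that retries methods of "MyService"
    // (placeholder name) when the call fails with UNAVAILABLE.
    public static Map<String, Object> serviceConfig() {
        Map<String, Object> retryPolicy = Map.of(
            "maxAttempts", 4.0,          // numbers must be Double in grpc-java
            "initialBackoff", "0.1s",
            "maxBackoff", "1s",
            "backoffMultiplier", 2.0,
            "retryableStatusCodes", List.of("UNAVAILABLE"));
        Map<String, Object> methodConfig = Map.of(
            "name", List.of(Map.of("service", "MyService")),
            "retryPolicy", retryPolicy);
        return Map.of("methodConfig", List.of(methodConfig));
    }

    public static void main(String[] args) {
        // With grpc-java on the classpath, this would be wired in as:
        //   ManagedChannelBuilder.forTarget(target)
        //       .defaultServiceConfig(RetryConfig.serviceConfig())
        //       .enableRetry()
        //       .build();
        System.out.println(serviceConfig());
    }
}
```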
I agree that disconnects are generally UNAVAILABLE. The problem in this case is "you shouldn't have seen that error." It could have been normal, or it could have been an internal error.
https://github.com/grpc/grpc-java/commit/9bdb8f005a392dc358b59e0f3b0e4b34bec222c0 is working, and we have some stacktrace information to go on. But it is rather strange that we learn that the connection is closed when trying to write headers.
You are using plain-text (no TLS)?
Yes this is plaintext.
I think there's enough here to try and make educated guesses on ways that might improve it. But it will be troublesome without a reproduction. It is timing-based, so a reproduction will be probabilistic.
@cocreature, could you share a toxiproxy configuration that kills the connection? I'm not already familiar with it, but it seems great for trying to trigger this. Are you using reset_peer?
Let me describe our toxiproxy config as best as I can. We're using toxiproxy-server version 2.5.0 and the toxiproxy Java client. We create the proxy within our tests through client.createProxy(name, listenAddress, upstreamAddress), where client is created through new ToxiproxyClient(). Our client (which is throwing the error) connects to the listen address of the proxy; the actual server is behind the upstream address.

To break the connection we use proxy.disable(), which calls https://github.com/shopify/toxiproxy#down. We don't use reset_peer.
Looks like down just closes the connection, without setting linger to 0 (as used by reset).
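For anyone attempting a reproduction, the setup described above amounts to roughly the following. This is a sketch assuming the toxiproxy-java client and a toxiproxy-server already running on its default admin port; the names and addresses are placeholders, and in a real test the connection would be dropped at some random point mid-call to hit the timing window.

```java
import eu.rekawek.toxiproxy.Proxy;
import eu.rekawek.toxiproxy.ToxiproxyClient;

public class KillConnection {
    public static void main(String[] args) throws Exception {
        // Talks to a locally running toxiproxy-server (default admin API).
        ToxiproxyClient client = new ToxiproxyClient();

        // gRPC client dials localhost:5556; the real server listens on 5555.
        Proxy proxy = client.createProxy(
            "grpc-test", "localhost:5556", "localhost:5555");

        // ... start the gRPC call against localhost:5556 ...

        // Closes the proxied connection (toxiproxy's "down" semantics,
        // no linger-0 reset), which is what triggered the error here.
        proxy.disable();

        // Re-enabling lets the channel reconnect and recover.
        proxy.enable();
    }
}
```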
We're getting the same exception. Unfortunately, we only have a trimmed stacktrace in our logs:
```
io.grpc.StatusRuntimeException: UNKNOWN: channel closed
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:275)
Caused by: java.nio.channels.ClosedChannelException
	at io.grpc.netty.Utils.statusFromThrowable(Utils.java:275)
Caused by: io.netty.channel.StacklessClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source)
Caused by: io.netty.channel.unix.Errors$NativeIoException: sendAddress(..) failed: Connection reset by peer
```
@marx-freedom, your issue seems unrelated to this. File a separate issue if you need to discuss it. In your case gRPC did know why the connection was closed: "Connection reset by peer."
What version of gRPC-Java are you using?
1.44.0
What is your environment?
Ubuntu 22.04
What did you expect to see?
An UNAVAILABLE status code or something similar
What did you see instead?
We saw things fail with this exception and stacktrace:
Interestingly, it does look like the channel recovered from this once the connection was established again.
Steps to reproduce the bug
In our test setup, we kill the connection with toxiproxy and then see this failure, but only relatively rarely. Unfortunately, I don't have a reliable reproduction (nor one that I can make public).
Is that expected? Given that it recovers, should we just retry on `UNKNOWN: channel closed` like we do on UNAVAILABLE?
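To make the question concrete, the workaround being considered is a retry loop that treats UNKNOWN the same as UNAVAILABLE. The sketch below uses a hypothetical stand-in exception instead of the real io.grpc.StatusRuntimeException (where one would check sre.getStatus().getCode()), and omits the backoff a real retry would need; whether retrying UNKNOWN is actually safe is exactly what this issue is asking.

```java
import java.util.Set;
import java.util.function.Supplier;

public class StatusRetry {
    // Hypothetical stand-in for io.grpc.StatusRuntimeException,
    // carrying just the status code name.
    static class StatusException extends RuntimeException {
        final String code;
        StatusException(String code) { super(code); this.code = code; }
    }

    // Retries `call` while it fails with a code in `retryable`,
    // up to maxAttempts total attempts. Real code would back off
    // between attempts instead of retrying immediately.
    static <T> T callWithRetry(Supplier<T> call, Set<String> retryable, int maxAttempts) {
        StatusException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (StatusException e) {
                if (!retryable.contains(e.code)) throw e;  // not retryable
                last = e;
            }
        }
        throw last;  // retries exhausted
    }

    public static void main(String[] args) {
        // Simulate a call that fails once with UNKNOWN, then succeeds.
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (calls[0]++ == 0) throw new StatusException("UNKNOWN");
            return "ok";
        }, Set.of("UNAVAILABLE", "UNKNOWN"), 3);
        System.out.println(result);  // prints "ok"
    }
}
```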