Closed zejunlitg closed 2 months ago
Thanks @zejunlitg for the question. I will take a look and get back to you
It's my belief that this is caused by the underlying TCP connection being closed on the client side while the client still tried to write to it.
Could you clarify what you mean by the above? The "client preface" is the string that must be sent on every new connection from a client. This error indicates a failure when trying to write that initial client message (the client preface) to establish the gRPC connection. The specific error "use of closed network connection" suggests that the TCP connection was closed unexpectedly.
@purnesh42H
AFAIK, this error comes from Go's net package:

conn, err := net.Dial("tcp", ":8888")
if err != nil {
	log.Println("dial error:", err)
	return
}
// close the connection here
conn.Close()
// then try to write over the connection; the write returns the error
// 'write tcp x.x.x.x:PORT_SRC->x.x.x.x:PORT_DST: use of closed network connection'
_, err = conn.Write(buf)
That's why I said the connection is closed on the client side, I hope this clarifies.
I agree that this happens unexpectedly; that's exactly what happened. Can you help me understand what I can do when it happens? Do I:
@zejunlitg please refer to retry documentation for more details, if not already done.
Meanwhile, could you provide more details on following?
@purnesh42H I've read the retry documentation and it does not answer my question; that's why I'm posting here for a dev answer. Unless I missed it in the doc, to be very explicit, the question is: does gRPC's retry mechanism handle the case where the network connection gets unexpectedly closed? This involves implementation details that the doc does not reveal.
RE 1: I copied the retry policy from the golang example (the "name" entry scopes the config to one method or to all methods under the service, and the values in "RetryableStatusCodes" are gRPC status codes):

var retryPolicy = `{
	"methodConfig": [{
		"name": [{"service": "grpc.examples.echo.Echo"}],
		"waitForReady": true,
		"retryPolicy": {
			"MaxAttempts": 4,
			"InitialBackoff": ".01s",
			"MaxBackoff": ".01s",
			"BackoffMultiplier": 1.0,
			"RetryableStatusCodes": [ "UNAVAILABLE" ]
		}
	}]
}`
And then the example uses the grpc.NewClient() API:
conn, err := grpc.NewClient(target, grpc.WithTransportCredentials(insecure.NewCredentials()), grpc.WithDefaultServiceConfig(retryPolicy))
The only difference in my usage is that I'm using this:
grpc.Dial(endPoint, DialOptions()...)
and here are the options we're using; I'm plugging grpc.WithDefaultServiceConfig(retryPolicy) into this:
func DialOptions() []grpc.DialOption {
	bc := backoff.DefaultConfig
	bc.MaxDelay = 5 * time.Second
	return []grpc.DialOption{
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithConnectParams(grpc.ConnectParams{
			Backoff: bc,
		}),
		grpc.WithDefaultCallOptions(CallOptions()...),
	}
}
RE 2: no idea about the reason. From the server log, the RPC call is not received -- we have set up an interceptor that logs when an RPC is received and finished, and normally when the server receives the RPC call it is logged. When this issue happened, no relevant log was found on the server side. As I mentioned before, this is a rare issue that's difficult to reproduce. Regardless, we still want to know the best-practice course of action. We can manually call the same RPC after some sleep, or we can use gRPC's built-in retry mechanism; I'm still not sure whether the former or the latter would work, so I'd appreciate any insights.
Thanks for the details. I will get back to you on transport retries. Meanwhile, to answer your other question: one way to repro the client preface write network failure is to provide a custom dialer that returns a net.Conn implementation with an overridden Write(). See WithContextDialer.
@zejunlitg in the retry example, the client's retry policy has UNAVAILABLE in RetryableStatusCodes, which is the error code returned for a client preface write failure, so the client will retry. As mentioned above, you can verify this by providing your own custom dialer returning a net.Conn implementation.
So, to answer your question, retry policies are the recommended way for dealing with transient failures. However, the recommended approach is to fetch the retry configuration (which is part of the service config) from the name resolver rather than defining it on the client side.
Feel free to reopen the issue if you have any more questions
error:
It's my belief that this is caused by the underlying TCP connection being closed on the client side while the client still tried to write on it. Apparently this is a random issue -- with the env I have, I cannot reproduce this at all.
My questions are:
- The example uses grpc.NewClient() while I'm using grpc.Dial, after which I'm getting the client with the conn it returns.
- I tried closing the socket with gdb's call close(fd) (ref: https://incoherency.co.uk/blog/stories/closing-a-socket.html). When the TCP connection closes, the gRPC call gets stuck for some reason. I was expecting it to sense that the network connection is closed and thus throw the error, but it does not do that.
gRPC version: 1.57.2
Thank you very much for the help.