So the default keepalive enforcement policy on the server is 5 minutes: https://github.com/grpc/grpc-go/blob/cce0e436e5c428f2094edd926779788f1024938d/keepalive/keepalive.go#L62
Also, by default the client doesn't send keepalive pings unless there are active streams: https://github.com/grpc/grpc-go/blob/cce0e436e5c428f2094edd926779788f1024938d/keepalive/keepalive.go#L38
This explains why the connection is closed by the server soon after the client makes an RPC.
If you set the client to send keepalive pings every 120 seconds, make sure you set the server's enforcement policy accordingly.
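A minimal sketch of what matching the two sides could look like (the 120s interval comes from your setup; the other values, like the 20s timeout, are only examples):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// dialWithKeepalive dials with a client-side keepalive ping every 120s.
func dialWithKeepalive(address string) (*grpc.ClientConn, error) {
	return grpc.Dial(address,
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                120 * time.Second, // interval between client pings
			Timeout:             20 * time.Second,  // how long to wait for a ping ack (example value)
			PermitWithoutStream: true,              // also ping when there are no active RPCs
		}),
	)
}

// newServerAllowingClientPings lowers the enforcement policy's MinTime from the
// 5-minute default to 120s so the client above is not treated as misbehaving.
func newServerAllowingClientPings() *grpc.Server {
	return grpc.NewServer(
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             120 * time.Second,
			PermitWithoutStream: true,
		}),
	)
}
```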
Also, turning on transport-level logs by setting the verbosity level to 2 will help in seeing these messages: https://github.com/grpc/grpc-go/blob/cce0e436e5c428f2094edd926779788f1024938d/grpclog/grpclog.go#L26
Hope this helps.
@MakMukhi Thank you! Removing the client keepalive params seems to have resolved the issue. I'll make sure those match up with the server if I change those in the future. This was even clearly mentioned in the docs 🤦♂️
What version of gRPC are you using?

Client: 1.8.0 (5a9f7b402fe85096d2e1d0383435ee1876e863d0)
Server: 1.8.0 (a4bf341022f076582fc2e9a802ce170a8938f81d)

What version of Go are you using (`go version`)?

Client: 1.9.2
Server: 1.9.1

What operating system (Linux, Windows, …) and version?

Client: Alpine Linux 3.6
Server: Alpine Linux 3.6
What did you do?

I'm seeing intermittent but very frequent `transport is closing` errors. Example setup below:

- grpc-go with `WithKeepaliveParams` on the client with a 120s ping
- We were not closing the `*grpc.ClientConn` when the client closes. We are fixing this now, but also we don't start/stop services very often. Client and server pods often run for several days or even weeks at a time without a restart per `kubectl`.
I added a 5-second poll of the gRPC connection state (`conn.GetState()`) and noticed the connection stayed at `READY` for hours, but as soon as I triggered some API calls it would go to `TRANSIENT_FAILURE` -> `CONNECTING` -> `READY`. As I kept retrying API calls, sometimes they would be successful for 50+ requests, and sometimes every other request or every few requests would repeat the error (logging `transport is closing`).
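A minimal sketch of that kind of polling loop (illustrative only, not the exact code from our service):

```go
import (
	"log"
	"time"

	"google.golang.org/grpc"
)

// pollState logs the gRPC connection state every 5 seconds.
func pollState(conn *grpc.ClientConn) {
	for {
		log.Printf("grpc connection state: %v", conn.GetState())
		time.Sleep(5 * time.Second)
	}
}
```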
I set the ENV var `GRPC_GO_LOG_SEVERITY_LEVEL=info` on the client, which logs the transition from `READY` to `TRANSIENT_FAILURE`, but I can't determine what the failure causing `transport is closing` is. I'm not sure if it's a configuration issue (maybe the server needs to align with the client keepalive params) or something else. Nothing jumped out at me in the recent commits that might address this, so I'm curious whether anyone has any idea where to go next to identify the cause.
Client connection:
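A sketch of the dial (the 120s keepalive ping and the target are described in this issue; `WithInsecure` and the 20s timeout are assumptions):

```go
import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func newClientConn() *grpc.ClientConn {
	conn, err := grpc.Dial("service.service.svc.cluster.local:80",
		grpc.WithInsecure(), // assumption: plaintext inside the cluster
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:    120 * time.Second, // keepalive ping every 120s
			Timeout: 20 * time.Second,  // example value, not from the issue
		}),
	)
	if err != nil {
		log.Fatalf("grpc dial failed: %v", err)
	}
	return conn
}
```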
Server looks like this:
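A sketch of the server setup (listen address and service registration are placeholders; the relevant detail, per the comment above, is that no `KeepaliveEnforcementPolicy` is set, so the 5-minute default applies):

```go
import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func serve() {
	lis, err := net.Listen("tcp", ":80")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	// No grpc.KeepaliveEnforcementPolicy option here, so the default
	// MinTime of 5 minutes applies to client keepalive pings.
	srv := grpc.NewServer()
	// pb.RegisterFooServer(srv, &fooServer{}) // hypothetical service registration
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}
```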
We are running on GKE (`1.8.1-gke.1` for master and nodes), using Kubernetes service discovery to connect to `service.service.svc.cluster.local:80`.

What did you expect to see?
Successful grpc calls.
What did you see instead?
`transport is closing` errors. See above.