read ECONNRESET in @grpc/grpc-js but not in grpc package

Siddhesh-Swami commented 2 years ago

Description: We were using the @grpc/grpc-js package in a Kubernetes cluster with an Alpine image, and recently got the chance to test it in production. We sporadically observe read ECONNRESET on the client side, with no corresponding logs on the server side. We switched to an older version of @grpc/grpc-js (1.2.4), but the error still occurred.
In one of the microservices we used the grpc package with NestJS, and that service never produced read ECONNRESET. So we migrated all the microservices to grpc@1.24.6, and we no longer see the read ECONNRESET error. The client takes a noticeable amount of time to connect to the server (around 2 to 3 seconds), but no read ECONNRESET error is observed.

Environment:

OS: Alpine Linux (Docker image node:14.16.1-alpine)
Kubernetes with Istio load balancing
Node version: 14.16.1
@grpc/proto-loader: 0.5.6
Earlier package: @grpc/grpc-js
New package: grpc@1.24.6

Please let me know if I should add any more details.

Siddhesh-Swami commented 2 years ago

Any updates, please?

vanthome commented 2 years ago

We have a similar issue that seems to appear only when our Node.js application is deployed on Kubernetes. Here is our stack:

Node 16
Docker engine on Kubernetes
Calico Networking
grpc-js Version 1.3.6

We are getting this error so frequently that it cannot be due to sporadic connectivity issues.

haimrait commented 2 years ago

Any updates? We are seeing the same symptom.

hanstf commented 2 years ago

We have a similar issue as well:

Node 16.14
Docker engine on k8s
Calico networking
grpc-js 1.5.7
gRPC server and client with 2 replicas, without a service mesh

This usually happens after more than 10 hours of idle time.

bangbang93 commented 2 years ago

Could this be related to keepalive? After adding

      keepalive: {
        // ms('5m') comes from the "ms" npm package and equals 300000 (5 minutes).
        keepaliveTimeMs: ms('5m'),
      },

I have not had a connection reset for several weeks. The default keepalive options might differ between grpc and @grpc/grpc-js.
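The camelCase keepalive object above looks like a framework-level option (for example, the keepalive setting in NestJS's gRPC transport options) rather than a raw grpc-js channel option; later comments in this thread set the grpc-js 'grpc.keepalive_*' keys directly. The sketch below assumes a one-to-one correspondence between the two styles. KeepaliveConfig and toChannelOptions are hypothetical names for illustration, so check your framework's source for the mapping it actually performs. It also spells out ms('5m') as 300000 so no extra package is needed.

```typescript
// Hypothetical camelCase keepalive config, shaped like the object in the
// snippet above. The mapping to grpc-js keys below is an assumption.
interface KeepaliveConfig {
  keepaliveTimeMs?: number;
  keepaliveTimeoutMs?: number;
  keepalivePermitWithoutCalls?: number;
}

// The subset of grpc-js channel option keys used later in this thread.
type KeepaliveChannelOptions = {
  'grpc.keepalive_time_ms'?: number;
  'grpc.keepalive_timeout_ms'?: number;
  'grpc.keepalive_permit_without_calls'?: number;
};

// Copies only the fields that are set, so unset fields keep the library defaults.
function toChannelOptions(cfg: KeepaliveConfig): KeepaliveChannelOptions {
  const out: KeepaliveChannelOptions = {};
  if (cfg.keepaliveTimeMs !== undefined) {
    out['grpc.keepalive_time_ms'] = cfg.keepaliveTimeMs;
  }
  if (cfg.keepaliveTimeoutMs !== undefined) {
    out['grpc.keepalive_timeout_ms'] = cfg.keepaliveTimeoutMs;
  }
  if (cfg.keepalivePermitWithoutCalls !== undefined) {
    out['grpc.keepalive_permit_without_calls'] = cfg.keepalivePermitWithoutCalls;
  }
  return out;
}

// ms('5m') from the "ms" package evaluates to 5 * 60 * 1000 = 300000.
const fiveMinutes = 5 * 60 * 1000;
const options = toChannelOptions({ keepaliveTimeMs: fiveMinutes });
console.log(options); // { 'grpc.keepalive_time_ms': 300000 }
```

Only keepaliveTimeMs is set here, matching the snippet above; the timeout and permit-without-calls settings stay at whatever the library defaults are.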

railsonluna commented 2 years ago

Any updates?

khanh-le-otsv commented 2 years ago

@bangbang93 After changing the code, have you faced the issue again?

bangbang93 commented 2 years ago

No, I've been rid of it for several months.

tomaswitek commented 2 years ago

@bangbang93 We have the same issue. Aren't you afraid that performance could suffer after setting this option? 5 minutes sounds like a lot.

This is a comment from the source code:

The amount of time to wait for an acknowledgement after sending a ping

Here is a link: https://github.com/grpc/grpc-node/blob/6764dcc79602faee5457243629da520ba08b726f/packages/grpc-js/src/subchannel.ts#L114

Nevertheless, I just applied it to our services; let's see how it plays out.
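On the performance question: keepaliveTimeMs is the interval between pings, so a 5-minute value means pings are rare, not slow. Below is a rough back-of-the-envelope sketch, assuming one ping and one acknowledgement per interval per connection and counting only the HTTP/2 PING frames themselves (a 9-byte frame header plus an 8-byte payload, per RFC 7540), ignoring TCP, IP and TLS overhead. pingOverhead is a hypothetical helper name.

```typescript
// HTTP/2 PING frame: 9-byte frame header + 8-byte opaque payload (RFC 7540, section 6.7).
const PING_FRAME_BYTES = 9 + 8;

interface PingOverhead {
  pingsPerHour: number;
  bytesPerHour: number; // PING + PING ACK frames only, no TCP/IP/TLS overhead
}

// Hypothetical helper: overhead of periodic keepalive pings on ONE connection,
// assuming exactly one ping (and one ack) per keepaliveTimeMs interval.
function pingOverhead(keepaliveTimeMs: number): PingOverhead {
  if (!(keepaliveTimeMs > 0)) {
    throw new RangeError('keepaliveTimeMs must be positive');
  }
  const pingsPerHour = (60 * 60 * 1000) / keepaliveTimeMs;
  return { pingsPerHour, bytesPerHour: pingsPerHour * 2 * PING_FRAME_BYTES };
}

console.log(pingOverhead(5 * 60 * 1000)); // { pingsPerHour: 12, bytesPerHour: 408 }
console.log(pingOverhead(10 * 1000)); // { pingsPerHour: 360, bytesPerHour: 12240 }
```

Even at a 10-second interval this is on the order of kilobytes per hour per connection, so raw bandwidth is unlikely to be the limiting factor; whether the server tolerates that ping rate is a separate question.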

bangbang93 commented 2 years ago

It's keepaliveTimeMs (the interval between keepalive pings), not keepaliveTimeoutMs (how long to wait for a ping's acknowledgement): https://github.com/grpc/grpc-node/blob/6764dcc79602faee5457243629da520ba08b726f/packages/grpc-js/src/subchannel.ts#L109-L112
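To make the distinction between the two settings concrete, here is a simplified model: a ping goes out keepaliveTimeMs after the previous acknowledgement (or after the connection starts), and if no acknowledgement arrives within keepaliveTimeoutMs the connection is treated as dead. This is an assumption for illustration, not grpc-js's actual scheduler, and simulateKeepalive is a hypothetical name.

```typescript
interface KeepaliveSettings {
  keepaliveTimeMs: number; // wait this long before sending the next ping
  keepaliveTimeoutMs: number; // how long to wait for that ping's ack
}

interface KeepaliveResult {
  pingsSent: number;
  closedAtMs: number | null; // null if every ping was acknowledged in time
}

// ackDelaysMs[i] is how long the i-th ping's ack takes; null means it never arrives.
// Simplified model (an assumption): the next ping is scheduled keepaliveTimeMs
// after the previous ack, and the first one keepaliveTimeMs after t = 0.
function simulateKeepalive(
  s: KeepaliveSettings,
  ackDelaysMs: Array<number | null>,
): KeepaliveResult {
  let now = 0;
  let pingsSent = 0;
  for (const delay of ackDelaysMs) {
    now += s.keepaliveTimeMs; // wait, then send a ping
    pingsSent++;
    if (delay === null || delay > s.keepaliveTimeoutMs) {
      // No ack within the timeout: the connection is closed.
      return { pingsSent, closedAtMs: now + s.keepaliveTimeoutMs };
    }
    now += delay; // ack received; the next interval starts now
  }
  return { pingsSent, closedAtMs: null };
}

// 5-minute interval, 20-second timeout: the peer goes silent after the first ack.
console.log(
  simulateKeepalive({ keepaliveTimeMs: 5 * 60 * 1000, keepaliveTimeoutMs: 20 * 1000 }, [50, null]),
); // { pingsSent: 2, closedAtMs: 620050 }
```

In this model a silent peer is detected roughly keepaliveTimeMs + keepaliveTimeoutMs after the last acknowledged ping, and keepaliveTimeoutMs only matters once a ping has actually been sent.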

tomaswitek commented 2 years ago

@bangbang93 Sorry, I sent the wrong link. I tried both and I still get the message :( but thanks for helping.

HofmannZ commented 1 year ago

This works for us:

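import { ChannelOptions } from '@grpc/grpc-js';

// Placeholder (hypothetical name) for any options you already pass. It must be a
// separate variable: spreading channelOptions into its own initializer would
// throw a ReferenceError because the const is not yet initialized.
const baseChannelOptions: ChannelOptions = {};

const channelOptions: ChannelOptions = {
  ...baseChannelOptions,
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};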

✌️

logidelic commented 1 year ago

Thank you, @HofmannZ. Is that fix reliable for you, or does it just make the problem less evident?

HofmannZ commented 1 year ago

Hey @logidelic,

We ended up with the following config for the client:

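import { ChannelOptions } from '@grpc/grpc-js';

// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
// Placeholder (hypothetical name) for any other client options you already pass;
// spreading channelOptions into its own initializer would throw a ReferenceError.
const baseChannelOptions: ChannelOptions = {};

const channelOptions: ChannelOptions = {
  ...baseChannelOptions,
  // Send keepalive pings every 6 minutes, default is none.
  // Must be more than GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS on the server (5 minutes).
  'grpc.keepalive_time_ms': 6 * 60 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};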

And the following config for the server:

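import { ChannelOptions } from '@grpc/grpc-js';

// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
// Placeholder (hypothetical name) for any other server options you already pass;
// spreading channelOptions into its own initializer would throw a ReferenceError.
const baseChannelOptions: ChannelOptions = {};

// Passed to the server constructor, e.g. new Server(channelOptions).
const channelOptions: ChannelOptions = {
  ...baseChannelOptions,
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};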

We've been running it in production for a couple of months, and it works reliably.
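The client-config comment above ties the 6-minute interval to the server's GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS. The gRPC keepalive document linked there describes servers enforcing a ping policy, answering clients that ping too often with a GOAWAY (debug data "too_many_pings"), and lists 5 minutes as the C-core default. The sketch below is a hypothetical pre-deployment check (checkKeepaliveAgainstServerPolicy and ServerPingPolicy are illustrative names, not grpc-js APIs) comparing client settings with an assumed server policy. Whether your server enforces such a policy, and with which values, depends on its implementation; the idle-ping rule in particular is an assumption.

```typescript
// Client-side keepalive settings, using the grpc-js channel option keys above.
interface ClientKeepalive {
  'grpc.keepalive_time_ms': number;
  'grpc.keepalive_permit_without_calls'?: number;
}

// Hypothetical description of what the server tolerates (values are assumptions).
interface ServerPingPolicy {
  minPingIntervalWithoutDataMs: number; // e.g. 5 minutes, per the comment above
  permitWithoutCalls: boolean;
}

// Returns human-readable warnings; an empty array means no conflict was found.
function checkKeepaliveAgainstServerPolicy(
  client: ClientKeepalive,
  server: ServerPingPolicy,
): string[] {
  const warnings: string[] = [];
  const timeMs = client['grpc.keepalive_time_ms'];
  if (timeMs < server.minPingIntervalWithoutDataMs) {
    warnings.push(
      `keepalive_time_ms ${timeMs} is below the server minimum ` +
        `${server.minPingIntervalWithoutDataMs}; the server may reject excess pings`,
    );
  }
  if (client['grpc.keepalive_permit_without_calls'] === 1 && !server.permitWithoutCalls) {
    // Assumption: a server that does not permit idle pings counts them against the client.
    warnings.push('client pings while idle, but the server is assumed not to permit that');
  }
  return warnings;
}

const serverPolicy: ServerPingPolicy = {
  minPingIntervalWithoutDataMs: 5 * 60 * 1000,
  permitWithoutCalls: true,
};

// 6 minutes (the client config above) passes; 10 seconds does not.
console.log(checkKeepaliveAgainstServerPolicy({ 'grpc.keepalive_time_ms': 6 * 60 * 1000 }, serverPolicy)); // []
console.log(checkKeepaliveAgainstServerPolicy({ 'grpc.keepalive_time_ms': 10 * 1000 }, serverPolicy).length); // 1
```

By this check, the 10-second client interval in the first config in this thread would conflict with a server enforcing the 5-minute minimum.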

read ECONNRESET in @grpc/grpc-js but not in grpc package #1994