grpc / grpc-node

gRPC for Node.js
https://grpc.io
Apache License 2.0

Express App crashing citing "Stream Removed" #708

Open jhizzle84 opened 5 years ago

jhizzle84 commented 5 years ago

Apologies in advance if this is on my end; I struck out at StackOverflow and with another resource that uses this package similarly.

In my search I saw similar issues posted here, but those were resolved upstream.

Problem description

Express App crashing citing "Stream Removed"

events.js:167
      throw er; // Unhandled 'error' event
      ^

Error: 2 UNKNOWN: Stream removed
    at Object.exports.createStatusError (/home/justin/singles-api/node_modules/grpc/src/common.js:91:15)
    at ClientReadableStream._emitStatusIfDone (/home/justin/singles-api/node_modules/grpc/src/client.js:233:26)
    at ClientReadableStream._receiveStatus (/home/justin/singles-api/node_modules/grpc/src/client.js:211:8)
    at Object.onReceiveStatus (/home/justin/singles-api/node_modules/grpc/src/client_interceptors.js:1272:15)
    at InterceptingListener._callNext (/home/justin/singles-api/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/home/justin/singles-api/node_modules/grpc/src/client_interceptors.js:618:8)
    at /home/justin/singles-api/node_modules/grpc/src/client_interceptors.js:1029:24
Emitted 'error' event at:
    at ClientReadableStream._emitStatusIfDone (/home/justin/singles-api/node_modules/grpc/src/client.js:234:12)
    at ClientReadableStream._receiveStatus (/home/justin/singles-api/node_modules/grpc/src/client.js:211:8)
    [... lines matching original stack trace ...]
    at /home/justin/singles-api/node_modules/grpc/src/client_interceptors.js:1029:24
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! singles-api@0.0.0 start: `node ./bin/www`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the singles-api@0.0.0 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

Reproduction steps

grpc was installed via npm for use in an Express app. The application works fine in short-timeframe testing but eventually crashes in production tests.

Environment

Additional context

Usage context:

var grpc = require('grpc'); // require not shown in original; hostport, credentials,
                            // request, loadedDB, and saveDB are defined elsewhere in the app

var lnrpcDescriptor = grpc.load("./rpc.proto");
var lnrpc = lnrpcDescriptor.lnrpc;
var lnd = new lnrpc.Lightning(hostport, credentials);
var call = lnd.subscribeInvoices(request);

call.on('data', function(response) {
  if (response.settle_index === '0') {
    console.log("New Invoice Issued: " + response.payment_request);
  }
  else {
    // iterate through the array to find who paid their invoice and update the db
    for (var i = 0; i < loadedDB.db.Node.length; i++) {
      if (loadedDB.db.Node[i].add_index == response.add_index) {
        console.log("Node " + loadedDB.db.Node[i].Id + " has settled their invoice.");
        loadedDB.db.Node[i].isSettled = true;
        saveDB.writeEntry();
      }
    }
  }
});

call.on('status', function(status) {
  console.log(status);
});

call.on('end', function() {
  console.log('subscribeInvoices stream ended');
});

// Note: no 'error' listener is attached to `call` here; see the discussion below.
murgatroid99 commented 5 years ago

Currently, I am not sure what the root cause of that problem might be. One thing that might help is to try using the @grpc/proto-loader and @grpc/grpc-js libraries in place of the grpc library. You will have to change how the .proto file is loaded but otherwise you can use the same code. I don't know if that will solve the problem, but it should at least result in a different error that may help with understanding the problem.
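
For reference, a minimal sketch of that switch, assuming the same rpc.proto, hostport, credentials, and request as in the snippet above (the loader options shown are common choices, not requirements):

const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');

// Load the .proto at runtime with proto-loader instead of grpc.load().
const packageDefinition = protoLoader.loadSync('./rpc.proto', {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
  oneofs: true
});
const lnrpc = grpc.loadPackageDefinition(packageDefinition).lnrpc;

// The rest of the original code works unchanged.
const lnd = new lnrpc.Lightning(hostport, credentials);
const call = lnd.subscribeInvoices(request);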

nicolasnoble commented 5 years ago

The other thing that may help: this error is a nested one, and we won't print nested errors unless some traces and/or verbosity are activated. Try running this with the GRPC_TRACE=all and GRPC_VERBOSITY=ERROR environment variables set. Seeing the nested error might actually help a bit.
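
For example, with the start script from the original report, that would be something like:

GRPC_TRACE=all GRPC_VERBOSITY=ERROR node ./bin/www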

idangozlan commented 5 years ago

Unfortunately, it's happening to us every few hours in production. We are still trying to track it down. It's not crashing the app, since we handle the error, but the requests running at the same time fail (since no API call succeeds).

Swhite215 commented 5 years ago

I am also experiencing a similar error when running a Hyperledger Fabric application. I am not sure if they are related.

Error:

(node:7789) UnhandledPromiseRejectionWarning: Error: 2 UNKNOWN: Stream removed
    at Object.exports.createStatusError (/Users/fcprxbci/Documents/projects/hlfExample/client/ford/node_modules/fabric-client/node_modules/grpc/src/common.js:91:15)
    at Object.onReceiveStatus (/Users/fcprxbci/Documents/projects/hlfExample/client/ford/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:1204:28)
    at InterceptingListener._callNext (/Users/fcprxbci/Documents/projects/hlfExample/client/ford/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/Users/fcprxbci/Documents/projects/hlfExample/client/ford/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:618:8)
    at callback (/Users/fcprxbci/Documents/projects/hlfExample/client/ford/node_modules/fabric-client/node_modules/grpc/src/client_interceptors.js:845:24)
(node:7789) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:7789) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

leoDreamer commented 5 years ago

I have the same question. Is there any resolution?

Xuhao commented 5 years ago

Same here, only on production env :(

leoDreamer commented 5 years ago

@Xuhao Adding a retry would work around this problem when a request-level error happens.
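
A minimal sketch of that idea (the method name someMethod and the retry policy are assumptions, not from this thread):

const grpc = require('grpc');

// Retry a unary call up to `retriesLeft` times when it fails with UNKNOWN,
// the status code reported as "2 UNKNOWN: Stream removed" above.
function callWithRetry(client, request, retriesLeft, callback) {
  client.someMethod(request, function (err, response) {
    if (err && err.code === grpc.status.UNKNOWN && retriesLeft > 0) {
      return callWithRetry(client, request, retriesLeft - 1, callback);
    }
    callback(err, response);
  });
}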

panoti commented 5 years ago

Me too. This error sometimes occurs.

paambaati commented 5 years ago

@murgatroid99 @nicolasnoble I can consistently reproduce this.

Error: 2 UNKNOWN: Stream removed
    at Object.exports.createStatusError (~/node-liftbridge/node_modules/grpc/src/common.js:91:15)
    at ClientReadableStream._emitStatusIfDone (~/node-liftbridge/node_modules/grpc/src/client.js:234:26)
    at ClientReadableStream._receiveStatus (~/node-liftbridge/node_modules/grpc/src/client.js:211:8)
    at Object.onReceiveStatus (~/node-liftbridge/node_modules/grpc/src/client_interceptors.js:1272:15)
    at InterceptingListener._callNext (~/node-liftbridge/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (~/node-liftbridge/node_modules/grpc/src/client_interceptors.js:618:8)
    at ~/node-liftbridge/node_modules/grpc/src/client_interceptors.js:1029:24
Emitted 'error' event on ClientReadableStream instance at:
    at ClientReadableStream._emitStatusIfDone (~/node-liftbridge/node_modules/grpc/src/client.js:235:12)
    at ClientReadableStream._receiveStatus (~/node-liftbridge/node_modules/grpc/src/client.js:211:8)
    [... lines matching original stack trace ...]
    at ~/node-liftbridge/node_modules/grpc/src/client_interceptors.js:1029:24 {
  code: 2,
  metadata: Metadata { _internal_repr: {}, flags: 0 },
  details: 'Stream removed'
}

Whenever I SIGINT my RPC server, I instantly get this error and the process crashes. This is also not caught by any of my event handlers (data, status, error) attached to the ClientReadableStream<MyMessage>.

Is there a clean way of handling this? I'm trying to build fault tolerance into my gRPC calls, but I simply can't catch this error to handle it!

idangozlan commented 5 years ago

On our end, we found out that we hadn't meshed our services. Try looking at Linkerd or Istio. If it's a matter of HTTP/2 routing, it can be resolved that way.

paambaati commented 5 years ago

@idangozlan Thanks, but this is from simple local testing; I'm just trying to connect to an RPC server on localhost, start listening on a stream, and then kill the RPC server.

murgatroid99 commented 5 years ago

@paambaati A line in the middle of your error message says: Emitted 'error' event on ClientReadableStream instance at. If you are listening for an error event on the ClientReadableStream instance and that is still getting thrown, there is some deeper problem here.

Can you share a simple code snippet that reproduces this problem?

Xuhao commented 4 years ago

We see this Stream Removed error (and another, TCP Read failed, reported in #692) frequently in production, but never in our test env.

Our production runs on a k8s cluster and reaches the grpc service by service name (the DNS resolver is CoreDNS).

At first we set the service host to core-grpc:50000, where core-grpc is the domain and 50000 is the port, and CoreDNS resolved it to an internal IP like 172.21.0.121:50000; with that setup the error occurred very frequently.

After we changed the service host via ENV variables (injected by k8s) to ${process.env.CORE_GRPC_SERVICE_HOST || 'core-grpc'}:50000, it still occurred, but less often than before.

Our test env is quite simple: services run in docker containers, exposed on different ports, and access each other via localhost:<port>; there we see no errors of this kind.

So is this a network problem?

paambaati commented 4 years ago

@murgatroid99 I have a repro script at https://github.com/paambaati/node-liftbridge/commit/d516570653105c69db553b67f0f03a36ffd44182

Steps to reproduce

  1. docker run -p 4222:4222 -ti nats:latest --debug --trace in a separate window.
  2. GO111MODULE=on go get github.com/liftbridge-io/liftbridge and then $GOPATH/bin/liftbridge --raft-bootstrap-seed --nats-servers nats://localhost:4222 --level debug in another window.
  3. git clone https://github.com/paambaati/node-liftbridge.git
  4. git checkout bug-repro-1
  5. yarn install
  6. yarn build
  7. node lib/bug-repro.js
  8. The script will print a few messages and say "now subscribing...". Now, Ctrl + C the Liftbridge binary you ran in step 2.
  9. The bug-repro.js script will crash after printing the error stacktrace.

Steps 1 and 2 set up the local gRPC server.

murgatroid99 commented 4 years ago

I think you're missing a step because go-liftbridge doesn't seem to install any binary, and its GitHub page says it's a client library, not a server.

Also, there are a couple of ways of simplifying that test code that might also be useful in general. There is no need to construct a generic Client and then reuse the channel. APIClient is a subclass of Client and also has the waitForReady method; you may not have seen that because the APIClient import looks incorrect. But the waitForReady method is usually not necessary at all. In general, if the server is already available, the client will just establish a connection when you call a method.
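
In other words, something like this sketch (the import path and address are hypothetical):

const grpc = require('grpc');
// Hypothetical path; import from wherever the generated APIClient is exported.
const { APIClient } = require('./generated/api_grpc_pb');

// APIClient is a subclass of Client, so it also has waitForReady, but an
// explicit waitForReady is usually unnecessary: if the server is available,
// the connection is established when the first method is called.
const client = new APIClient('localhost:9292', grpc.credentials.createInsecure());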

paambaati commented 4 years ago

@murgatroid99 Oops, I’d linked the wrong repo; I’ve now edited my last comment with the correct repo.

Also, there are a couple of ways of simplifying that test code, that might also be useful in general.

Fair enough, I’ll keep that in mind. Thanks for the tips! The bug-repro script was partially copied from the library I’m building; it tries to connect to multiple servers in the cluster and use the connection that succeeds first (and keep the others in a pool for continually fetching metadata), so that’s why I explicitly connect first and then use that channel.

APIClient is a subclass of Client

Didn’t know this, I’ll use that!

murgatroid99 commented 4 years ago

OK, I got it working, and I got the expected output: a > subscribe error = ... line and a > subscribe on status = ... line.

paambaati commented 4 years ago

@murgatroid99 You got the > subscribe on status = ... on killing the Liftbridge server and not the stacktrace?! That’s strange!

Can you think of what might be wrong in my setup?

murgatroid99 commented 4 years ago

What version of Node are you using? It shouldn't make a difference, but it's the only difference in setup I can think of.

paambaati commented 4 years ago

@murgatroid99 I’m on Node 12.9.1 and macOS Mojave.

murgatroid99 commented 4 years ago

Mac might be the difference. I'm testing this on Linux (specifically Debian).

paambaati commented 4 years ago

@murgatroid99 Are there more logs/traces I can provide? This is a blocker and I'd love to help fix this.

murgatroid99 commented 4 years ago

I'm sorry, but I honestly have no idea what could make errors bypass the error listener like that. And it might not be a grpc-specific problem.

panoti commented 4 years ago

On docker swarm, I changed endpoint_mode to dnsrr instead of the default vip, and the error is gone.
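
For anyone else hitting this on swarm, the change looks roughly like this compose snippet (service name and image are placeholders):

version: "3.7"
services:
  core-grpc:
    image: my-grpc-server:latest
    deploy:
      # Bypass the VIP load balancer; DNS round-robin returns task IPs directly.
      endpoint_mode: dnsrr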

xanatas commented 4 years ago

We are having the same issue, but only in production:

"Error: 2 UNKNOWN: Stream removed
    at Object.exports.createStatusError (/src/node_modules/grpc/src/common.js:91:15)
    at ClientDuplexStream._emitStatusIfDone (/src/node_modules/grpc/src/client.js:233:26)
    at ClientDuplexStream._receiveStatus (/src/node_modules/grpc/src/client.js:211:8)
    at Object.onReceiveStatus (/src/node_modules/grpc/src/client_interceptors.js:1311:15)
    at InterceptingListener._callNext (/src/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/src/node_modules/grpc/src/client_interceptors.js:618:8)
    at /src/node_modules/grpc/src/client_interceptors.js:1127:18 {

We tried everything.

We are running on Kubernetes in GKE, and I think it's somehow a network error between Kubernetes pods/services: some kube-proxy problem, or the way persistent connections inside the cluster are handled; we honestly don't know. I'll try to replicate the cluster and get some internal network adapter logs, but this will take a while.

Otherwise it's pretty random, and we cannot trace it further through the app.

It's Node.js apps connected over a bidirectional gRPC stream.

juicechu commented 4 years ago

I guess this problem happens because of the https_proxy env setting. Try removing it from the environment: run export https_proxy= or unset https_proxy.

jwulf commented 4 years ago

I also get this error when the gRPC server throws Error 13, Internal error.

(node:142) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 1)
(node:142) UnhandledPromiseRejectionWarning: Error: 13 INTERNAL: Unexpected error occurred during the request processing
    at Object.exports.createStatusError (/server/node_modules/grpc/src/common.js:91:15)
    at Object.onReceiveStatus (/server/node_modules/grpc/src/client_interceptors.js:1209:28)
    at InterceptingListener._callNext (/server/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/server/node_modules/grpc/src/client_interceptors.js:618:8)
    at callback (/server/node_modules/grpc/src/client_interceptors.js:847:24)
(node:142) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)

The issue is that there seems to be no way to catch the error in the application.

tvvignesh commented 4 years ago

@jwulf Yup. I am getting the same error too when using zeebe-node

iamdhrooov commented 4 years ago

I am getting a similar error on and off while making calls to a gRPC Go server. RPC calls are failing, which is causing data inconsistency. My setup is a Node client running on Lambda which connects to the Go gRPC server.

{ Error: 2 UNKNOWN: Stream removed
    at Object.exports.createStatusError (/var/task/node_modules/grpc/src/common.js:91:15)
    at Object.onReceiveStatus (/var/task/node_modules/grpc/src/client_interceptors.js:1209:28)
    at InterceptingListener._callNext (/var/task/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/var/task/node_modules/grpc/src/client_interceptors.js:618:8)
    at callback (/var/task/node_modules/grpc/src/client_interceptors.js:847:24)
  code: 2,
  metadata: Metadata { _internal_repr: {}, flags: 0 },
  details: 'Stream removed' }

I have set the keepalive options for the Node client as:

"grpc.keepalive_time_ms": 10000,
"grpc.keepalive_timeout_ms": 5000,
"grpc.keepalive_permit_without_calls": 1,
"grpc.http2.max_pings_without_data": 0,
"grpc.http2.min_time_between_pings_ms": 10000,
"grpc.http2.min_ping_interval_without_data_ms": 5000,

And for the Go server as:

k := keepalive.ServerParameters{Timeout: time.Duration(5 * time.Second), Time: time.Duration(10 * time.Second)}
kp := keepalive.EnforcementPolicy{MinTime: time.Duration(10 * time.Second), PermitWithoutStream: true}
opts := []grpc.ServerOption{grpc.Creds(creds), grpc.KeepaliveParams(k), grpc.KeepaliveEnforcementPolicy(kp)}
grpcServer := grpc.NewServer(opts...)

Please suggest any workaround for this, maybe catching this error and calling the RPC again. I have checked every possible article out there; no solution yet. Please help.

junojuno90 commented 4 years ago

Sometimes we get the same error on EKS with grpc-node. Is there no workaround? :<

error: {
  code: 2,
  details: "Stream removed",
  message: "2 UNKNOWN: Stream removed",
  metadata: {
    _internal_repr: {},
    flags: 0
  },
  name: "Error",
  stack: "Error: 2 UNKNOWN: Stream removed
    at Object.exports.createStatusError (/home/node/node_modules/grpc/src/common.js:91:15)
    at ClientDuplexStream._emitStatusIfDone (/home/node/node_modules/grpc/src/client.js:233:26)
    at ClientDuplexStream._receiveStatus (/home/node/node_modules/grpc/src/client.js:211:8)
    at Object.onReceiveStatus (/home/node/node_modules/grpc/src/client_interceptors.js:1311:15)
    at InterceptingListener._callNext (/home/node/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/home/node/node_modules/grpc/src/client_interceptors.js:618:8)
    at /home/node/node_modules/grpc/src/client_interceptors.js:1127:18"
}

jwulf commented 4 years ago

Yes, you can catch deep errors in the listener, using an interceptor.

See here: https://github.com/creditsenseau/zeebe-client-node-js/blob/master/src/lib/GrpcClient.ts#L584
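
A minimal sketch of that approach (not the linked implementation; it uses the standard grpc client interceptor API):

const grpc = require('grpc');

// Client interceptor that observes every call's final status, so deep
// errors can be logged or translated before they reach stream listeners.
function errorInterceptor(options, nextCall) {
  return new grpc.InterceptingCall(nextCall(options), {
    start: function (metadata, listener, next) {
      next(metadata, {
        onReceiveStatus: function (status, nextStatus) {
          if (status.code !== grpc.status.OK) {
            console.error('gRPC error intercepted:', status.code, status.details);
          }
          nextStatus(status);
        }
      });
    }
  });
}

// Hypothetical client construction showing where the interceptor attaches:
// const client = new MyService(address, credentials, { interceptors: [errorInterceptor] });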

alisher-agzamov commented 4 years ago

I experienced the same issue with an EKS cluster using an AWS NLB. I tried playing with keepalive timeouts in the gRPC options on both sides; what solved the issue was creating a secure gRPC server on the app side.

demurray commented 4 years ago

Just to add to the chorus: we also observe Error: 2 UNKNOWN: Stream removed in our Node.js microservices in an EKS cluster, specifically when the system is under heavy load.

jwulf commented 3 years ago

The application exit happens when there is no listener for the error event. It's standard Node behavior: an EventEmitter that emits 'error' with no listener attached throws, so a stream that emits an error causes a fatal exception.

_emitStatusIfDone calls this.emit('error', error);.

Streaming calls extend EventEmitter, so on streaming calls, errors are emitted as events.
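
A minimal illustration of that Node behavior, independent of gRPC:

const EventEmitter = require('events');

const unhandled = new EventEmitter();
// No 'error' listener attached: this would throw and crash the process.
// unhandled.emit('error', new Error('boom'));

const handled = new EventEmitter();
handled.on('error', function (err) {
  console.error('handled:', err.message); // delivered as a normal event
});
handled.emit('error', new Error('boom'));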

jwulf commented 3 years ago

It looks like it can emit error after it has emitted end, if the error is related to the channel rather than the call.

I was removing the error handler when I got the end event, to prevent memory leaks. Now I just leave the error handler attached, and it doesn't throw and doesn't leak.
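
Applied to the original subscribeInvoices snippet, the fix is just to keep the handler around (a sketch, assuming the same call object as above):

var call = lnd.subscribeInvoices(request);

call.on('end', function () {
  console.log('subscribeInvoices stream ended');
  // Do NOT remove the 'error' listener here: a channel-level error can
  // still be emitted after 'end' and would otherwise crash the process.
});

// Leaving this attached for the lifetime of the call prevents the crash.
call.on('error', function (err) {
  console.error('subscribeInvoices stream error:', err.message);
});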

vandersondf commented 3 years ago

These errors may be happening because of the secure connection (grpcs://); some users do not face the problem locally because they use grpc:// there. Another thing that causes this error is connecting with certificates that are invalid or different from what is expected.