grpc / grpc-dart

The Dart language implementation of gRPC.
https://pub.dev/packages/grpc
Apache License 2.0
861 stars 271 forks source link

Exhausted heap space: Out of Memory error with remote-terminated long-running rpc call #458

Closed matejkramny closed 3 years ago

matejkramny commented 3 years ago

Hello,

I've isolated an issue with this library, although I am unable to debug it and find the precise cause. Would appreciate help with this. The issue is with a long running RPC which, if terminated by the remote (f.e. server killed) causes the following error message

E/DartVM  ( 4833): Exhausted heap space, trying to allocate 1752460688 bytes.
[UpdateStream] [FINE] gRPC Error (code: 2, codeName: UNKNOWN, message: Out of Memory, details: null, rawResponse: null)

Library: grpc 2.9.0 on flutter mobile.

Repro steps

Set up a Bidirectional or server stream (long-running).

  1. Connect via the stream
  2. Kill the server
  3. Sometimes get the correct Error: Stream was terminated by peer thrown, otherwise straight to 6.
  4. Restart the connection
  5. Kill the server
  6. Error Out of memory thrown

Expected result: no out of heap space complaints

Actual result: As above in the intro

Details

I've tried looking through the dart debugger into memory when the heap space exhaustion happens but I cannot find anything relevant. Tracing the thrown error leads to client/call.dart function _terminate (but line 0 ??). Dart cannot specify the line number and further frames up are not shown, probably due to the call trace size limit or due to dart being unable to locate the frame because of Futures.

What is the best way to debug this and find what's causing this?

Would love to help via PR, but need to find the issue first

mraleph commented 3 years ago

Debugger is not notified on OOM exceptions because it might OOM itself.

Try running your dart server with --print_stacktrace_at_throw passed to it. It should print a stack trace for OOM.

matejkramny commented 3 years ago

The error is happening on a dart client, sorry should have specified. Server is using a go library.

let me try make a demo you which can reproduce the issue

mraleph commented 3 years ago

You can use the same flag on client as well, sorry I could have been more specific. It is a flag to dart binary itself.

--

// If you get an email from me outside of the 9-5 it is not because I'm always// on or expect an immediate response from you; it is because of work flexibility https://www.inc.com/john-boitnott/how-flexible-hours-can-create-a-better-work-life-balance.html.// Evening and weekend emails are a sign I allocated some regular working hours// for other things (such as family, gym, friends,...). And I encourage you to// feel free to do the same.

matejkramny commented 3 years ago

Ah right sorry, I misunderstood. Will try that and report back. Thank you

matejkramny commented 3 years ago

I've got something, but not sure what I'm looking at.

Screenshot_20210316_095055 Screenshot_20210316_094434

mraleph commented 3 years ago

This looks like we are receiving some garbage from the server which it then tries to misinterpret as a gRPC message. [60, 104, 116, 109, 108] is <html in ASCII.

You could try to patch the package, by adding the following code before Uint8List(...) allocation:

if (dataLength == 1752460652) {
  print(utf8.decode(chunkData));
}

This would print the content that server send back. I guess when server is shutting down it takes grpc handler down before taking down HTTP server and then you receive a normal 404 back or something like that.

I can actually reproduce this behavior if I make a normal HTTP/2 server (not gRPC capable) and then try to use gRPC client to talk to it.

There is no checking in the _GrpcMessageConversionSink which would validate that the stream we are getting is actually following gRPC protocol - it just assumes that it is.

I guess we could put something like this in:

diff --git a/lib/src/shared/streams.dart b/lib/src/shared/streams.dart
index cfd5869..aa6aff1 100644
--- a/lib/src/shared/streams.dart
+++ b/lib/src/shared/streams.dart
@@ -117,7 +117,10 @@ class _GrpcMessageConversionSink extends ChunkedConversionSink<StreamMessage> {
       // TODO(jakobr): Handle duplicate header names correctly.
       headers[ascii.decode(header.name)] = ascii.decode(header.value);
     }
-    // TODO(jakobr): Check :status, go to error mode if not 2xx.
+    final contentType = headers['content-type'];
+    if (contentType == null || !contentType.startsWith('application/grpc')) {
+      throw GrpcError.unimplemented('Received non-gRPC message');
+    }
     _out.add(GrpcMetadata(headers));
   }
matejkramny commented 3 years ago

Thanks, that's vert helpful! Turns out the remote nginx proxy was returning HTTP 503 and disregarding any grpc protocols, causing this issue.

1752460652 bytes is ~1.7gb. I'm not sure what a sane default would be though