grpc / grpc-dart

The Dart language implementation of gRPC.
https://pub.dev/packages/grpc
Apache License 2.0

gRPC stream cancellation freezes future requests on same channel when response includes an array of bytes with 0x1000 bytes or more #624

Closed ruicraveiro closed 1 year ago

ruicraveiro commented 1 year ago

grpc package version: 3.1.0

Repro steps

I built a prototype to demonstrate this bug, which you can find at https://github.com/ruicraveiro/grpc_array_bug_demo.

1. Assume the following proto:

```proto
syntax = "proto3";

package grpc;

service FileService {
  rpc GetFiles(Empty) returns (stream File) {}
}

message File {
  bytes contents = 1;
}

message Empty {}
```

The following Dart file creates a random array of bytes of size `fileSize` (0x1000). It also implements the service defined in the proto; the implementation returns 30 instances of the byte array (only the size of the file matters, not its contents):

```dart
import 'dart:math' as m;

import 'package:grpc/grpc.dart' as grpc;
import 'package:grpc_array_bug_demo/src/generated/file_service.pbgrpc.dart';

const fileSize = 0x1000;

void main(List<String> args) async {
  final server = grpc.Server([FileService()]);
  await server.serve(port: 8080);
  print('Server listening on port ${server.port}...');
}

// One random byte array of fileSize bytes, shared by every response.
final r = m.Random();
final file = List.generate(fileSize, (i) => r.nextInt(0xff));

class FileService extends FileServiceBase {
  int callCount = 0;

  @override
  Stream<File> getFiles(grpc.ServiceCall call, Empty request) async* {
    // Stream 30 copies of the same payload back to the client.
    for (var index = 0; index < 30; index++) {
      try {
        yield File(contents: file);
      } catch (ex) {
        print('Could not send with $ex');
      }
    }
  }
}
```

Finally, the following client asks the server for the stream of byte arrays but cancels the stream upon receiving the 6th file. It repeats this 10 times:

```dart
import 'package:grpc/grpc.dart';
import 'package:grpc_array_bug_demo/src/generated/file_service.pbgrpc.dart';

final channel = ClientChannel('127.0.0.1',
    port: 8080,
    options: const ChannelOptions(credentials: ChannelCredentials.insecure()));
final stub = FileServiceClient(channel);

void main(List<String> args) async {
  for (var i = 0; i < 10; i++) {
    print('Will get files ($i iteration)');
    await runGetFiles();
  }
  await channel.shutdown();
  print('\n\nDONE');
}

Future<void> runGetFiles() async {
  final call = stub.getFiles(Empty());
  var count = 0;
  try {
    await for (var number in call) {
      count++;
      print(
          'Received file with ${number.contents.length} bytes (count=$count)');
      if (count > 5) {
        await call.cancel();
      }
    }
  } on GrpcError catch (e) {
    print('Caught: $e');
  }
  print('Final count: $count');
}
```
2. Run the server (`bin/server.dart`).

3. Run the client (`bin/client.dart`).

Expected result:

10 iterations in which the client cancels the stream after receiving the 6th byte array, and then the client exits successfully.

Actual result: on the second iteration, the client freezes on the `await for` over the stream.

Details

The freeze happens when the returned array is 0x1000 bytes or larger. On my machine, reducing the array size to 0x999 makes the client behave normally.

While the prototype uses Dart for both client and server, I suspect the bug is on the client side. This behaviour originally appeared while I was testing an application I am developing with Flutter on the client and .NET on the server. Initially I assumed I was doing something wrong on the server, so I tried Go, but the problem remained, which made me suspect the client instead. I finally based this prototype on the official example at https://github.com/grpc/grpc-dart/tree/master/example/metadata; the only effective difference between my prototype and that example is that the latter returns Fibonacci numbers while the former returns byte arrays. Because the bug occurs with 3 different server implementations, I think it is a safe bet that it lives in the grpc-dart client. Moreover, I also added a Go client implementation (~/goclient in the repository), and with that client the entire workflow runs exactly as expected against the Dart server, without any freeze.

While I ran this prototype on Linux, the bug originally appeared in a Flutter application I have been testing on Android. I haven't been able to reproduce it on Flutter Web, so I suspect gRPC-Web is unaffected.

Finally, the bug happens regardless of whether I reuse the client stub. If, however, I create a new channel for each iteration, the client no longer freezes. So this version of the client works (though in theory it is less efficient, since it no longer reuses a single connection across iterations):

```dart
import 'package:grpc/grpc.dart';
import 'package:grpc_array_bug_demo/src/generated/file_service.pbgrpc.dart';

void main(List<String> args) async {
  for (var i = 0; i < 10; i++) {
    print('Will get files ($i iteration)');
    await runGetFiles();
  }
  print('\n\nDONE');
}

Future<void> runGetFiles() async {
  // A fresh channel per iteration works around the freeze.
  final channel = ClientChannel('127.0.0.1',
      port: 8080,
      options:
          const ChannelOptions(credentials: ChannelCredentials.insecure()));
  final stub = FileServiceClient(channel);

  final call = stub.getFiles(Empty());
  var count = 0;
  try {
    await for (var number in call) {
      count++;
      print(
          'Received file with ${number.contents.length} bytes (count=$count)');
      if (count > 5) {
        await call.cancel();
      }
    }
  } on GrpcError catch (e) {
    print('Caught: $e');
  } finally {
    await channel.shutdown();
  }
  print('Final count: $count');
}
```
VerTiGoEtrex commented 1 year ago

I'm observing similar behavior, but not using arrays. My responses are certainly larger than 0x1000 bytes, though.

Some additional information that may be helpful for debugging: the streaming request resolves the headers (a 200 response) but no data comes through. I confirmed this by overriding Dart's TimelineTask to watch the events for each RPC (a sketch of that API follows the trace below):

```
...
flutter: instant Data received: <redacted>
flutter: instant Data received: <redacted>
flutter: instant Data received: <redacted>
flutter: instant Data received: <redacted>
flutter: instant Data received: <redacted>
flutter: instant Data received: <redacted>
flutter: instant Data received: <redacted>

// ^^ Stream was working fine beforehand
// This is where I cancel the RPC stream manually, same as your example

flutter: finish (0): {error: gRPC Error (code: 1, codeName: CANCELLED, message: Cancelled by client., details: null, rawResponse: null, trailers: {})}
flutter: start (0) gRPC Request
flutter: instant Request sent: <redacted>
flutter: instant Data sent: <redacted>
flutter: finish (1): null
flutter: start (0) gRPC Response: null
flutter: instant Metadata received: {headers: {:status: 200, content-type: application/grpc, strict-transport-security: max-age=63072000; includeSubDomains, trailer: Grpc-Status-Details-Bin, Grpc-Status, Grpc-Message, vary: Accept-Encoding, date: Fri, 12 May 2023 01:56:09 GMT}}

// Sits here forever until I manually cancel again

flutter: finish (0): {error: gRPC Error (code: 1, codeName: CANCELLED, message: Cancelled by client., details: null, rawResponse: null, trailers: {})}
flutter: Error processing EM stream: gRPC Error (code: 1, codeName: CANCELLED, message: Cancelled by client., details: null, rawResponse: null, trailers: {}). Invalidating self
flutter: invalidating self to allow re-subscribe.
flutter: start (0) gRPC Request
flutter: instant Request sent: <redacted>
flutter: instant Data sent: <redacted>
flutter: finish (1): null
flutter: start (0) gRPC Response: null
flutter: instant Metadata received: {headers: {:status: 200, content-type: application/grpc, strict-transport-security: max-age=63072000; includeSubDomains, trailer: Grpc-Status, Grpc-Message, Grpc-Status-Details-Bin, vary: Accept-Encoding, date: Fri, 12 May 2023 01:56:10 GMT}}

// Hangs again, repeat ad infinitum.
```

Given that I get headers, it's clear the connection is still live. It's weird, then, that no data comes through.
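For reference, a minimal sketch of the dart:developer TimelineTask API that produces records like those in the trace above; the event names are illustrative, not grpc-dart's actual instrumentation:

```dart
import 'dart:developer';

void main() {
  // Each RPC gets a TimelineTask; its events show up in the timeline
  // as the start/instant/finish records seen in the trace.
  final task = TimelineTask();
  task.start('gRPC Request');
  task.instant('Request sent');
  task.instant('Data received');
  // In the trace above, the finish record is where the CANCELLED
  // error surfaces.
  task.finish();
}
```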

Pretty big problem for my current use case :(

Awesome work with the minimal repro, @ruicraveiro.

ruicraveiro commented 1 year ago

@VerTiGoEtrex Thanks!

mosuem commented 1 year ago

I am looking into this. It does seem weird: Wireshark doesn't record a 200 OK response coming from the server after the client sends the headers. The Dart server code does claim to send it, though, so I fear the issue is a bit deeper.

mosuem commented 1 year ago

Okay, so sometimes the server actually does send the 200 OK header acknowledgement. In any case, it then doesn't send the data, even though `_stream.sendData(frame(bytes, _callEncodingCodec));` in `lib/src/server/handler.dart` is actually called...

VerTiGoEtrex commented 1 year ago

@mosuem note this bug occurs with the Go server as well, but only with the Dart client (and in the original report, @ruicraveiro mentions the ASP.NET server has the issue too, again only with the Dart client).

So I wager this is a Dart client bug, not a Dart server bug.

mosuem commented 1 year ago

Thanks @VerTiGoEtrex! The tests from @ruicraveiro also make me believe the client is the issue. Sorry if there has been a misunderstanding.

VerTiGoEtrex commented 1 year ago

Cool, probably my own miscommunication. Seems like we're on the same page.

Thanks for taking a look at this - it's a pretty important bug for some work I'm doing currently :)

mosuem commented 1 year ago

So the issue seems to be with cancelling. The client calls `await fileStream.cancel();`, which terminates the stream immediately, ignoring the DATA frames the server is still sending. As a result the connection-level flow-control window is never updated on the client side, so the server blocks waiting for a WINDOW_UPDATE that will never come, because the client didn't realize it had to send one.
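To make that accounting concrete, here is a small simulation under assumed HTTP/2 defaults (RFC 7540); all the specific numbers are illustrative, not measured:

```dart
// Sketch of connection-level flow control: the server may have at most
// `window` unacknowledged bytes in flight, and only WINDOW_UPDATE
// frames from the client replenish the budget.
void main() {
  var window = 65535; // default initial flow-control window size
  const frameBytes = 0x1000 + 5; // one File payload + gRPC length prefix

  // The six messages the client actually reads are acknowledged, so
  // the server's budget is restored for them.
  for (var i = 0; i < 6; i++) {
    window -= frameBytes; // server sends a DATA frame
    window += frameBytes; // client processes it and sends WINDOW_UPDATE
  }

  // Frames still in flight when the client cancels are discarded
  // without a WINDOW_UPDATE, so their bytes are never handed back.
  const discarded = 10; // illustrative; depends on RST_STREAM timing
  window -= discarded * frameBytes;
  print('window after one cancelled iteration: $window bytes');
  // Repeat the leak over a few iterations and the window hits zero:
  // the server then blocks forever waiting for a WINDOW_UPDATE.
}
```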

The solution is probably a more graceful cancellation, where cancelling the stream still allows incoming frames to be handled, as per the HTTP/2 spec (RFC 7540):

> However, after sending the RST_STREAM, the sending endpoint MUST be prepared to receive and process additional frames sent on the stream that might have been sent by the peer prior to the arrival of the RST_STREAM.
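A conceptual sketch of what that handling could look like, with hypothetical names (this is not the actual package:http2 code):

```dart
// A stream we have already reset still credits incoming DATA bytes
// back to the connection-level window instead of silently dropping
// the frame, so the peer's send budget is not starved.
class DataFrame {
  final int streamId;
  final List<int> payload;
  DataFrame(this.streamId, this.payload);
}

class Connection {
  final Set<int> resetByUs = {}; // streams we sent RST_STREAM on

  void sendWindowUpdate(int streamId, int delta) {
    // Stand-in for writing a real WINDOW_UPDATE frame to the socket.
    print('WINDOW_UPDATE stream=$streamId delta=$delta');
  }

  void onDataFrame(DataFrame frame) {
    if (resetByUs.contains(frame.streamId)) {
      // The peer sent this frame before our RST_STREAM arrived: drop
      // the payload, but credit its bytes to the connection window
      // (stream 0) as the spec requires.
      sendWindowUpdate(0, frame.payload.length);
      return;
    }
    // ...normal delivery path elided...
  }
}

void main() {
  final conn = Connection()..resetByUs.add(1);
  conn.onDataFrame(DataFrame(1, List.filled(0x1000, 0)));
}
```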

mosuem commented 1 year ago

This should work as soon as the updated package:http2 is published; in the meantime, a dependency override on the git version works.
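For anyone hitting this before the release, an override in pubspec.yaml along these lines should pull the git version (assuming the fix is on the default branch of dart-lang/http2):

```yaml
dependency_overrides:
  http2:
    git:
      url: https://github.com/dart-lang/http2.git
```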