grpc / grpc-web

Text mode stops delivering data for large streaming calls #952

Open · AudriusButkevicius opened this issue 4 years ago

AudriusButkevicius commented 4 years ago

I have an application that uses a server-streaming call which sends back around 800 MB of data to the client (206 onNext calls, ~4 MB each); given text/base64 encoding, this equates to roughly 1.1 GB on the wire.

When the stream starts, I can see the HTTP call in Chrome developer tools, and the data returned by the endpoint grows in 4 MB chunks over time, with some delay between chunks (presumably spent deserializing and running my client code).

Roughly halfway through, around the 300-400 MB mark, the generated client seems to stop calling on('data'), and I get no further callbacks.

The chunks in Chrome developer tools continue to arrive and grow, but at a much faster rate (as if deserialization is no longer happening), and after all of the data arrives (some 10s later), I get an on('end') callback.

It seems that something goes wrong in the client, or that it hits some sort of limit, which breaks the mechanism used to deliver callbacks to the caller.

There is no error/status/metadata callback to explain this.

For curiosity's sake, I tried removing all of my code that processes the data and simply counting the number of on('data') calls. I got 96 of those calls (without doing anything with the data in the callback), equating to roughly 370 MB of raw data. I can still observe the rate of arrival accelerating afterwards, somewhat confirming that deserialization stops happening.
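For reference, the counting experiment looks roughly like this (a minimal sketch; MyServiceClient, streamData and DataRequest are placeholder names, not my real service):

```js
// Sketch of the reproduction: a server-streaming call where the 'data'
// handler does nothing except count how many times it fires.
const {MyServiceClient} = require('./my_service_grpc_web_pb');
const {DataRequest} = require('./my_service_pb');

const client = new MyServiceClient('http://localhost:8080');

let dataCalls = 0;
const stream = client.streamData(new DataRequest(), {});

stream.on('data', () => {
  // Do nothing with the message, just count the callbacks.
  dataCalls++;
});
stream.on('error', (err) => console.error('error', err.code, err.message));
stream.on('status', (status) => console.log('status', status.code, status.details));
stream.on('end', () => {
  // In the failing runs this fires only after ~95-96 'data' callbacks,
  // even though the server sends 206 messages.
  console.log('end after', dataCalls, 'data callbacks');
});
```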

Lastly, I tried monkey-patching the format flag on the text client to use the binary format.

I know the binary format does not support server-side streaming, but it does seem to work; the behaviour is just different. I get all of the data from the server first, and only then do the on('data') callbacks start being dispatched. In this mode I do seem to get all of the data, and nothing is lost.
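(For context on the format flag: as I understand it, the wire format is normally fixed at code-generation time via protoc-gen-grpc-web's mode option rather than switched at runtime, roughly like this, with placeholder file paths:)

```sh
# Text (base64) format -- the only mode documented to support server streaming:
protoc -I=. my_service.proto \
  --js_out=import_style=commonjs:./out \
  --grpc-web_out=import_style=commonjs,mode=grpcwebtext:./out

# Binary protobuf format -- documented as unary-only:
protoc -I=. my_service.proto \
  --js_out=import_style=commonjs:./out \
  --grpc-web_out=import_style=commonjs,mode=grpcweb:./out
```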

Any ideas what this could be? I can see from the network tab that all of the data arrives at the browser, so it's definitely something in the client code.

It does not seem to be time-related, but rather size-related. I wonder if the large data growth triggers some sort of GC which breaks things.

The boundary at which it stops delivering data is also not deterministic: sometimes I get 95 on('data') calls, sometimes 96, even though the data coming back is deterministic/fixed in size.

Also, any idea whether there is any form of support for transport-level compression when using Envoy as the translation layer?
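(What I have in mind on the Envoy side is something like adding Envoy's generic HTTP compressor filter to the listener that does the grpc-web translation. A rough, untested sketch of the filter-chain portion, using Envoy's v3 API names; the filter's content-type allowlist may also need to be extended to cover the grpc-web content types:)

```yaml
http_filters:
- name: envoy.filters.http.compressor
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.compressor.v3.Compressor
    compressor_library:
      name: gzip
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.compression.gzip.compressor.v3.Gzip
- name: envoy.filters.http.grpc_web
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

(Compression would only kick in when the browser sends Accept-Encoding: gzip, and the browser decompresses transparently before the grpc-web client sees the body.)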

thoralt commented 4 years ago

#571, #714, #920, #949 and #952 all more or less describe the same problem. This bug is quite severe, since it reliably prevents streaming of large amounts of data. All of these issues were either left unanswered or answered with something like "use https://github.com/improbable-eng/grpc-web instead". Our current project relies heavily on streaming, and I now have no choice but to go the improbable-eng/grpc-web route as well.

If any of the developers need help reproducing the behaviour, I am willing to assist. Unfortunately, grpc-web is too complex for me to find and fix the bug myself, so we depend on the developers to pick it up.

vaporz commented 3 years ago

I'm having the same issue: CPU usage increases over time on Firefox. In my case, CPU usage goes from 15% to 80% in less than 5 minutes. On Chrome, not only CPU usage but also memory usage increases over time.

vaporz commented 3 years ago

Unfortunately, it's a blocking issue for me; maybe I should also switch to https://github.com/improbable-eng/grpc-web.

dimo414 commented 7 months ago

@sampajano Just curious if you have any context on this bug. Is this still an issue?

sampajano commented 7 months ago

@dimo414 Hi! No, I have no context here. It would be helpful to have a reproducible case so we can identify whether this is indeed a client bug.

One potentially related thread is this one:

https://github.com/grpc/grpc-web/issues/587

If this bug was a limitation of our XHR implementation, then with the new option of Fetch/streams the issue could potentially be mitigated.