istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0

random gRPC Error:13 INTERNAL: Received RST_STREAM with code 0 #50244

Open ianebot opened 4 months ago

ianebot commented 4 months ago

Is this the right place to submit this?

Bug Description

Hi community,

we have seen our gRPC client randomly getting Error:13 INTERNAL: Received RST_STREAM with code 0 when retrieving data from our gRPC Go server. I could see that the data was delivered to the client, but something strange happened between Istio/Envoy and the client, and the client ended up reporting error 13. What I can confirm:

Also, a test of 100 requests in total, with 10 requests in parallel, is enough to produce 2-6 randomly failing requests out of the 100 with error 13 INTERNAL: Received RST_STREAM with code 0. I did not see the issue when running a load test in which the calls were made in series.

Currently, the only workaround I have found is:
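
For anyone who wants to reproduce the shape of that test, here is a minimal sketch of a harness that issues 100 calls with at most 10 in flight and tallies the outcomes. It is not the test used in this report: `run_load_test` and `fake_call` are hypothetical names, and `fake_call` is a stand-in that fails on a fixed subset of requests so the harness runs without a server. Replace it with the real unary call of your client.

```python
import asyncio
from collections import Counter


async def run_load_test(call, total=100, parallel=10):
    """Issue `total` calls with at most `parallel` in flight and tally outcomes."""
    limit = asyncio.Semaphore(parallel)
    outcomes = Counter()

    async def one(i):
        async with limit:
            try:
                await call(i)
                outcomes["ok"] += 1
            except Exception as exc:  # a real client would catch its gRPC error type
                outcomes[str(exc)] += 1

    await asyncio.gather(*(one(i) for i in range(total)))
    return outcomes


# Hypothetical stand-in for the real gRPC call (assumption, not from the report).
# It fails on every 25th request only so the harness has something to count.
async def fake_call(i):
    await asyncio.sleep(0)
    if i % 25 == 0:
        raise RuntimeError("13 INTERNAL: Received RST_STREAM with code 0")


if __name__ == "__main__":
    print(asyncio.run(run_load_test(fake_call)))
```

Setting `parallel=1` gives the serial variant of the same test, which is the case where the error was not observed.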

I reported this issue to the grpc-js community: https://github.com/grpc/grpc-node/issues/2569#issuecomment-1883015207 After a long discussion, the main developer suggested that this issue might be caused by Istio and/or Envoy.
I also reported the issue to the Envoy community in a potentially similar issue: https://github.com/envoyproxy/envoy/issues/30149

But I wonder if anyone in the Istio community has experienced the same issue (I could not find any similar open issue) or has an idea of something to try in order to mitigate it?

Any help would be much appreciated. Thanks.

Version

$ istioctl version
client version: 1.21.0
control plane version: 1.20.0
data plane version: 1.20.0

$ kubectl version 
Client Version: v1.29.3
Kustomize Version: v5.0.4
Server Version: v1.28.3

Additional Information

No response

morning810 commented 4 months ago

This is not an error in the code. I have already experienced it with versioned transaction transfers. When the transfer success ratio is high, this error does not occur. When the RPC load is high or Radium is unstable, this error occurs often.

morning810 commented 4 months ago

I think the solution of this error is to change RPC.

ianebot commented 4 months ago

Hi @morning810 ,

When the transfer success ratio is high, this error does not occur. When the RPC load is high or Radium is unstable, this error occurs often.

I faced this issue only when there were concurrent requests.

I think the solution of this error is to change RPC.

What exactly do you mean?

What I found out while testing is that the server actually sends the data correctly, but when it finishes it finds that the client has not yet half-closed, so the server sends a reset.

Here is another explanation: https://github.com/grpc/grpc-node/issues/2569#issuecomment-1959953844

So the server sends the data and trailers, and a RST_STREAM, to the Envoy proxy, and the Envoy proxy sends the RST_STREAM to the client. If the Envoy proxy does not send the trailers, then that is the problem. This would match up with the information in https://github.com/grpc/grpc-node/issues/2569#issuecomment-1899554032, which points out that the existing Envoy bug https://github.com/envoyproxy/envoy/issues/30149 shows a situation in which Envoy does not pass along trailers.
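
To make that reasoning concrete, here is a simplified model of how a client could arrive at error 13 in this situation. It is an illustrative sketch and not the grpc-js implementation: `StreamEnd` and `client_status` are hypothetical names, and only RST_STREAM code 0 (NO_ERROR) is modeled. It assumes that trailers, when they arrive, carry the authoritative status, and that a reset with code 0 and no trailers is reported as INTERNAL.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

INTERNAL = 13  # gRPC status code INTERNAL


@dataclass
class StreamEnd:
    """What the client saw when the HTTP/2 stream ended (simplified)."""
    grpc_status: Optional[int] = None  # grpc-status from trailers, if they arrived
    rst_code: Optional[int] = None     # RST_STREAM error code, if a reset arrived


def client_status(end: StreamEnd) -> Tuple[int, str]:
    # Trailers carry the status the server intended, so they win when present.
    if end.grpc_status is not None:
        return end.grpc_status, "status from trailers"
    # No trailers: the reset is the only signal left. Only code 0 is modeled here.
    if end.rst_code == 0:
        return INTERNAL, "Received RST_STREAM with code 0"
    raise ValueError("case not covered by this sketch")


# Proxy forwards trailers and then the reset: the call ends with status 0 (OK).
print(client_status(StreamEnd(grpc_status=0, rst_code=0)))
# Proxy drops the trailers and forwards only the reset: the call ends with 13.
print(client_status(StreamEnd(rst_code=0)))
```

Under these assumptions the two cases differ only in whether the trailers reach the client, which is why a proxy that drops them would turn a successful call into error 13.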

So I wonder if anyone has faced something similar?

Cheers.

morning810 commented 4 months ago

Thanks for your reply.

howardjohn commented 1 month ago

It looks like this is an upstream Envoy bug that is fixed in https://github.com/envoyproxy/envoy/pull/34461. When that lands, Istio will automatically pick it up in its next version.

howardjohn commented 6 days ago

Looks like the upstream PR is making progress but is still pending.