higherkindness / mu-scala

Mu is a purely functional library for building RPC endpoint based services with support for RPC and HTTP/2
Apache License 2.0
333 stars 37 forks source link

Broken gRPC calls with Istio Service Mesh and Envoy #1237

Open paoloyx opened 3 years ago

paoloyx commented 3 years ago

Hi everyone,

I don't really know if here is the right place to ask, but I'll try anyway. For the sake of clarity, I've asked the same on Envoy GitHub as I don't really know where the problem lies: https://github.com/envoyproxy/envoy/issues/16811

A bit of context

We've an Istio 1.8.3 installation on a 1.18.x Kubernetes clusters. There are two pods (let's say ORIGIN and DEST) where ORIGIN is making a gRPC call to DEST, through a Kubernetes service that is proxying the DEST pod.

The situation is this:

Here are the logs, one for successful communication (without sidecars) and one for bad communication (with sidecars)

Successful communications logs

ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.417 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] OUTBOUND SETTINGS: ack=false settings={ENABLE_PUSH=0, MAX_CONCURRENT_STREAMS=0, INITIAL_WINDOW_SIZE=1048576, MAX_HEADER_LIST_SIZE=8192}                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.430 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] OUTBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.515 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND SETTINGS: ack=false settings=[id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100]                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.516 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] OUTBOUND SETTINGS: ack=true                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.535 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND SETTINGS: ack=true                                                                                                    
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.542 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] OUTBOUND HEADERS: streamId=3 headers=GrpcHttp2OutboundHeaders[:authority: dest-service:9100, :path: /x.y.z/Create, :method: POST, :scheme: http, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.36.1, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.551 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] OUTBOUND DATA: streamId=3 padding=0 endStream=true length=50 bytes=000000002d0a2439383937636436392d646636632d346362612d386631392d30613562316332613237663412057465737439                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:44.718 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=9934465                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:45.500 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[:status: 200, content-type: application/grpc+proto, grpc-encoding: gzip, date: Thu, 03 Jun 2021 16:18:45 GMT] padding=0 endStream=false                                                                                                  
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:45.512 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND DATA: streamId=3 padding=0 endStream=false length=89 bytes=01000000541f8b08000000000000000dc8bb1580201004c0dcd01a48f73de0be9473cad27f093ae15c6df98c71b6224c079429a8ca05299fbd18215177eb41ef...                                                                                                    
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:45.512 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] OUTBOUND PING: ack=false bytes=1234                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:45.521 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[grpc-status: 0, date: Thu, 03 Jun 2021 16:18:45 GMT] padding=0 endStream=true                                                                                                    
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:45.530 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND PING: ack=true bytes=1234                                                                                                 
ORIGIN-POD ORIGIN-POD INFO  2021-06-03 16:18:45.542 [ioapp-compute-0] i.r.c.Server$ - [APPLICATION_LOGS_BLAH_BLAH]                                                                                              
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:45.552 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] OUTBOUND GO_AWAY: lastStreamId=0 errorCode=0 length=0 bytes=                                                                                                  

Bad Communication logs

ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.427 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] OUTBOUND SETTINGS: ack=false settings={ENABLE_PUSH=0, MAX_CONCURRENT_STREAMS=0, INITIAL_WINDOW_SIZE=1048576, MAX_HEADER_LIST_SIZE=8192}                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.475 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] OUTBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.487 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=4096, MAX_CONCURRENT_STREAMS=2147483647, INITIAL_WINDOW_SIZE=268435456,=0}                                                                                                    
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.490 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] OUTBOUND SETTINGS: ack=true                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.511 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND SETTINGS: ack=true                                                                                                    
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.512 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=268369921                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.521 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] OUTBOUND HEADERS: streamId=3 headers=GrpcHttp2OutboundHeaders[:authority: dest-service:9100, :path: /x.y.z/Create, :method: POST, :scheme: http, content-type: application/grpc, te: trailers, user-agent: grpc-java-netty/1.36.1, grpc-accept-encoding: gzip] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:30.533 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] OUTBOUND DATA: streamId=3 padding=0 endStream=true length=50 bytes=000000002d0a2439383937636436392d646636632d346362612d386631392d30613562316332613237663412057465737439                                                                                                                                                                                           
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.071 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND HEADERS: streamId=3 headers=GrpcHttp2ResponseHeaders[:status: 200, grpc-encoding: gzip, date: Thu, 03 Jun 2021 15:13:31 GMT, content-type: application/grpc+proto, x-envoy-upstream-service-time: 529, server: envoy] padding=0 endStream=false                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.113 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND DATA: streamId=3 padding=0 endStream=false length=89 bytes=01000000541f8b08000000000000000dc9b911c0300804c0dca16b50ca0c423ca21c71c6fd97606fbad710e6ee9a2099aca406a1b4d8a4484b632b71dc2375c9...                                                                                                    
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.114 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] OUTBOUND PING: ack=false bytes=1234                                                                                                   
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.125 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND DATA: streamId=3 padding=0 endStream=true length=0 bytes=                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.129 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND PING: ack=true bytes=1234                                                                                                 
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.160 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] OUTBOUND GO_AWAY: lastStreamId=0 errorCode=0 length=0 bytes=
[APPLICATION-LOGS_BLAH_BLAH]                                                                                            
ORIGIN-POD ORIGIN-POD io.grpc.StatusRuntimeException: INTERNAL: Received unexpected EOS on empty DATA frame from server                                                                                                 
ORIGIN-POD ORIGIN-POD   at io.grpc.Status.asRuntimeException(Status.java:535)                                                                                               
ORIGIN-POD ORIGIN-POD   at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)                                                                                               
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:464)                                                                                             
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:428)                                                                                                
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:461)                                                                                               
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)                                                                                               
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)                                                                                               
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)                                                                                              
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)                                                                                             
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)                                                                                                
ORIGIN-POD ORIGIN-POD   at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)   

Differences in logs

Unfortunately I'm no expert in debugging gGPC services, and There's one thing that i clearly see:

ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 16:18:45.512 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x0a2ce423, L:/10.172.22.165:60318 - R:dest-service/10.175.246.61:9100] INBOUND DATA: streamId=3 padding=0 endStream=false length=89 bytes=01000000541f8b08000000000000000dc8bb1580201004c0dcd01a48f73de0be9473cad27f093ae15c6df98c71b6224c079429a8ca05299fbd18215177eb41ef...    
ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.113 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND DATA: streamId=3 padding=0 endStream=false length=89 bytes=01000000541f8b08000000000000000dc9b911c0300804c0dca16b50ca0c423ca21c71c6fd97606fbad710e6ee9a2099aca406a1b4d8a4484b632b71dc2375c9...                                                                                                    

ORIGIN-POD ORIGIN-POD DEBUG 2021-06-03 15:13:31.125 [grpc-nio-worker-ELG-1-1] i.g.n.NettyClientHandler - [id: 0x600c3d09, L:/10.172.22.162:33746 - R:dest-service/10.175.246.61:9100] INBOUND DATA: streamId=3 padding=0 endStream=true length=0 bytes= 

and the last one has length=0 bytes, so I suppose that this is giving the Received unexpected EOS on empty DATA frame from server error

Envoy logs

Just to add an important info, here are the Envoy proxies logs for above communication

ORIGIN pod

[2021-05-28T15:32:52.440Z] "POST /x.y.z/Create HTTP/2" 200 - via_upstream - "-" 50 89 534 529 "-" "grpc-java-netty/1.36.1" 2b95ea42-d089-4ae5-9d45-b13dba9e983f"  "dest-service:9100" 10.172.24.7:9000 outbound|9100||dest-service.default.svc.cluster.local 10.172.22.162:35838 10.175.246.61:9100 10.172.22.162:33746 - default

DEST pod

[2021-05-28T15:32:52.442Z]  "POST /x.y.z/Create HTTP/2"  200  -  via_upstream  -  "-"  0  135  643  640  "-"  "grpc-java-netty/1.36.1"  "96fa90ba-ab5e-4d03-818e-c762f5 │  │ 92aeb1"  "dest-service:9100"  "127.0.0.1:9000"  inbound|9000||  127.0.0.1:45170  10.172.24.7:9000  10.172.22.162:35838  outbound_.9100_._.dest-service.default.svc.cluster.local  default

Hope that this is enough to let someone of you understand what my problem is. Thanks

mvazquezrius commented 6 months ago

Did you figure it out?