grpc / grpc-node

gRPC for Node.js
https://grpc.io
Apache License 2.0

@grpc/grpc-js not working with aws ALB #2093

Open ghost opened 2 years ago

ghost commented 2 years ago

Problem description

I was trying to use grpc with ALB, considering this documentation from AWS: https://aws.amazon.com/pt/blogs/aws/new-application-load-balancer-support-for-end-to-end-http-2-and-grpc/

I created the Load balancer, target group, cluster, service and task definition using an ECR image with this hello-world server example from grpc repo. I tried dynamic and static proto generation and both returned this error:

Error: 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error
(node:1500234) UnhandledPromiseRejectionWarning: Error: 13 INTERNAL: Received RST_STREAM with code 2 triggered by internal client error: Protocol error
    at Object.callErrorFromStatus (/home/gustavosartori/validate-grpc-alb/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/home/gustavosartori/validate-grpc-alb/node_modules/@grpc/grpc-js/build/src/client.js:180:52)
    at Object.onReceiveStatus (/home/gustavosartori/validate-grpc-alb/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
    at Object.onReceiveStatus (/home/gustavosartori/validate-grpc-alb/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
    at /home/gustavosartori/validate-grpc-alb/node_modules/@grpc/grpc-js/build/src/call-stream.js:160:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11)

After that, to make sure the error wasn't in my hello-world server, I changed the ECR image to another gRPC server (using @grpc/grpc-js) that I already have working on AWS, and the error persisted.

Finally, I tested with the Python hello world example, and it worked, so I believe @grpc/grpc-js has some bad interaction with ALB, but I don't know exactly how to debug this.

Reproduction steps

Environment

Additional context

I tried generating a cert with openssl in order to not use grpc.credentials.createInsecure, but it changed nothing.

murgatroid99 commented 2 years ago

Do you think you could get a wireshark-compatible TCP dump of the interaction that results in this error (tcpdump -i <interface> -w <file> 'tcp port 50051')? That might give us a better idea of what's happening.

ghost commented 2 years ago

I need to run this command inside my container on aws, right?

murgatroid99 commented 2 years ago

Actually, I think a dump from the client would be more useful. Since the problem happens with different gRPC servers, it seems to be a problem with the client talking to the ALB. I want to see exactly what happens in that interaction.

ghost commented 2 years ago

Sorry for the delay, I had already deleted all my tests from AWS, so I needed to create everything again.

I ran this command (sudo tcpdump -i any -w grpcTest2.pcap 'tcp port 50051') and the output is below:


reading from file grpcTest2.pcap, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144
Warning: interface names might be incorrect
18:24:23.934713 tun0  Out IP 192.168.4.94.44410 > 172.19.157.134.50051: Flags [S], seq 1622324440, win 64240, options [mss 1460,sackOK,TS val 1990293701 ecr 0,nop,wscale 7], length 0
18:24:23.934868 tun0  Out IP 192.168.4.94.36618 > 172.19.166.185.50051: Flags [S], seq 839609468, win 64240, options [mss 1460,sackOK,TS val 4054007907 ecr 0,nop,wscale 7], length 0
18:24:23.934912 tun0  Out IP 192.168.4.94.51836 > 172.19.136.134.50051: Flags [S], seq 2983912830, win 64240, options [mss 1460,sackOK,TS val 3771253667 ecr 0,nop,wscale 7], length 0
18:24:23.948563 tun0  In  IP 172.19.166.185.50051 > 192.168.4.94.36618: Flags [S.], seq 26724806, ack 839609469, win 26847, options [mss 1300,sackOK,TS val 1035771978 ecr 4054007907,nop,wscale 8], length 0
18:24:23.948593 tun0  Out IP 192.168.4.94.36618 > 172.19.166.185.50051: Flags [.], ack 1, win 502, options [nop,nop,TS val 4054007921 ecr 1035771978], length 0
18:24:23.949126 tun0  In  IP 172.19.136.134.50051 > 192.168.4.94.51836: Flags [S.], seq 2439500419, ack 2983912831, win 26847, options [mss 1300,sackOK,TS val 4016289117 ecr 3771253667,nop,wscale 8], length 0
18:24:23.949150 tun0  Out IP 192.168.4.94.51836 > 172.19.136.134.50051: Flags [.], ack 1, win 502, options [nop,nop,TS val 3771253681 ecr 4016289117], length 0
18:24:23.949393 tun0  In  IP 172.19.157.134.50051 > 192.168.4.94.44410: Flags [S.], seq 2776792086, ack 1622324441, win 26847, options [mss 1300,sackOK,TS val 395158550 ecr 1990293701,nop,wscale 8], length 0
18:24:23.949408 tun0  Out IP 192.168.4.94.44410 > 172.19.157.134.50051: Flags [.], ack 1, win 502, options [nop,nop,TS val 1990293715 ecr 395158550], length 0
18:24:23.955725 tun0  Out IP 192.168.4.94.36618 > 172.19.166.185.50051: Flags [P.], seq 1:753, ack 1, win 502, options [nop,nop,TS val 4054007928 ecr 1035771978], length 752
18:24:23.956548 tun0  Out IP 192.168.4.94.51836 > 172.19.136.134.50051: Flags [P.], seq 1:34, ack 1, win 502, options [nop,nop,TS val 3771253689 ecr 4016289117], length 33
18:24:23.956573 tun0  Out IP 192.168.4.94.44410 > 172.19.157.134.50051: Flags [P.], seq 1:34, ack 1, win 502, options [nop,nop,TS val 1990293723 ecr 395158550], length 33
18:24:23.968745 tun0  In  IP 172.19.166.185.50051 > 192.168.4.94.36618: Flags [.], ack 753, win 111, options [nop,nop,TS val 1035771998 ecr 4054007928], length 0
18:24:23.968772 tun0  In  IP 172.19.166.185.50051 > 192.168.4.94.36618: Flags [P.], seq 1:273, ack 753, win 111, options [nop,nop,TS val 1035771998 ecr 4054007928], length 272
18:24:23.968784 tun0  Out IP 192.168.4.94.36618 > 172.19.166.185.50051: Flags [.], ack 273, win 501, options [nop,nop,TS val 4054007941 ecr 1035771998], length 0
18:24:23.968803 tun0  In  IP 172.19.166.185.50051 > 192.168.4.94.36618: Flags [F.], seq 273, ack 753, win 111, options [nop,nop,TS val 1035771998 ecr 4054007928], length 0
18:24:23.969325 tun0  In  IP 172.19.136.134.50051 > 192.168.4.94.51836: Flags [.], ack 34, win 105, options [nop,nop,TS val 4016289137 ecr 3771253689], length 0
18:24:23.969405 tun0  In  IP 172.19.136.134.50051 > 192.168.4.94.51836: Flags [P.], seq 1:273, ack 34, win 105, options [nop,nop,TS val 4016289138 ecr 3771253689], length 272
18:24:23.969415 tun0  Out IP 192.168.4.94.51836 > 172.19.136.134.50051: Flags [.], ack 273, win 501, options [nop,nop,TS val 3771253701 ecr 4016289138], length 0
18:24:23.969426 tun0  In  IP 172.19.136.134.50051 > 192.168.4.94.51836: Flags [F.], seq 273, ack 34, win 105, options [nop,nop,TS val 4016289138 ecr 3771253689], length 0
18:24:23.969441 tun0  In  IP 172.19.157.134.50051 > 192.168.4.94.44410: Flags [.], ack 34, win 105, options [nop,nop,TS val 395158570 ecr 1990293723], length 0
18:24:23.969673 tun0  In  IP 172.19.157.134.50051 > 192.168.4.94.44410: Flags [P.], seq 1:273, ack 34, win 105, options [nop,nop,TS val 395158570 ecr 1990293723], length 272
18:24:23.969685 tun0  Out IP 192.168.4.94.44410 > 172.19.157.134.50051: Flags [.], ack 273, win 501, options [nop,nop,TS val 1990293736 ecr 395158570], length 0
18:24:23.969698 tun0  In  IP 172.19.157.134.50051 > 192.168.4.94.44410: Flags [F.], seq 273, ack 34, win 105, options [nop,nop,TS val 395158570 ecr 1990293723], length 0
18:24:23.971705 tun0  Out IP 192.168.4.94.36618 > 172.19.166.185.50051: Flags [P.], seq 753:770, ack 274, win 501, options [nop,nop,TS val 4054007944 ecr 1035771998], length 17
18:24:23.976594 tun0  Out IP 192.168.4.94.36618 > 172.19.166.185.50051: Flags [F.], seq 770, ack 274, win 501, options [nop,nop,TS val 4054007949 ecr 1035771998], length 0
18:24:23.977917 tun0  Out IP 192.168.4.94.51836 > 172.19.136.134.50051: Flags [P.], seq 34:51, ack 274, win 501, options [nop,nop,TS val 3771253710 ecr 4016289138], length 17
18:24:23.978177 tun0  Out IP 192.168.4.94.44410 > 172.19.157.134.50051: Flags [P.], seq 34:51, ack 274, win 501, options [nop,nop,TS val 1990293744 ecr 395158570], length 17
18:24:23.978650 tun0  Out IP 192.168.4.94.51836 > 172.19.136.134.50051: Flags [F.], seq 51, ack 274, win 501, options [nop,nop,TS val 3771253711 ecr 4016289138], length 0
18:24:23.978889 tun0  Out IP 192.168.4.94.44410 > 172.19.157.134.50051: Flags [F.], seq 51, ack 274, win 501, options [nop,nop,TS val 1990293745 ecr 395158570], length 0
18:24:23.989075 tun0  In  IP 172.19.166.185.50051 > 192.168.4.94.36618: Flags [.], ack 771, win 111, options [nop,nop,TS val 1035772018 ecr 4054007944], length 0
18:24:23.992955 tun0  In  IP 172.19.136.134.50051 > 192.168.4.94.51836: Flags [.], ack 52, win 105, options [nop,nop,TS val 4016289161 ecr 3771253710], length 0
18:24:23.993007 tun0  In  IP 172.19.157.134.50051 > 192.168.4.94.44410: Flags [.], ack 52, win 105, options [nop,nop,TS val 395158594 ecr 1990293744], length 0

murgatroid99 commented 2 years ago

Can you please share the raw pcap file? I would like to look at it in Wireshark to see how it parses the HTTP/2 data from the actual bytes that went over the wire.

ghost commented 2 years ago

https://drive.google.com/file/d/1E6OA0yhoNYvR1DaDQ6QHWSrmgnGlcGZK/view?usp=sharing

murgatroid99 commented 2 years ago

That capture shows that the server is responding with the error HTTP/1.1 400 Bad Request. Something is misconfigured and the front end your client is talking to isn't handling HTTP/2. And the client doesn't understand an HTTP/1.1 response to an HTTP/2 request, so that's why the error is "Protocol error".
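To make that failure mode concrete, here is a minimal sketch (not code from grpc-js) that classifies the first bytes a server sends back. It assumes you already have the raw response bytes, for example extracted from a pcap, and `classifyFirstBytes` is a hypothetical helper name. An HTTP/1.x response starts with the ASCII text `HTTP/1.`, while an HTTP/2 server begins with a SETTINGS frame (frame type 0x04 on stream 0).

```javascript
// Hypothetical helper: guess which protocol a server is speaking from the
// first bytes of its response.
function classifyFirstBytes(buf) {
  // HTTP/1.x status lines are plain ASCII, e.g. "HTTP/1.1 400 Bad Request".
  if (buf.length >= 7 && buf.toString('latin1', 0, 7) === 'HTTP/1.') {
    return 'http1';
  }
  // An HTTP/2 frame header is 9 bytes: 24-bit length, 8-bit type, 8-bit flags,
  // then a reserved bit plus a 31-bit stream id. The server preface is a
  // SETTINGS frame (type 0x04) on stream 0.
  if (buf.length >= 9 && buf[3] === 0x04 && (buf.readUInt32BE(5) & 0x7fffffff) === 0) {
    return 'http2-settings';
  }
  return 'unknown';
}

const http1Reply = Buffer.from('HTTP/1.1 400 Bad Request\r\n', 'latin1');
const emptySettings = Buffer.from([0, 0, 0, 4, 0, 0, 0, 0, 0]);
console.log(classifyFirstBytes(http1Reply));    // http1
console.log(classifyFirstBytes(emptySettings)); // http2-settings
```

An HTTP/2 client that receives the first kind of reply cannot parse it as frames, which is consistent with the "Protocol error" reported above.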

ghost commented 2 years ago

Ok, somehow in my communication chain (client->alb->server->alb->client) the request is being "converted" to HTTP/1.1, right?

I didn't understand whether the problem is in the communication between my client and the ALB or between the ALB and my server. Were you able to determine that in your data analysis?

murgatroid99 commented 2 years ago

You gave me a dump of the communication between the client and the ALB. It shows how the ALB responded to the request. It does not show anything about the ALB communicating with the server or anything about "converting" the request because those are not things that happen in the client to ALB communication path.

I also just took another look at that dump log, and I noticed that the client is making a request to the method /printnfse.PrintNFSe/gerarPDF. This does not match the behavior of the hello world client, and it is definitely not a method that is implemented in the server you say you uploaded to ECR. So I'm wondering if maybe there was some mixup here, and that mismatch was the cause of the problem.

I think the primary problem here is that the ALB can respond to an HTTP/2 request with an HTTP/1.1 response at all. I believe this is a bug in the ALB.

ghost commented 2 years ago

I didn't use the Hello World client for the dump log because when you requested it I had already deleted all my tests from AWS. So, to make it easier, I used another gRPC service that was already uploaded to ECR.

Now I understand what you said about the communication, but I don't think it is only an ALB problem, because when I used a Python client and server in my tests it worked.

I will recreate the scenario using the Python Hello World and put the tcpdump here.

murgatroid99 commented 2 years ago

When you do that, can you please also capture a dump with the Node Hello World client so that we can do a 1-to-1 comparison?

cabulafhy commented 2 years ago

I've been looking into this with @g-sartori and we found the problem.

ALB only allows HTTPS listeners to forward requests to a gRPC target group, but we were calling the server as if it didn't have any TLS/SSL in front of it. So we changed the way we created the client, and it worked.

We were creating the client like this

var client = new hello_proto.Greeter(target, grpc.credentials.createInsecure())

And we changed to this

var client = new hello_proto.Greeter(target, grpc.credentials.createSsl())

In this documentation https://aws.amazon.com/pt/blogs/aws/new-application-load-balancer-support-for-end-to-end-http-2-and-grpc/ they do the same thing for the Python hello-world example using credentials = grpc.ssl_channel_credentials()

amleshk66 commented 2 years ago

Hi There,

I am also facing the same issue with ALB. My application is running on AWS ECS, and I have configured the gRPC protocol in the target group. The health check shows healthy, but none of the requests reach the application server.

Error: 400 Bad Request.

nickzelei commented 2 years ago

I'm having the same issue and have been banging my head against the wall for far too long. I have a Golang-based gRPC service sitting inside of Kubernetes behind a load balancer. I am able to write a Golang-based client that can hit the server just fine using TLS.

When I write the node-based client using this library, I can hit the api server locally using insecure mode. But if I try to hit the remote API server using ssl, it drops the connection. Error: 14 UNAVAILABLE: Connection dropped

Both clients are effectively doing the same thing. Why does the connection get dropped for the node client?
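One way to narrow down a dropped TLS connection, independent of grpc-js, is to check what the endpoint negotiates at the TLS layer. The sketch below uses only Node's built-in `tls` module. `probeAlpn` and `explainAlpn` are hypothetical helper names, and the host in the usage comment is a placeholder. gRPC runs over HTTP/2, so an endpoint that does not select `h2` through ALPN would not be expected to serve gRPC over TLS.

```javascript
const tls = require('tls');

// Hypothetical helper: open a TLS connection offering h2 and http/1.1 via ALPN
// and report what the server selected. Certificate problems are reported
// through the 'error' event and reject the promise.
function probeAlpn(host, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      { host, port, servername: host, ALPNProtocols: ['h2', 'http/1.1'] },
      () => {
        // alpnProtocol is the selected protocol string, or false if none was agreed.
        const result = { alpnProtocol: socket.alpnProtocol };
        socket.end();
        resolve(result);
      }
    );
    socket.on('error', reject);
  });
}

// Turn the negotiated value into a short human-readable hint.
function explainAlpn(alpnProtocol) {
  if (alpnProtocol === 'h2') return 'HTTP/2 negotiated';
  if (alpnProtocol === 'http/1.1') return 'only HTTP/1.1 negotiated';
  return 'no ALPN protocol negotiated';
}

// Usage (placeholder host, not run here):
// probeAlpn('my-service.example.com').then((r) => console.log(explainAlpn(r.alpnProtocol)));
```

If the probe rejects with a certificate error, the problem is in certificate validation rather than in HTTP/2 itself.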

cabulafhy commented 2 years ago

You may need to specify ssl_target_name_override when the URL of the target does not match the domain name on the certificate that is set up in the load balancer.

Here is an example:

return new hello_proto.Greeter(target, grpc.credentials.createSsl(),
    {
      'grpc.ssl_target_name_override': 'my-certificate.com'
    })

I had to do this when I used the direct DNS name of the load balancer, or a custom domain name set up in Route 53 that was different from the certificate domain name.

nickzelei commented 2 years ago

Hm, I tried a variety of combinations with that override and still to no avail.

The main difference is that the URL I'm requesting is tied to a wildcard certificate. I tried overriding the target name to be the wildcard name, and a few other things, but still nothing.

URL example:

my-service.my-namespace.stage.example.com

Certificate:

  commonName: '*.my-namespace.stage.example.com'
  dnsNames:
  - '*.my-namespace.stage.example.com'

To further isolate this, I was able to call the remote service using grpcurl just fine.

grpcurl my-service.my-namespace.stage.example.com:443 MyService/MyMethod

Works without any issues, however, grpcurl is a golang based CLI as well.
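The hostname check that `grpc.ssl_target_name_override` works around can be reproduced offline with Node's built-in `tls.checkServerIdentity`. This is only a sketch, not what grpc-js runs verbatim: the certificate object is hand-built from the names quoted above, `nameMatches` is a hypothetical helper, and the load balancer DNS name is a made-up placeholder.

```javascript
const tls = require('tls');

// Minimal stand-in for a parsed certificate, using the names from this comment.
const cert = {
  subject: { CN: '*.my-namespace.stage.example.com' },
  subjectaltname: 'DNS:*.my-namespace.stage.example.com',
};

// tls.checkServerIdentity returns undefined on a match and an Error otherwise.
function nameMatches(hostname) {
  return tls.checkServerIdentity(hostname, cert) === undefined;
}

console.log(nameMatches('my-service.my-namespace.stage.example.com')); // true
// A wildcard covers exactly one label, so a deeper name does not match.
console.log(nameMatches('a.b.my-namespace.stage.example.com')); // false
// Placeholder load balancer DNS name (an assumption, not from the thread).
console.log(nameMatches('my-alb-123.us-east-1.elb.amazonaws.com')); // false
```

In this sketch the wildcard does cover the requested host, which suggests (without proving) that a name mismatch may not be what drops the connection here.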

nickzelei commented 2 years ago

Not sure if this is helpful, but here is a debug trace:

D 2022-07-23T21:14:03.625Z | index | Loading @grpc/grpc-js version 1.6.7
D 2022-07-23T21:14:03.634Z | resolving_load_balancer | dns:my-service.my-namespace.stage.my-domain.com:443 IDLE -> IDLE
D 2022-07-23T21:14:03.634Z | connectivity_state | (1) dns:my-service.my-namespace.stage.my-domain.com:443 IDLE -> IDLE
D 2022-07-23T21:14:03.634Z | dns_resolver | Resolver constructed for target dns:my-service.my-namespace.stage.my-domain.com:443
D 2022-07-23T21:14:03.635Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 Channel constructed with options {}
D 2022-07-23T21:14:03.635Z | channel_stacktrace | (1) Channel constructed 
    at new ChannelImplementation (/Users/nick/code/projects/app/node_modules/@grpc/grpc-js/build/src/channel.js:202:23)
    at new Client (/Users/nick/code/projects/app/node_modules/@grpc/grpc-js/build/src/client.js:62:36)
    at getRpcConnection (/Users/nick/code/projects/app/test.js:13:22)
    at /Users/nick/code/projects/app/test.js:79:22
    at Object.<anonymous> (/Users/nick/code/projects/app/test.js:92:3)
    at Module._compile (node:internal/modules/cjs/loader:1105:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
CliService/ListServices
D 2022-07-23T21:14:03.636Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 createCall [0] method="CliService/ListServices", deadline=Infinity
D 2022-07-23T21:14:03.636Z | call_stream | [0] Sending metadata
D 2022-07-23T21:14:03.636Z | dns_resolver | Looking up DNS hostname my-service.my-namespace.stage.my-domain.com
D 2022-07-23T21:14:03.637Z | resolving_load_balancer | dns:my-service.my-namespace.stage.my-domain.com:443 IDLE -> CONNECTING
D 2022-07-23T21:14:03.637Z | connectivity_state | (1) dns:my-service.my-namespace.stage.my-domain.com:443 IDLE -> CONNECTING
D 2022-07-23T21:14:03.638Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-07-23T21:14:03.638Z | call_stream | [0] write() called with message of length 7
D 2022-07-23T21:14:03.638Z | call_stream | [0] end() called
D 2022-07-23T21:14:03.638Z | call_stream | [0] deferring writing data chunk of length 12
D 2022-07-23T21:14:03.641Z | dns_resolver | Resolved addresses for target dns:my-service.my-namespace.stage.my-domain.com:443: [<REDACTED_IP>:443]
D 2022-07-23T21:14:03.641Z | pick_first | Connect to address list <REDACTED_IP>:443
D 2022-07-23T21:14:03.641Z | subchannel | (2) <REDACTED_IP>:443 Subchannel constructed with options {}
D 2022-07-23T21:14:03.641Z | subchannel_refcount | (2) <REDACTED_IP>:443 refcount 0 -> 1
D 2022-07-23T21:14:03.642Z | subchannel_refcount | (2) <REDACTED_IP>:443 refcount 1 -> 2
D 2022-07-23T21:14:03.642Z | pick_first | Start connecting to subchannel with address <REDACTED_IP>:443
D 2022-07-23T21:14:03.642Z | pick_first | IDLE -> CONNECTING
D 2022-07-23T21:14:03.642Z | resolving_load_balancer | dns:my-service.my-namespace.stage.my-domain.com:443 CONNECTING -> CONNECTING
D 2022-07-23T21:14:03.642Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
D 2022-07-23T21:14:03.642Z | connectivity_state | (1) dns:my-service.my-namespace.stage.my-domain.com:443 CONNECTING -> CONNECTING
D 2022-07-23T21:14:03.642Z | subchannel | (2) <REDACTED_IP>:443 IDLE -> CONNECTING
D 2022-07-23T21:14:03.642Z | pick_first | CONNECTING -> CONNECTING
D 2022-07-23T21:14:03.642Z | resolving_load_balancer | dns:my-service.my-namespace.stage.my-domain.com:443 CONNECTING -> CONNECTING
D 2022-07-23T21:14:03.642Z | connectivity_state | (1) dns:my-service.my-namespace.stage.my-domain.com:443 CONNECTING -> CONNECTING
D 2022-07-23T21:14:03.643Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 Pick result for call [0]: QUEUE subchannel: null status: undefined undefined
D 2022-07-23T21:14:03.643Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2022-07-23T21:14:03.643Z | subchannel | (2) <REDACTED_IP>:443 creating HTTP/2 session
D 2022-07-23T21:14:03.714Z | subchannel | (2) <REDACTED_IP>:443 CONNECTING -> READY
D 2022-07-23T21:14:03.714Z | pick_first | Pick subchannel with address <REDACTED_IP>:443
D 2022-07-23T21:14:03.714Z | pick_first | CONNECTING -> READY
D 2022-07-23T21:14:03.714Z | resolving_load_balancer | dns:my-service.my-namespace.stage.my-domain.com:443 CONNECTING -> READY
D 2022-07-23T21:14:03.714Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
D 2022-07-23T21:14:03.714Z | channel | (1) dns:my-service.my-namespace.stage.my-domain.com:443 Pick result for call [0]: COMPLETE subchannel: (2) <REDACTED_IP>:443 status: undefined undefined
D 2022-07-23T21:14:03.714Z | connectivity_state | (1) dns:my-service.my-namespace.stage.my-domain.com:443 CONNECTING -> READY
D 2022-07-23T21:14:03.715Z | subchannel_refcount | (2) <REDACTED_IP>:443 refcount 2 -> 3
D 2022-07-23T21:14:03.715Z | subchannel_refcount | (2) <REDACTED_IP>:443 refcount 3 -> 2
D 2022-07-23T21:14:03.716Z | call_stream | Starting stream [0] on subchannel (2) <REDACTED_IP>:443 with headers
                authorization: bearer <REDACTED_JWT_TOKEN>
                grpc-accept-encoding: identity,deflate,gzip
                accept-encoding: identity
                :authority: my-service.my-namespace.stage.my-domain.com:443
                user-agent: grpc-node-js/1.6.7
                content-type: application/grpc
                :method: POST
                :path: CliService/ListServices
                te: trailers

D 2022-07-23T21:14:03.716Z | subchannel_flowctrl | (2) <REDACTED_IP>:443 local window size: 65535 remote window size: 65535
D 2022-07-23T21:14:03.716Z | subchannel_internals | (2) <REDACTED_IP>:443 session.closed=false session.destroyed=false session.socket.destroyed=false
D 2022-07-23T21:14:03.716Z | call_stream | [0] attachHttp2Stream from subchannel <REDACTED_IP>:443
D 2022-07-23T21:14:03.716Z | subchannel_refcount | (2) <REDACTED_IP>:443 callRefcount 0 -> 1
D 2022-07-23T21:14:03.716Z | call_stream | [0] sending data chunk of length 12 (deferred)
D 2022-07-23T21:14:03.716Z | call_stream | [0] calling end() on HTTP/2 stream
D 2022-07-23T21:14:04.638Z | resolving_load_balancer | dns:my-service.my-namespace.stage.my-domain.com:443 READY -> READY
D 2022-07-23T21:14:04.638Z | connectivity_state | (1) dns:my-service.my-namespace.stage.my-domain.com:443 READY -> READY
D 2022-07-23T21:14:04.747Z | subchannel | (2) <REDACTED_IP>:443 READY -> TRANSIENT_FAILURE
D 2022-07-23T21:14:04.747Z | subchannel_refcount | (2) <REDACTED_IP>:443 refcount 2 -> 1
D 2022-07-23T21:14:04.747Z | pick_first | READY -> IDLE
D 2022-07-23T21:14:04.747Z | resolving_load_balancer | dns:my-service.my-namespace.stage.my-domain.com:443 READY -> IDLE
D 2022-07-23T21:14:04.747Z | connectivity_state | (1) dns:my-service.my-namespace.stage.my-domain.com:443 READY -> IDLE
D 2022-07-23T21:14:04.747Z | call_stream | [0] ended with status: code=14 details="Connection dropped"
D 2022-07-23T21:14:04.748Z | subchannel_refcount | (2) <REDACTED_IP>:443 callRefcount 1 -> 0
D 2022-07-23T21:14:04.748Z | call_stream | [0] close http2 stream with code 8
D 2022-07-23T21:14:04.748Z | subchannel | (2) <REDACTED_IP>:443 TRANSIENT_FAILURE -> IDLE
Error: 14 UNAVAILABLE: Connection dropped
    at Object.callErrorFromStatus (/Users/nick/code/projects/app/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/Users/nick/code/projects/app/node_modules/@grpc/grpc-js/build/src/client.js:189:52)
    at Object.onReceiveStatus (/Users/nick/code/projects/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:365:141)
    at Object.onReceiveStatus (/Users/nick/code/projects/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:328:181)
    at /Users/nick/code/projects/app/node_modules/@grpc/grpc-js/build/src/call-stream.js:187:78
    at processTicksAndRejections (node:internal/process/task_queues:78:11) {
  code: 14,
  details: 'Connection dropped',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}
D 2022-07-23T21:14:04.749Z | call_stream | [0] HTTP/2 stream closed with code 8
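One detail in the trace above may be worth checking: the `:path` header is logged as `CliService/ListServices`, with no leading slash. The gRPC over HTTP/2 protocol defines the path as `/`, the service name, `/`, and the method name, whereas grpcurl accepts the `Service/Method` form on its command line. Whether that explains this failure is not confirmed in the thread. As a sketch, `isValidGrpcPath` below is a hypothetical helper that checks the general shape only.

```javascript
// Hypothetical helper: check that a method path has the "/Service/Method"
// shape that gRPC uses for the HTTP/2 :path header.
function isValidGrpcPath(path) {
  // One leading slash, a non-empty service name, one slash, a non-empty method.
  return /^\/[^/]+\/[^/]+$/.test(path);
}

console.log(isValidGrpcPath('/CliService/ListServices')); // true
console.log(isValidGrpcPath('CliService/ListServices'));  // false
```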

wtfiwtz commented 2 years ago

If you are using Node.js server-side grpc-web, then it defaults to HTTP/2. However, an AWS ALB can be configured for HTTP/1.1 (see https://stackoverflow.com/questions/65233710/fetch-in-node-receiving-status-code-464-but-working-in-browser).

Setting GRPC_VERBOSITY=debug GRPC_TRACE=all I can see this:

D 2022-11-29T22:20:27.014Z | subchannel | (3) 54.253.104.5:443 local settings acknowledged by remote: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65535,"maxFrameSize":16384,"maxConcurrentStreams":4294967295,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false}
D 2022-11-29T22:20:27.014Z | call_stream | [0] Received server headers:
        :status: 464
        server: awselb/2.0
        date: Tue, 29 Nov 2022 22:20:27 GMT
        content-length: 0

D 2022-11-29T22:20:27.015Z | call_stream | [0] Received server trailers:
        :status: 464
        server: awselb/2.0
        date: Tue, 29 Nov 2022 22:20:27 GMT
        content-length: 0

D 2022-11-29T22:20:27.015Z | call_stream | [0] ended with status: code=2 details=""
D 2022-11-29T22:20:27.015Z | subchannel_refcount | (3) 54.253.104.5:443 callRefcount 1 -> 0
D 2022-11-29T22:20:27.015Z | call_stream | [0] close http2 stream with code 8

Is there a way to force HTTP/1.1? Our browser clients work fine at the moment, but this will break with Node.js.
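The `code=2` in the trace above is consistent with how gRPC clients are documented to treat a response that carries an HTTP status but no `grpc-status`. The sketch below follows the table in the gRPC project's HTTP to gRPC status code mapping document; `grpcCodeForHttpStatus` is a hypothetical name and this is not the grpc-js source. AWS documents 464 as an ALB-specific status meaning the incoming request protocol is incompatible with the target group's protocol version.

```javascript
// Numeric values of the gRPC status codes used in the mapping.
const Status = {
  UNKNOWN: 2,
  PERMISSION_DENIED: 7,
  UNIMPLEMENTED: 12,
  INTERNAL: 13,
  UNAVAILABLE: 14,
  UNAUTHENTICATED: 16,
};

// Hypothetical helper: map an HTTP status to a gRPC status code when the
// response has no grpc-status of its own.
function grpcCodeForHttpStatus(httpStatus) {
  switch (httpStatus) {
    case 400: return Status.INTERNAL;
    case 401: return Status.UNAUTHENTICATED;
    case 403: return Status.PERMISSION_DENIED;
    case 404: return Status.UNIMPLEMENTED;
    case 429:
    case 502:
    case 503:
    case 504: return Status.UNAVAILABLE;
    default: return Status.UNKNOWN; // anything else, including 464
  }
}

console.log(grpcCodeForHttpStatus(464)); // 2
```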

wtfiwtz commented 2 years ago

Actually, HTTP/1.1 won't work, as gRPC relies on HTTP/2.

Here's the AWS setting for request vs. configured protocol https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#target-group-protocol-version

It should work, so this looks like some sort of issue with the Node.js implementation interacting with the AWS Application Load Balancer (ALB). Perhaps the ALB has to be configured as "HTTP2" to fully support gRPC.

This would mean you can't mix HTTP/1.1 with HTTP/2 on that specific ALB, and would be forced to use TLS/HTTPS as well. https://caniuse.com/http2

Keyurkd commented 1 year ago

Hi @g-sartori, is there any solution to this problem? I am facing the same issue with k8s and AWS ALB.

@cabulafhy, Can you please help with gRPC server-side code as well, if it's working for you?

divyesh565 commented 1 year ago

With AWS ALB I am getting 14 UNAVAILABLE errors, but with IP and port it works as expected.

Also, I have a wildcard certificate on AWS ALB for my target group.