kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

GRPC client receiving PROTOCOL_ERROR continuously until connection is reopened by client #5600

Closed · arjunsingri closed 4 years ago

arjunsingri commented 4 years ago

NGINX Ingress controller version:

1.36.2

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:07:57Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Environment:

What happened:

I have a gRPC server in a local k8s cluster behind an nginx ingress controller. The gRPC client connects to it with InsecureSkipVerify=true using grpc-go. Initially the connection works: the server streams data to the client, and the client consumes all of it by calling Recv. Once there is no more data to stream, the server pauses. But when the client calls Recv on the stream during this pause, it gets the error below. According to the gRPC spec, the Recv call should block instead:

code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR

The client keeps receiving this error until it closes the connection and opens a new one, after which the whole cycle repeats endlessly.
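For illustration, the client loop looks roughly like this (a sketch only; DataServiceClient, StreamData, and Request are hypothetical stand-ins, since I can't share the real source):

    package client

    import (
        "context"
        "io"
        "log"

        pb "example.com/app/proto" // hypothetical generated gRPC stubs
    )

    // consume opens the server-streaming RPC and reads messages until
    // the server closes the stream.
    func consume(ctx context.Context, client pb.DataServiceClient) error {
        stream, err := client.StreamData(ctx, &pb.Request{})
        if err != nil {
            return err
        }
        for {
            msg, err := stream.Recv()
            if err == io.EOF {
                return nil // clean end of stream
            }
            if err != nil {
                // When the server pauses behind the ingress, Recv returns
                // "stream terminated by RST_STREAM with error code:
                // PROTOCOL_ERROR" here instead of blocking.
                return err
            }
            log.Printf("received: %v", msg)
        }
    }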

Here is some debug info:

2020/05/26 06:55:47 http2: Framer 0xc000412380: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:55:47 http2: Framer 0xc000412380: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:55:57 http2: Framer 0xc000412380: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:55:57 http2: Framer 0xc000412380: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:56:07 http2: Framer 0xc000412380: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:56:07 http2: Framer 0xc000412380: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:56:17 http2: Framer 0xc000412380: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:56:17 http2: Framer 0xc000412380: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:56:27 http2: Framer 0xc000412380: wrote PING len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:56:27 http2: Framer 0xc000412380: read PING flags=ACK len=8 ping="\x00\x00\x00\x00\x00\x00\x00\x00"
2020/05/26 06:56:35 http2: Framer 0xc000412380: read RST_STREAM stream=1 len=4 ErrCode=PROTOCOL_ERROR

What you expected to happen:

The client should not get PROTOCOL_ERROR. The client and server connect and work correctly without the ingress controller.

How to reproduce it:

I have a custom app and am unable to share its source code.

aledbf commented 4 years ago

@arjunsingri please post the ingress controller pod log

arjunsingri commented 4 years ago

@aledbf

Here is the only relevant log. I have redacted some of the fields:

<ip_address2> - - [27/May/2020:03:17:36 +0000] "POST /API/APIName HTTP/2.0" 200 108 "-" "grpc-go/1.27.1" 110 60.002 [prf--cdp-asdf-80] [] <ip_address>:12345 117 60.001 200 a2b3fe9fda88e6bee1d1a46c63a4afd9
arjunsingri commented 4 years ago

The RST_STREAM frame is being sent to the client every 60 seconds.

aledbf commented 4 years ago

60.001

That 60.001 is the upstream response time for the request, and nginx's grpc_read_timeout defaults to 60s, so I think you are facing read timeouts.

Please add the following annotation to the ingress definition to test whether this is the issue:

nginx.ingress.kubernetes.io/configuration-snippet: |
  grpc_read_timeout 3600s;

http://nginx.org/en/docs/http/ngx_http_grpc_module.html#grpc_read_timeout

arjunsingri commented 4 years ago

I get this error when I set that:

-------------------------------------------------------------------------------
W0527 04:31:23.301252       6 queue.go:130] requeuing prf--cdp/asdf-ingress, err
-------------------------------------------------------------------------------
Error: exit status 1
2020/05/27 04:31:23 [emerg] 6421#6421: unknown directive "grpc_read_timeout:" in /tmp/nginx-cfg168439187:1072
nginx: [emerg] unknown directive "grpc_read_timeout:" in /tmp/nginx-cfg168439187:1072
nginx: configuration file /tmp/nginx-cfg168439187 test failed

-------------------------------------------------------------------------------
arjunsingri commented 4 years ago

This is what I have inserted into my Ingress object:

    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 3600s;
aledbf commented 4 years ago

@arjunsingri please check the version you are using by running kubectl exec -n ingress-nginx <ingress controller pod> -- nginx -v

(I cannot reproduce the issue you see)

arjunsingri commented 4 years ago

Here you go

$ kubectl exec -it nginx-ingress-controller-4wmd4 /bin/bash
bash-5.0$ nginx -v
nginx version: nginx/1.17.8
arjunsingri commented 4 years ago

I get this error when I set that:

-------------------------------------------------------------------------------
W0527 04:31:23.301252       6 queue.go:130] requeuing prf--cdp/asdf-ingress, err
-------------------------------------------------------------------------------
Error: exit status 1
2020/05/27 04:31:23 [emerg] 6421#6421: unknown directive "grpc_read_timeout:" in /tmp/nginx-cfg168439187:1072
nginx: [emerg] unknown directive "grpc_read_timeout:" in /tmp/nginx-cfg168439187:1072
nginx: configuration file /tmp/nginx-cfg168439187 test failed

-------------------------------------------------------------------------------

This error is no longer happening. It was most likely caused by my inserting a colon after grpc_read_timeout.
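For reference, nginx parses a trailing colon as part of the directive name, which is why it reported the unknown directive "grpc_read_timeout:". The first line below reconstructs the mistake; the second is the correct syntax:

    grpc_read_timeout: 3600s;   # wrong: nginx sees a directive named "grpc_read_timeout:"
    grpc_read_timeout 3600s;    # correct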

arjunsingri commented 4 years ago

But I am still getting the protocol error after 60 seconds. Even if this is due to grpc_read_timeout, should a timeout really surface as a protocol error? PROTOCOL_ERROR looks like a bug to me.

aledbf commented 4 years ago

@arjunsingri please post the ingress yaml definition you are using

arjunsingri commented 4 years ago
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    field.cattle.io/publicEndpoints: '[{"addresses":["ip1","ip2"],"port":80,"protocol":"HTTP","serviceName":"prf--cdp:asdf","ingressName":"prf--cdp:asdf-ingress","hostname":"devbox","path":"/","allNodes":false},{"addresses":["ip1","ip2"],"port":80,"protocol":"HTTP","serviceName":"prf--cdp:asdf","ingressName":"prf--cdp:asdf-ingress","hostname":"asdf2","path":"/","allNodes":false}]'
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 3600s;
    nginx.ingress.kubernetes.io/server-snippet: |
      if ($http_original_host = '') {
          set $http_original_host $http_host;
      }
    nginx.ingress.kubernetes.io/upstream-vhost: $http_original_host
  creationTimestamp: "2020-05-22T03:28:24Z"
  generation: 1
  name: asdf-ingress
  namespace: prf--cdp
  resourceVersion: "14818842"
  selfLink: /apis/extensions/v1beta1/namespaces/prf--cdp/ingresses/asdf-ingress
  uid: bb7044f1-7def-4062-a0f9-c8d028450fe7
spec:
  rules:
  - host: devbox
    http:
      paths:
      - backend:
          serviceName: asdf
          servicePort: 80
        path: /
  - host: asdf
    http:
      paths:
      - backend:
          serviceName: asdf
          servicePort: 80
        path: /
status:
  loadBalancer:
    ingress:
    - ip: ip1
    - ip: ip2
aledbf commented 4 years ago

@arjunsingri you need HTTP/2 for gRPC, and there is no support for that on port 80, which serves plain HTTP only. You need to use HTTPS, adding the tls section and secret to the ingress (HTTP/2 is enabled on port 443). This is an NGINX restriction: there is no HTTP/2 support without SSL.

Note: a patch to be included in the next nginx version will log a warning about this: https://trac.nginx.org/nginx/ticket/1981#comment:2
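For example, something like this in the ingress (a sketch; asdf-tls is a placeholder for a secret holding the certificate and key for the host):

    spec:
      tls:
      - hosts:
        - devbox
        secretName: asdf-tls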

arjunsingri commented 4 years ago

We are using port 443, but we are specifying InsecureSkipVerify=true in Go with an otherwise-empty TLS credentials object:

    // InsecureSkipVerify controls whether a client verifies the
    // server's certificate chain and host name.
    // If InsecureSkipVerify is true, TLS accepts any certificate
    // presented by the server and any host name in that certificate.
    // In this mode, TLS is susceptible to man-in-the-middle attacks.
    // This should be used only for testing.
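Concretely, the dial looks roughly like this (a sketch; devbox:443 stands in for our real ingress host):

    import (
        "crypto/tls"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
    )

    // dial connects over TLS but skips verification of the server
    // certificate, i.e. an otherwise-empty tls.Config.
    func dial() (*grpc.ClientConn, error) {
        creds := credentials.NewTLS(&tls.Config{InsecureSkipVerify: true})
        return grpc.Dial("devbox:443", grpc.WithTransportCredentials(creds))
    }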

Also, I don't see an http2 annotation here: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/

aledbf commented 4 years ago

Also, I don't see an http2 annotation here: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/

HTTP/2 is enabled by default when you have a tls section in the ingress definition:

https://github.com/kubernetes/ingress-nginx/tree/master/docs/examples/tls-termination
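The secret referenced by the tls section can be created from an existing certificate/key pair, for example (asdf-tls is a placeholder name):

    kubectl create secret tls asdf-tls --cert=tls.crt --key=tls.key -n prf--cdp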

arjunsingri commented 4 years ago

I see, thank you for providing that information. I will try it out.

Just to confirm, would this solve the protocol error?

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/5600#issuecomment-715970466):

> Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta). /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

Gabriel-rd2 commented 2 years ago

Just got the exact same error while trying to use kubemq-js to subscribe an events client to a kubemq service behind an nginx ingress.