Open · rcng6514 opened 8 months ago
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/remove-kind bug
/triage needs-information /kind support
What type of GCP load balancer are you using? Can you try port forwarding to the ingress service and see if the issue persists?
Comparing:
Client -> TCP L4 -> NGINX Ingress Controller -> App
with:
Client -> TCP L4 -> App
the connection holds open for 24 hrs without the controller in the path.
@rcng6514 the comment from @strongjz is seeking info on
client --> [port-forward-to-svc-created-by-controller] --> app
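For reference, a minimal sketch of that port-forward test, assuming the controller Service is named ingress-nginx-controller in the ingress-nginx namespace (adjust to your install; grpcurl is only an illustrative client and assumes server reflection is enabled):

```bash
# Forward local 8443 to the controller's HTTPS port, bypassing the cloud L4 LB
kubectl -n ingress-nginx port-forward svc/ingress-nginx-controller 8443:443

# In another terminal, drive the gRPC client through the forwarded port, e.g.
grpcurl -insecure -authority example.com localhost:8443 list
```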
Morning, tested with the TCP L4 LB removed and port forwarding to the ingress-controller Kubernetes svc. We still observe the same behaviour and connections are issued a GOAWAY after 5-10 minutes.
- Write real, practical, step-by-step instructions, including an example app image URL, that readers can copy/paste and reproduce on a minikube or kind cluster
@longwuyuan this isn't straightforward to achieve, as the app contains IP that we'd need to strip from the image, which will take considerable time. We'll start that process, but it will take a while, so we were hoping to at least start the conversation in the meantime.
@rcng6514, thanks. Would you know if there is a chart on artifacthub.io or an image on hub.docker.com that can be used to reproduce?
This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.9.5/deploy/static/provider/cloud/deploy.yaml
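(Optional) Before applying the manifests below, it can help to wait for the controller to become ready; a sketch, assuming the label selector from the standard deploy manifest:

```bash
# Wait for the ingress-nginx controller pod to report Ready
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=120s
```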
---
apiVersion: v1
kind: Namespace
metadata:
  name: grpc
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grpc
  namespace: grpc
  labels:
    app: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grpc
  template:
    metadata:
      labels:
        app: grpc
    spec:
      containers:
        - name: grpc
          image: docker.io/rcng1514/server
          ports:
            - containerPort: 8443
---
apiVersion: v1
kind: Service
metadata:
  name: grpc
  namespace: grpc
spec:
  selector:
    app: grpc
  ports:
    - protocol: TCP
      port: 8443
      targetPort: 8443
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc
  namespace: grpc
  annotations:
    # use the shared ingress-nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  ingressClassName: nginx
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc
                port:
                  number: 8443
  tls:
    - hosts:
        - example.com
      secretName: grpc
---
apiVersion: v1
kind: Secret
metadata:
  name: grpc
  namespace: grpc
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURzVENDQXBtZ0F3SUJBZ0lVZFg2RlVLem5vUTZXT0dndmNCOW9jVm1xSHVvd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2ZERUxNQWtHQTFVRUJoTUNWVk14RVRBUEJnTlZCQWdNQ0U1bGR5QlpiM0pyTVJFd0R3WURWUVFIREFoTwpaWGNnV1c5eWF6RVlNQllHQTFVRUNnd1BSWGhoYlhCc1pTQkRiMjF3WVc1NU1Rc3dDUVlEVlFRTERBSkpWREVnCk1CNEdDU3FHU0liM0RRRUpBUllSWVdSdGFXNUFaWGhoYlhCc1pTNWpiMjB3SGhjTk1qUXdOREF6TVRjeE5UVXkKV2hjTk1qVXdOREF6TVRjeE5UVXlXakI4TVFzd0NRWURWUVFHRXdKVlV6RVJNQThHQTFVRUNBd0lUbVYzSUZsdgpjbXN4RVRBUEJnTlZCQWNNQ0U1bGR5QlpiM0pyTVJnd0ZnWURWUVFLREE5RmVHRnRjR3hsSUVOdmJYQmhibmt4CkN6QUpCZ05WQkFzTUFrbFVNU0F3SGdZSktvWklodmNOQVFrQkZoRmhaRzFwYmtCbGVHRnRjR3hsTG1OdmJUQ0MKQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFMbWhELzFqYzlTTG1oVjhtdjE2RDN6aApaTmxYdzlZdndIOWJIaitpV3Y5NDFsbTZqK0NVM3dNT3FpSloyZjJ6ZGtobk5uK3RmUTJhSXFNRlgrdm9zdkhZClNRV2lPMWRTc2EzQTJSZGJQd0V5QzV3bHh1ZUVtZE5vWWZtdHlzSkZSWk9nSkpYU01nelhOMGV3R2FJc1FEazgKZ29vZzV2cFN1OFdJbVNUMDJsUlZRV3FtZklzMSs1WFRhVjB0TXlmWFRFZVoxQXJ0cFZIdk5iekhLS3R4ZFZNQQp5dGx6K3U5Y1JZVzNGeTVoQS91VjFUTXdERzYrWWFVR0FJRzJidTk4TVg3Sk9iVWFtRUJDZnZTM2VWOWQzcXlpCnVualRlcnVlTi9KRmxhc2dpeVc2WmZyOWtobXBsUFl1d0Q5NkUyT2NHK2lzL0FUU1FEV2xJL2ZNYnFsWWJlY0MKQXdFQUFhTXJNQ2t3SndZRFZSMFJCQ0F3SG9JTFpYaGhiWEJzWlM1amIyMkNEM2QzZHk1bGVHRnRjR3hsTG1OdgpiVEFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBaVAwMTI0RFhvWnZnNmxsYW9GSEErWHdwZW1lbGFhaHpyTlVTCk1EbU10MWJ1U2ZKNkNmNkVTTlV1K1pISEI0OFFWd0pKWGxLaTJqVS9acHVvWDlqK0h6TmhnWHhNbEFJc2gyeUMKaTJubUFDOHcrU0hWOTgrRFJESW9YVHNDamRxSWRnSCtWelhzZjFWSkRmeUlhc1JsZGtGNmJDVUdsM0RjUGpkKwpId0VIN0NCZUo5d2lkQmxPRUdveWFDQW12WTJtd1huK216TnRSUXpCYTlWSEo2S1dvWmhCYjN3SXFnSFRTZ21FCnVVRHM2Sm4rcmVlU2FGajVYQTVuMVBKUjgvMXFKMEordk5rZ3IwZ3ZKa1gxT295OVY4YzJDNStLTkU3T3Q2NGQKekJJTTZrY29oMGZXQmIzdWZ1cUwrMU1qNU5HWXVPRFdxdVhSek15TkQwaE9HQWpwRVE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQzVvUS85WTNQVWk1b1YKZkpyOWVnOTg0V1RaVjhQV0w4Qi9XeDQvb2xyL2VOWlp1by9nbE44RERxb2lXZG45czNaSVp6Wi9yWDBObWlLagpCVi9yNkxMeDJFa0ZvanRYVXJHdHdOa1hXejhCTWd1Y0pjYm5oSm5UYUdINXJjckNSVVdUb0NTVjBqSU0xemRICnNCbWlMRUE1UElLS0lPYjZVcnZGaUprazlOcFVWVUZxcG55TE5mdVYwMmxkTFRNbjEweEhtZFFLN2FWUjd6VzgKeHlpcmNYVlRBTXJaYy9ydlhFV0Z0eGN1WVFQN2xkVXpNQXh1dm1HbEJnQ0J0bTd2ZkRGK3lUbTFHcGhBUW43MAp0M2xmWGQ2c29ycDQwM3E3bmpmeVJaV3JJSXNsdW1YNi9aSVpxWlQyTHNBL2VoTmpuQnZvclB3RTBrQTFwU1AzCnpHNnBXRzNuQWdNQkFBRUNnZ0VBTWkweU9Fa1F2MHc1QzBQU1ZXQVFIYTZEWnlpTkhERnVORDY2RDNOZ2E1d0wKUE5mc0drWERmbjBSU2hYRmtnbFhtTHlsZzUrdXBPV2NKVHJIc2VvRnJNL005VVBrREhlaTVaZXlWdGpvVC9kcQpJZndvSnQ2MkFlbytTWkpMczNXc0YvcDZ5VEMzTExka0R2R3dEQ0V2L3dpM05JVXVTazNneWNWaHVCYWppWlhICnplSHZGM0dVRFlFcGNuMzVXcG9FV3hyUkVUSjFXUVN4NFVveVlZeUptSHBDUlNYSklna05jTHU1Y1dmKzY4c0YKME94K05JajJqQ3N3SjNScS9PaGlEMXRMcTdRT1pweDAxM1NLSUIrT0YrNjZTL3F4eDIzeTh2Vm5nRkZQWEVMNwo3YkJzcXA1VXZwVy9XK2RPWVhrNWp6QXl1Ty9uMGZNU3dqNi9CeC9KMlFLQmdRRHNIQU5NTmhkZyt5N1kwa285CkdmSW5MeWdXVFNKNjUzQVNGU3pNTm10eVYwQlh5RGNaK3pnQXpPOWw2eUpnNmJRQ1dQRWc4amZ4R2dFSnBxUncKS3JVTTdhTFREUUFWcHBmRDRaQklmY1hzdmJwc3EwUmptMW9mN0hVKzF1MzJHY3J1YVhMYWxpekhuMUg0UzYyZAowUXZjQVIyYUtEQW9DaWR4SUZuNnhQMTdGUUtCZ1FESlJHR0p3N1FmSG1zdUpvU2dobGJOaVBGdGtSNHQwYzV5CnNBYmRQNVd5TjJTc3RPVVpYNEtaRDZTUE50TXNwWTdaK2tkOHlvZUZzb3Y1d0VqRnpFbDkyV1puZXZvWVVWZHgKWStvVlpuWC9GMUNxZTAzR2NiT1QvQ2ZLU0QzNWFrdXcxN20wMnFDVDNtZnVTOFJWYkJKV1d2K1loelE5dnFJSQpYMlVqclJ5VUN3S0JnQUx3bGxuc2tuM3lvckt3YTV3M0pueTJhWmxkZklCclFVbjRXWVp4WndVVmNRZW14b2pjClIrWTZwd0J0M1ErMzJUWHVSWkpUY2I3ZXhBU0t2cUZtNXJveWUwU0ZkT3JRR0RPb0sxTzd2U3NsY1p6SXhTRTQKWGZibnlzM3RmeWtCU1RXT3VvOWVMMUNNKzBoTUtPMCtIUmV3Szk0dmdlbjl0bUFDTnh5WU4wL0JBb0dCQUxKRApESmoyYTB5OHBuV2p6QWhacy93cmRKcDAwK1FGVmZNaWtaSFl4WCtwckZPRGpQN2lKMHZtSFB4enRLcHdvSXZVCkx3a0tZT283NzlwdlFvVmVvU0VFTXIwb29PWjA5UndMUU1OZmt0Y3pFVkZPRU43WXloTWlYU08reEpWcVhrdnQKWmlBWEcrNmNLRFZaaWpXV21NOC9uZTY4b2JxbVkrRkNqTlFDZWJOdEFvR0FWdFM1SkY3VkhvSHB5RWJjSWY1UgpiR25Ud0RxTnFrMjJwRjR3YkdrYlVXQXFVTWNjc011WFcxZ3pKOFUvSm1lbFJSKzBJb0x5TWdaOXBBaFdLd3hjCmQySXdJSXhXTDI4RlNNREJwZ0VWQmNyZk1vMnBrZHAwZEpzaTBEbm11Q25ocE9LVktycVptcE1IWjRmVVRjRHgKUHpEajB0K0hvRnI4VVR0VEVyZEpSTmc9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K
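Rather than reusing the hard-coded certificate above, the same Secret can be generated locally; a sketch, assuming a self-signed cert for example.com (file names and validity are arbitrary):

```bash
# Self-signed cert for the example.com host used by the Ingress
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=example.com" \
  -addext "subjectAltName=DNS:example.com,DNS:www.example.com"

# Create/update the TLS secret referenced by the Ingress
kubectl -n grpc create secret tls grpc --cert=tls.crt --key=tls.key \
  --dry-run=client -o yaml | kubectl apply -f -
```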
docker run --network=host -e ENDPOINT=example.com --rm -it rcng1514/client:java
I'm observing a GOAWAY after ~5-10 minutes, with shorter periods depending on NGINX load:
...
Hello Client recieved at time 2024-04-05 08:10:04+00:00
Hello Client recieved at time 2024-04-05 08:10:09+00:00
Exception in thread "main" io.grpc.StatusRuntimeException: UNAVAILABLE: Connection closed after GOAWAY. HTTP/2 error code: NO_ERROR
at io.grpc.Status.asRuntimeException(Status.java:535)
at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:648)
at io.grpc.examples.helloworld.TestClient.main(TestClient.java:21)
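One way to correlate the client-side GOAWAY with controller-side events is to tail the controller logs while the client runs; a sketch, assuming the default labels from the deploy manifest:

```bash
kubectl -n ingress-nginx logs \
  -l app.kubernetes.io/component=controller \
  -f --tail=100
```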
@rcng6514 thanks
Try setting the gRPC timeouts (the grpc_* directives) using a configuration-snippet annotation; see https://nginx.org/en/docs/http/ngx_http_grpc_module.html

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.
Apologies for the delay; we've since upgraded the controller, so we're now running v1.10.1. We still seem to be hitting this even with the timeouts set high:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 3600s;
      grpc_send_timeout 3600s;
    nginx.ingress.kubernetes.io/limit-connections: "1000"
    nginx.ingress.kubernetes.io/service-upstream: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/upstream-vhost: hellogrpc.example.svc.cluster.local
  labels:
    app.kubernetes.io/name: hellogrpc
    application: example
  name: hellogrpc
  namespace: example
spec:
  ingressClassName: nginx
  rules:
    - host: example.com
      http:
        paths:
          - backend:
              service:
                name: hellogrpc
                port:
                  number: 8443
            path: /
            pathType: Prefix
  tls:
    - hosts:
        - example.com
      secretName: example-tls
This only seems to happen on some of our more highly utilised GKE clusters. The client receives responses consistently for ~5 minutes, followed by:
{
  "message": "Hello ",
  "dateTime": "2024-05-23 12:41:12+00:00"
}
ERROR:
  Code: Unavailable
  Message: closing transport due to: connection error: desc = "error reading from server: EOF", received prior goaway: code: NO_ERROR
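To confirm the configuration-snippet values above actually land in the rendered NGINX config, the generated config can be dumped from the controller pod; a sketch, assuming the controller runs as a Deployment named ingress-nginx-controller:

```bash
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  nginx -T 2>/dev/null | grep -E 'grpc_(read|send)_timeout'
```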
If you provide the requested reproduce information, then anyone can try to reproduce on a minikube or a kind cluster.
In the ingress YAML posted above, the use of annotations like limit-connections and upstream-vhost just adds unnecessary complications for testing a long-lived gRPC stream; I would not use them.
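For instance, a sketch of dropping those two annotations in place (the ingress name and namespace are taken from the YAML above; a trailing dash on the key removes the annotation):

```bash
kubectl -n example annotate ingress hellogrpc \
  nginx.ingress.kubernetes.io/limit-connections- \
  nginx.ingress.kubernetes.io/upstream-vhost-
```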
What happened:
We have an application with gRPC streams running on GKE behind the NGINX ingress controller. We have a use case where we want to open a long-lived gRPC stream between our gRPC server (on GKE) and a client that sends data every second for an indefinite period. To achieve this, our Java gRPC server implementation never calls the OnCompleted method, so the stream stays open indefinitely. When the client calls the gRPC method, data transfer starts successfully and the stream runs fine for a while. However, after a few minutes (at irregular intervals) the connection is terminated with the error: UNAVAILABLE: Connection closed after GOAWAY. HTTP/2 error code: NO_ERROR. The time before the error is not fixed, but it typically occurs after around 5 minutes of successful data transfer between the client and the server (GKE). We have tried various properties and timeouts to increase the longevity of the streams (the annotations attempted are attached below), but we haven't found anything conclusive.
Below are the ingress configuration annotations we are using.
What you expected to happen:
Either the annotations are respected, or there is a misunderstanding on our side about how to make the above requirement possible.
Not sure, we've exhausted all avenues
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
Kubernetes version (use kubectl version): v1.27.9-gke.1092000

Environment:
Cloud provider or hardware configuration: GKE
OS (e.g. from /etc/os-release): cos_containerd
Kernel (e.g. uname -a):
How was the ingress-nginx-controller installed: $ helm template --values values.yaml --namespace example --version 4.9.0 ingress-nginx ingress-nginx/ingress-nginx > manifests.yaml
Then reference manifests.yaml as a resource in a Kustomization with env-specific patches for naming/annotations + labels.
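The Kustomization reference mentioned above might look roughly like this (a sketch; the env-specific patches themselves are omitted):

```bash
cat > kustomization.yaml <<'EOF'
resources:
  - manifests.yaml
# env-specific patches for naming/annotations + labels would be added here
EOF
kubectl apply -k .
```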
Current State of the controller:
kubectl describe ingressclasses

Current state of ingress object, if applicable:

Others:
kubectl describe ... of any custom configmap(s) created and in use

How to reproduce this issue:
Vanilla ingress-nginx install in k8s. Simple gRPC app with a long-lived connection.
This happens across multiple applications on the cluster and across multiple envs, so it isn't specific to one app or instance. We've read and tried the following: https://kubernetes.github.io/ingress-nginx/examples/grpc/#notes-on-using-responserequest-streams. However, GOAWAYs continue to occur. After removing ingress-nginx and routing via a NodePort, the application held the connection open for 24+ hrs.
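A rough sketch of that NodePort comparison, assuming the grpc Service from the reproduce manifests above (the patch and port handling are illustrative):

```bash
# Bypass ingress-nginx by exposing the backend service directly as a NodePort
kubectl -n grpc patch svc grpc -p '{"spec": {"type": "NodePort"}}'

# Find the assigned node port, then point the client at <node-ip>:<node-port>
kubectl -n grpc get svc grpc -o jsonpath='{.spec.ports[0].nodePort}'
```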
Anything else we need to know: