Text Streaming not working when using Nginx-Ingress on Kubernetes

Rainfarm commented 4 months ago

What happened:

We use Ngnix ingress controller in an EKS cluster, and the text streaming from the services running in the EKS cluster doesn't work, the client always get a response in one go. We've checked similar issues here (e.g., https://github.com/kubernetes/ingress-nginx/issues/10482, but the solutions suggested don't help).

Below are some details:

Ingress configurations (annotations):

annotations:
kubernetes.io/tls-acme: "true"
sidecar.istio.io/inject: "false"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "7200"
nginx.ingress.kubernetes.io/proxy-send-timeout: "7200"
nginx.ingress.kubernetes.io/proxy-read-timeout: "7200"
nginx.ingress.kubernetes.io/proxy-body-size: "0"
nginx.ingress.kubernetes.io/ssl-redirect: "false"
nginx.ingress.kubernetes.io/proxy-buffering: "off"
nginx.ingress.kubernetes.io/proxy-request-buffering: "off"

In the print out from curl command, we can see that the response has header of transfer-encoding: chunked:

< HTTP/1.1 200 OK
< date: Tue, 04 Jun 2024 16:07:40 GMT
< server: uvicorn
< content-type: application/octet-stream
< transfer-encoding: chunked

What you expected to happen: The client should receive response in stream.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

v1.9.6

Kubernetes version (use kubectl version): Client Version: v1.28.3 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.29.4-eks-036c24b Environment:

Cloud provider or hardware configuration: AWS EKS
OS (e.g. from /etc/os-release): Amazon Linux 2
Kernel (e.g. uname -a): 5.10.205-195.807.amzn2.x86_64
Install tools:
- Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
- EKS cluster created by using CloudFormation template

Basic cluster related info:

kubectl version

Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b

kubectl get nodes -o wide

NAME                                          STATUS   ROLES    AGE    VERSION               INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-100-72-14-183.eu-west-1.compute.internal   Ready    <none>   120d   v1.29.0-eks-5e0fdde   100.72.14.183   <none>        Amazon Linux 2   5.10.205-195.807.amzn2.x86_64   containerd://1.7.11
ip-100-72-14-62.eu-west-1.compute.internal    Ready    <none>   3d2h   v1.29.0-eks-5e0fdde   100.72.14.62    <none>        Amazon Linux 2   5.10.192-183.736.amzn2.x86_64   containerd://1.7.11
ip-100-72-15-134.eu-west-1.compute.internal   Ready    <none>   120d   v1.29.0-eks-5e0fdde   100.72.15.134   <none>        Amazon Linux 2   5.10.205-195.807.amzn2.x86_64   containerd://1.7.11
ip-100-72-15-152.eu-west-1.compute.internal   Ready    <none>   4d5h   v1.29.0-eks-5e0fdde   100.72.15.152   <none>        Amazon Linux 2   5.10.192-183.736.amzn2.x86_64   containerd://1.7.11
ip-100-72-15-195.eu-west-1.compute.internal   Ready    <none>   40d    v1.29.0-eks-5e0fdde   100.72.15.195   <none>        Amazon Linux 2   5.10.205-195.807.amzn2.x86_64   containerd://1.7.11
ip-100-72-15-83.eu-west-1.compute.internal    Ready    <none>   26h    v1.29.0-eks-5e0fdde   100.72.15.83    <none>        Amazon Linux 2   5.10.192-183.736.amzn2.x86_64   containerd://1.7.11

How was the ingress-nginx-controller installed:
- If helm was used then please show output of helm ls -A | grep -i ingress
```
ingress-nginx                   ingress-nginx   1           2024-02-05 11:55:51.283694 +0100 CET    deployed    ingress-nginx-4.9.1                 1.9.6
```
- If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
```
USER-SUPPLIED VALUES:
null
```
- If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used
```
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--version ingress-nginx-4.9.1 \
--namespace ingress-nginx --create-namespace
```
- if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances

Current State of the controller:

kubectl describe ingressclasses


Name:         alb
Labels:       app.kubernetes.io/instance=aws-load-balancer-controller
      app.kubernetes.io/managed-by=Helm
      app.kubernetes.io/name=aws-load-balancer-controller
      app.kubernetes.io/version=v2.7.2
      helm.sh/chart=aws-load-balancer-controller-1.7.2
Annotations:  meta.helm.sh/release-name: aws-load-balancer-controller
      meta.helm.sh/release-namespace: kube-system
Controller:   ingress.k8s.aws/alb
Events:       <none>

Name: nginx Labels: app.kubernetes.io/component=controller app.kubernetes.io/instance=ingress-nginx app.kubernetes.io/managed-by=Helm app.kubernetes.io/name=ingress-nginx app.kubernetes.io/part-of=ingress-nginx app.kubernetes.io/version=1.9.6 helm.sh/chart=ingress-nginx-4.9.1 Annotations: meta.helm.sh/release-name: ingress-nginx meta.helm.sh/release-namespace: ingress-nginx Controller: k8s.io/ingress-nginx Events:

  - `kubectl -n <ingresscontrollernamespace> get all -A -o wide`

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES pod/ingress-nginx-controller-7fdbfcb8f9-l7t92 1/1 Running 0 120d 100.72.15.115 ip-100-72-15-134.eu-west-1.compute.internal

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR service/ingress-nginx-controller LoadBalancer 10.100.116.31 80:31909/TCP,443:30721/TCP 120d app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx service/ingress-nginx-controller-admission ClusterIP 10.100.25.196 443/TCP 120d app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR deployment.apps/ingress-nginx-controller 1/1 1 1 120d controller registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR replicaset.apps/ingress-nginx-controller-7fdbfcb8f9 1 1 1 120d controller registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=7fdbfcb8f9

  - `kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>`

Name: ingress-nginx-controller-7fdbfcb8f9-l7t92 Namespace: ingress-nginx Priority: 0 Service Account: ingress-nginx Node: ip-100-72-15-134.eu-west-1.compute.internal/100.72.15.134 Start Time: Mon, 05 Feb 2024 21:33:12 +0100 Labels: app.kubernetes.io/component=controller app.kubernetes.io/instance=ingress-nginx app.kubernetes.io/managed-by=Helm app.kubernetes.io/name=ingress-nginx app.kubernetes.io/part-of=ingress-nginx app.kubernetes.io/version=1.9.6 helm.sh/chart=ingress-nginx-4.9.1 pod-template-hash=7fdbfcb8f9 Annotations: Status: Running IP: 100.72.15.115 IPs: IP: 100.72.15.115 Controlled By: ReplicaSet/ingress-nginx-controller-7fdbfcb8f9 Containers: controller: Container ID: containerd://18597a97709a1fb027c68a203a89075bb1922727795c63bdcbccb42031a9d133 Image: registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c Image ID: registry.k8s.io/ingress-nginx/controller@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c Ports: 80/TCP, 443/TCP, 8443/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP SeccompProfile: RuntimeDefault Args: /nginx-ingress-controller --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller --election-id=ingress-nginx-leader --controller-class=k8s.io/ingress-nginx --ingress-class=nginx --configmap=$(POD_NAMESPACE)/ingress-nginx-controller --validating-webhook=:8443 --validating-webhook-certificate=/usr/local/certificates/cert --validating-webhook-key=/usr/local/certificates/key State: Running Started: Mon, 05 Feb 2024 21:33:28 +0100 Ready: True Restart Count: 0 Requests: cpu: 100m memory: 90Mi Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5 Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3 Environment: POD_NAME: ingress-nginx-controller-7fdbfcb8f9-l7t92 (v1:metadata.name) POD_NAMESPACE: ingress-nginx (v1:metadata.namespace) LD_PRELOAD: /usr/local/lib/libmimalloc.so Mounts: /usr/local/certificates/ from webhook-cert (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfhx9 (ro) Conditions: Type Status PodReadyToStartContainers True Initialized True Ready True ContainersReady True PodScheduled True Volumes: webhook-cert: Type: Secret (a volume populated by a Secret) SecretName: ingress-nginx-admission Optional: false kube-api-access-gfhx9: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true QoS Class: Burstable Node-Selectors: kubernetes.io/os=linux Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events:

  - `kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>`

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE ingress-nginx-controller LoadBalancer 10.100.116.31 80:31909/TCP,443:30721/TCP 120d ingress-nginx-controller-admission ClusterIP 10.100.25.196 443/TCP 120d % kubectl -n ingress-nginx describe svc ingress-nginx-controller Name: ingress-nginx-controller Namespace: ingress-nginx Labels: app.kubernetes.io/component=controller app.kubernetes.io/instance=ingress-nginx app.kubernetes.io/managed-by=Helm app.kubernetes.io/name=ingress-nginx app.kubernetes.io/part-of=ingress-nginx app.kubernetes.io/version=1.9.6 helm.sh/chart=ingress-nginx-4.9.1 Annotations: meta.helm.sh/release-name: ingress-nginx meta.helm.sh/release-namespace: ingress-nginx Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx Type: LoadBalancer IP Family Policy: SingleStack IP Families: IPv4 IP: 10.100.116.31 IPs: 10.100.116.31 Port: http 80/TCP TargetPort: http/TCP NodePort: http 31909/TCP Endpoints: 100.72.15.115:80 Port: https 443/TCP TargetPort: https/TCP NodePort: https 30721/TCP Endpoints: 100.72.15.115:443 Session Affinity: None External Traffic Policy: Cluster Events: Type Reason Age From Message

Normal EnsuringLoadBalancer 4m13s (x38 over 164m) service-controller Ensuring load balancer

- **Current state of ingress object, if applicable**:
  - `kubectl -n <appnamespace> get all,ing -o wide`
  - `kubectl -n <appnamespace> describe ing <ingressname>`

Name: appgateway-ingress Labels: app=appgateway app.kubernetes.io/managed-by=Helm Namespace: app Address: Ingress Class: nginx Default backend: Rules: Host Path Backends

/grafana grafana-ext-svc:80 () / appgateway-svc:8000 (100.72.15.125:8000) Annotations: kubernetes.io/tls-acme: true meta.helm.sh/release-name: app meta.helm.sh/release-namespace: app nginx.ingress.kubernetes.io/proxy-body-size: 0 nginx.ingress.kubernetes.io/proxy-buffering: off nginx.ingress.kubernetes.io/proxy-connect-timeout: 7200 nginx.ingress.kubernetes.io/proxy-read-timeout: 7200 nginx.ingress.kubernetes.io/proxy-request-buffering: off nginx.ingress.kubernetes.io/proxy-send-timeout: 7200 nginx.ingress.kubernetes.io/ssl-redirect: false sidecar.istio.io/inject: false Events: ``` - If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the reponse to the curl/grpcurl command with the -v flag Note: the ingress svc has been port-forwarded to local using: kubectl -n ingress-nginx port-forward svc/ingress-nginx-controller 8000:80 ```sh % curl -vvv -X POST http://localhost:8000/api/v1/llm/chat_stream \ -H "Content-Type: application/json" \ -H "Authorization: Bearer " \ -H "Host: " \ -d '{ "model": "mistral-7b", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Write a python program to requtest openai chat API!" } ], "temperature": 0.5, "stream": true, "max_new_tokens": 1024, "top_p": 0.9, "top_k": 40 }' Note: Unnecessary use of -X or --request, POST is already inferred. * Host localhost:8000 was resolved. * IPv6: ::1 * IPv4: 127.0.0.1 * Trying [::1]:8000... * Connected to localhost (::1) port 8000 > POST /api/v1/llm/chat_stream HTTP/1.1 > Host: > User-Agent: curl/8.6.0 > Accept: */* > Content-Type: application/json > Authorization: Bearer > Content-Length: 369 > < HTTP/1.1 200 OK < Date: Wed, 05 Jun 2024 09:54:16 GMT < Content-Type: text/plain; charset=utf-8 < Transfer-Encoding: chunked < Connection: keep-alive < To use OpenAI's Chat completions API with Python, you will first need to install the `openai` package using pip: pip install openai Next, create a new Python file and write the following code: import openai # Set up your OpenAI API key openai.api_key = "YOUR_API_KEY_HERE" def chat_completion(message): # Define the system message (the first message in the conversation) system_message = "You are a helpful assistant." # Create and send the request to OpenAI's API response = openai.Completion.create( engine="davinci", # Choose an engine (e.g., davinci, curie, babbage) prompt=f"{system_message}\n{message}", max_length=100, temperature=0.5, ) return response.choices[0].text # Test the function by sending a message to the API and printing the response print(chat_completion("Hello! How can I help you today?")) Replace `"YOUR_API_KEY_HERE"` with your actual OpenAI API key. To get an API key, sign up for a free account on OpenAI's website: https://openai.com/signup. The first 50,000 tokens are free each month. The `chat_completion` function takes a message as its argument and sends it to the OpenAI Chat Completions API. It returns the response from the API as a string. In this example, we define a system message that is always "You are a helpful assistant." but you can change it to any message you want for your application. The function uses the `Completion` class in the `openai` package to send the request and receive the response. The `max_length` parameter specifies the maximum length of the response (in tokens), and the `temperature` parameter controls how random the generated text will be. A lower temperature results in more deterministic responses, while a higher temperature makes the model generate more creative and varied responses. * Leftovers after chunking: 12 bytes * Connection #0 to host localhost left intact You can test the function by running the Python script and sending it a message as an argument. The response from the API will be printed to the console.% ``` - **Others**: - Any other related information like ; - copy/paste of the snippet (if applicable) - `kubectl describe ...` of any custom configmap(s) created and in use - Any other related information that may help **How to reproduce this issue**: **Anything else we need to know**: I can confirm that the application's streaming behaviour is correct: if we port-forward the application's service and invoke API against the forwarded port, the stream works fine.

k8s-ci-robot commented 4 months ago

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 4 months ago

Your nodes are on AWS and your curl destination is hostname localhost so nothing can be valid about that curl.

But bigger problem is that the service created by the ingress-nginx controller is in a pending state. So there is no question at all about even sending or receiving a HTTP/HTTPS request.

If streaming is broken, it can even be reproduced on a kind cluster or a minikube cluster.

So please check the documentation about how to install run and use the ingress-nginx controller. Then try it on a kind cluster or a minikube cluster. Once you have it all figured out, then run the install with appropriate and preferably documented process. THen please edit this issue description and provide data that can be analyzed as a problem in the controller.

Thanks

/remove-kind bug /kind support /triage needs-information

Rainfarm commented 4 months ago

Your nodes are on AWS and your curl destination is hostname localhost so nothing can be valid about that curl.

But bigger problem is that the service created by the ingress-nginx controller is in a pending state. So there is no question at all about even sending or receiving a HTTP/HTTPS request.

If streaming is broken, it can even be reproduced on a kind cluster or a minikube cluster.

So please check the documentation about how to install run and use the ingress-nginx controller. Then try it on a kind cluster or a minikube cluster. Once you have it all figured out, then run the install with appropriate and preferably documented process. THen please edit this issue description and provide data that can be analyzed as a problem in the controller.

Thanks

/remove-kind bug /kind support /triage needs-information

Some clarificatons: As I've mentioned in the question:

Note: the ingress svc has been port-forwarded to local using: kubectl -n ingress-nginx port-forward svc/ingress-nginx-controller 8000:80.

That's why I can use localhost:8000 as the curl destination. It is valid.

The reason of testing using port-forwarding is that our service is exposed to Internet via CloudFlare Tunnel solution (and it's also the reason that the EXTERNAL-IP is in <pending> status). I tried to use port-forward to exclude the possible impacts by the [CloudFlare Tunnel]. The traffic path is:

Internet => [CloudFlare Tunnel] => [nginx ingress controller] => [application service] => [application pod]

The test scenarios are:

When we port-forwarding [applicatoin service], and curl against it, the streaming works
We expose our [application service] via AWS Network Loadbalancer, and curl against it, the streaming works
When we port-forwarding [nginx ingress controller], and curl against it, the streaming doesn't work (the client always receive the whole response in one go).
When curl directly against the URL exposed by [CloudFlare Tunnel], streaming doesn't work.

Thanks!

longwuyuan commented 4 months ago

I think you are providing info that helps but I don't know how to use it to reproduce the problem on a kind or a minikube cluster.

Would you consider this https://github.com/kubernetes/ingress-nginx/issues/11162#issuecomment-2019448596 as a valid text streaming test.

If you are port-forwarding, then is it across the internet or within a lan. All such details are needed for me to reproduce.

Critical info is a application docker image, of a small streaming server, that anyone can use on their own cluster to test.

You are not providing the log messages of the controller pod that is logged when streaming fails.

Maybe you should edit the issue description and ensure that there is enough info there that shows the small tiny details that are outputs of kubectl commands for describe and logs etc etc

Since you showed chatGPT, I will pick some random app from artifacthub.io to test, unless you can provide a minimalistic app

longwuyuan commented 4 months ago

unable to find a app to use in test

Rainfarm commented 4 months ago

I enabled debug level log in nginx ingress controller, made a test and grabbed the log. Since there're quite a lot logs generated at debug level, there might be some other activities logged.

Please check the log attached: test.log.tar.gz. Here're some highlights:

line 140 - 158 show the test request of POST /api/v1/llm/chat_stream HTTP/1.1
line 4700 seems to be the place where the response reached nginx
line 4717 - 4721 is the response
line 5473 seems to be the final response to the client Please be noted that the above is based on my very limited understanding and some guesses on the nginx ingress controller log, which can be wrong.

No error is logged during the test. What we can see from the client side is that the whole content of the response was received in one go, without streaming effect.

I'll try to come up with a text streaming test that is reproducible in local environment.

Thanks!

longwuyuan commented 4 months ago

Without knowing the app and the curl output of real use, its hard to understand why you think streaming is broken

allantatter commented 2 months ago

@Rainfarm How did you solve the problem?

kubernetes / ingress-nginx

Text Streaming not working when using Nginx-Ingress on Kubernetes #11430