Text Streaming not working when using Nginx-Ingress on Kubernetes #11430

Closed Rainfarm closed 4 months ago

Rainfarm commented 4 months ago

What happened:

We use the NGINX ingress controller in an EKS cluster, and text streaming from the services running in the cluster doesn't work: the client always gets the response in one go. We've checked similar issues here (e.g., https://github.com/kubernetes/ingress-nginx/issues/10482), but the solutions suggested there don't help.

Below are some details:

What you expected to happen: The client should receive the response as a stream.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):


Kubernetes version (use kubectl version):

Client Version: v1.28.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4-eks-036c24b

Environment:

Name: nginx
Labels: app.kubernetes.io/component=controller
  app.kubernetes.io/instance=ingress-nginx
  app.kubernetes.io/managed-by=Helm
  app.kubernetes.io/name=ingress-nginx
  app.kubernetes.io/part-of=ingress-nginx
  app.kubernetes.io/version=1.9.6
  helm.sh/chart=ingress-nginx-4.9.1
Annotations: meta.helm.sh/release-name: ingress-nginx
  meta.helm.sh/release-namespace: ingress-nginx
Controller: k8s.io/ingress-nginx
Events:

  - `kubectl -n <ingresscontrollernamespace> get all -A -o wide`

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/ingress-nginx-controller-7fdbfcb8f9-l7t92 1/1 Running 0 120d ip-100-72-15-134.eu-west-1.compute.internal

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/ingress-nginx-controller LoadBalancer 80:31909/TCP,443:30721/TCP 120d app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission ClusterIP 443/TCP 120d app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/ingress-nginx-controller 1/1 1 1 120d controller registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/ingress-nginx-controller-7fdbfcb8f9 1 1 1 120d controller registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=7fdbfcb8f9

  - `kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>`

Name: ingress-nginx-controller-7fdbfcb8f9-l7t92
Namespace: ingress-nginx
Priority: 0
Service Account: ingress-nginx
Node: ip-100-72-15-134.eu-west-1.compute.internal/
Start Time: Mon, 05 Feb 2024 21:33:12 +0100
Labels: app.kubernetes.io/component=controller
  app.kubernetes.io/instance=ingress-nginx
  app.kubernetes.io/managed-by=Helm
  app.kubernetes.io/name=ingress-nginx
  app.kubernetes.io/part-of=ingress-nginx
  app.kubernetes.io/version=1.9.6
  helm.sh/chart=ingress-nginx-4.9.1
  pod-template-hash=7fdbfcb8f9
Annotations:
Status: Running
IP:
IPs:
  IP:
Controlled By: ReplicaSet/ingress-nginx-controller-7fdbfcb8f9
Containers:
  controller:
    Container ID: containerd://18597a97709a1fb027c68a203a89075bb1922727795c63bdcbccb42031a9d133
    Image: registry.k8s.io/ingress-nginx/controller:v1.9.6@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Image ID: registry.k8s.io/ingress-nginx/controller@sha256:1405cc613bd95b2c6edd8b2a152510ae91c7e62aea4698500d23b2145960ab9c
    Ports: 80/TCP, 443/TCP, 8443/TCP
    Host Ports: 0/TCP, 0/TCP, 0/TCP
    SeccompProfile: RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State: Running
      Started: Mon, 05 Feb 2024 21:33:28 +0100
    Ready: True
    Restart Count: 0
    Requests:
      cpu: 100m
      memory: 90Mi
    Liveness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness: http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME: ingress-nginx-controller-7fdbfcb8f9-l7t92 (v1:metadata.name)
      POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD: /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfhx9 (ro)
Conditions:
  Type Status
  PodReadyToStartContainers True
  Initialized True
  Ready True
  ContainersReady True
  PodScheduled True
Volumes:
  webhook-cert:
    Type: Secret (a volume populated by a Secret)
    SecretName: ingress-nginx-admission
    Optional: false
  kube-api-access-gfhx9:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:

  - `kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>`

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 80:31909/TCP,443:30721/TCP 120d
ingress-nginx-controller-admission ClusterIP 443/TCP 120d

% kubectl -n ingress-nginx describe svc ingress-nginx-controller
Name: ingress-nginx-controller
Namespace: ingress-nginx
Labels: app.kubernetes.io/component=controller
  app.kubernetes.io/instance=ingress-nginx
  app.kubernetes.io/managed-by=Helm
  app.kubernetes.io/name=ingress-nginx
  app.kubernetes.io/part-of=ingress-nginx
  app.kubernetes.io/version=1.9.6
  helm.sh/chart=ingress-nginx-4.9.1
Annotations: meta.helm.sh/release-name: ingress-nginx
  meta.helm.sh/release-namespace: ingress-nginx
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP:
IPs:
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 31909/TCP
Endpoints:
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 30721/TCP
Endpoints:
Session Affinity: None
External Traffic Policy: Cluster
Events:
  Type Reason Age From Message
  Normal EnsuringLoadBalancer 4m13s (x38 over 164m) service-controller Ensuring load balancer

- **Current state of ingress object, if applicable**:
  - `kubectl -n <appnamespace> get all,ing -o wide`
  - `kubectl -n <appnamespace> describe ing <ingressname>`

```
Name: appgateway-ingress
Labels: app=appgateway
  app.kubernetes.io/managed-by=Helm
Namespace: app
Address:
Ingress Class: nginx
Default backend:
Rules:
  Host Path Backends
  /grafana grafana-ext-svc:80 ()
  / appgateway-svc:8000 (
Annotations: kubernetes.io/tls-acme: true
  meta.helm.sh/release-name: app
  meta.helm.sh/release-namespace: app
  nginx.ingress.kubernetes.io/proxy-body-size: 0
  nginx.ingress.kubernetes.io/proxy-buffering: off
  nginx.ingress.kubernetes.io/proxy-connect-timeout: 7200
  nginx.ingress.kubernetes.io/proxy-read-timeout: 7200
  nginx.ingress.kubernetes.io/proxy-request-buffering: off
  nginx.ingress.kubernetes.io/proxy-send-timeout: 7200
  nginx.ingress.kubernetes.io/ssl-redirect: false
  sidecar.istio.io/inject: false
Events:
```

- If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag

Note: the ingress svc has been port-forwarded to local using: kubectl -n ingress-nginx port-forward svc/ingress-nginx-controller 8000:80

```sh
% curl -vvv -X POST http://localhost:8000/api/v1/llm/chat_stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -H "Host: " \
  -d '{
    "model": "mistral-7b",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Write a python program to requtest openai chat API!" }
    ],
    "temperature": 0.5,
    "stream": true,
    "max_new_tokens": 1024,
    "top_p": 0.9,
    "top_k": 40
  }'
Note: Unnecessary use of -X or --request, POST is already inferred.
* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4:
* Trying [::1]:8000...
* Connected to localhost (::1) port 8000
> POST /api/v1/llm/chat_stream HTTP/1.1
> Host:
> User-Agent: curl/8.6.0
> Accept: */*
> Content-Type: application/json
> Authorization: Bearer
> Content-Length: 369
>
< HTTP/1.1 200 OK
< Date: Wed, 05 Jun 2024 09:54:16 GMT
< Content-Type: text/plain; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
<
To use OpenAI's Chat completions API with Python, you will first need to install the `openai` package using pip:

pip install openai

Next, create a new Python file and write the following code:

import openai

# Set up your OpenAI API key
openai.api_key = "YOUR_API_KEY_HERE"

def chat_completion(message):
    # Define the system message (the first message in the conversation)
    system_message = "You are a helpful assistant."

    # Create and send the request to OpenAI's API
    response = openai.Completion.create(
        engine="davinci", # Choose an engine (e.g., davinci, curie, babbage)
        prompt=f"{system_message}\n{message}",
        max_length=100,
        temperature=0.5,
    )
    return response.choices[0].text

# Test the function by sending a message to the API and printing the response
print(chat_completion("Hello! How can I help you today?"))

Replace `"YOUR_API_KEY_HERE"` with your actual OpenAI API key. To get an API key, sign up for a free account on OpenAI's website: https://openai.com/signup. The first 50,000 tokens are free each month.

The `chat_completion` function takes a message as its argument and sends it to the OpenAI Chat Completions API. It returns the response from the API as a string. In this example, we define a system message that is always "You are a helpful assistant." but you can change it to any message you want for your application.

The function uses the `Completion` class in the `openai` package to send the request and receive the response. The `max_length` parameter specifies the maximum length of the response (in tokens), and the `temperature` parameter controls how random the generated text will be. A lower temperature results in more deterministic responses, while a higher temperature makes the model generate more creative and varied responses.

* Leftovers after chunking: 12 bytes
* Connection #0 to host localhost left intact
You can test the function by running the Python script and sending it a message as an argument. The response from the API will be printed to the console.%
```

- **Others**:
  - Any other related information like ;
    - copy/paste of the snippet (if applicable)
    - `kubectl describe ...` of any custom configmap(s) created and in use
    - Any other related information that may help

**How to reproduce this issue**:

**Anything else we need to know**:

I can confirm that the application's streaming behaviour is correct: if we port-forward the application's service and invoke the API against the forwarded port, the stream works fine.

longwuyuan commented 4 months ago

Your nodes are on AWS and your curl destination is hostname localhost so nothing can be valid about that curl.

But the bigger problem is that the service created by the ingress-nginx controller is in a pending state. So there is no question at all about even sending or receiving an HTTP/HTTPS request.

If streaming is broken, it can even be reproduced on a kind cluster or a minikube cluster.

So please check the documentation about how to install, run, and use the ingress-nginx controller. Then try it on a kind cluster or a minikube cluster. Once you have it all figured out, run the install with an appropriate and preferably documented process. Then please edit this issue description and provide data that can be analyzed as a problem in the controller.


/remove-kind bug
/kind support
/triage needs-information

Rainfarm commented 4 months ago

Some clarifications. As I mentioned in the issue description:

Note: the ingress svc has been port-forwarded to local using: kubectl -n ingress-nginx port-forward svc/ingress-nginx-controller 8000:80.

That's why I can use localhost:8000 as the curl destination. It is valid.

The reason for testing via port-forwarding is that our service is exposed to the Internet through a CloudFlare Tunnel (which is also why the EXTERNAL-IP is in <pending> status). I used port-forward to rule out any impact from the [CloudFlare Tunnel]. The traffic path is:

Internet => [CloudFlare Tunnel] => [nginx ingress controller] => [application service] => [application pod]

The test scenarios are:


longwuyuan commented 4 months ago

I think you are providing info that helps but I don't know how to use it to reproduce the problem on a kind or a minikube cluster.

Would you consider this https://github.com/kubernetes/ingress-nginx/issues/11162#issuecomment-2019448596 a valid text streaming test?

If you are port-forwarding, is it across the internet or within a LAN? All such details are needed for me to reproduce.

The critical piece of info is an application Docker image of a small streaming server that anyone can use on their own cluster to test.

You are not providing the log messages of the controller pod that is logged when streaming fails.

Maybe you should edit the issue description and ensure that there is enough info there that shows the small tiny details that are outputs of kubectl commands for describe and logs etc etc

Since you showed chatGPT, I will pick some random app from artifacthub.io to test, unless you can provide a minimalistic app
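For anyone trying to reproduce this, a minimal streaming server along the following lines could serve as the test app. This is only a sketch using Python's standard library, not the reporter's actual application: the payload (`CHUNKS`), the delay, the port, and the helper names `encode_chunk` and `serve` are all assumptions made for illustration.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical payload and pacing; a one-second gap makes buffering easy to spot.
CHUNKS = [f"token-{i}\n".encode() for i in range(5)]
DELAY_SECONDS = 1.0


def encode_chunk(data: bytes) -> bytes:
    """Frame data as one HTTP/1.1 chunk: hex length, CRLF, payload, CRLF."""
    return f"{len(data):x}\r\n".encode("ascii") + data + b"\r\n"


class StreamHandler(BaseHTTPRequestHandler):
    # Chunked transfer encoding is only defined for HTTP/1.1.
    protocol_version = "HTTP/1.1"

    def _stream(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for chunk in CHUNKS:
            self.wfile.write(encode_chunk(chunk))
            self.wfile.flush()
            time.sleep(DELAY_SECONDS)
        # A zero-length chunk terminates the chunked body.
        self.wfile.write(b"0\r\n\r\n")

    def do_GET(self) -> None:
        self._stream()

    def do_POST(self) -> None:
        # Drain the request body first so the connection stays usable.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self._stream()


def serve(port: int = 8000) -> None:
    """Block forever, streaming CHUNKS to every client that connects."""
    ThreadingHTTPServer(("0.0.0.0", port), StreamHandler).serve_forever()
```

Calling `serve()` and then running `curl -N http://localhost:8000/` directly against it should print one line per second. If the same request through the ingress prints all lines at once after about five seconds, something on the path is buffering the response.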

longwuyuan commented 4 months ago

Unable to find an app to use in a test.

Rainfarm commented 4 months ago

I enabled debug-level logging in the nginx ingress controller, ran a test, and grabbed the log. Since quite a lot of logs are generated at debug level, some unrelated activity may also be logged.

Please check the log attached: test.log.tar.gz. Here're some highlights:

No error is logged during the test. What we can see from the client side is that the whole content of the response was received in one go, with no streaming effect.

I'll try to come up with a text streaming test that is reproducible in a local environment.
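One way to make "received in one go" measurable is to timestamp each piece of the body as it arrives on the client. The sketch below uses only Python's standard library; the function names `fetch_arrival_times` and `looks_streamed` and the 0.5-second threshold are assumptions for illustration, not part of any tool mentioned in this thread.

```python
import time
import urllib.request


def fetch_arrival_times(url: str, read_size: int = 4096) -> list[float]:
    """Return, for each piece of body data, the seconds elapsed since the request began."""
    start = time.monotonic()
    arrivals = []
    with urllib.request.urlopen(url) as resp:
        while True:
            # read1() returns whatever is currently available instead of
            # waiting for read_size bytes, so each call reflects one delivery.
            data = resp.read1(read_size)
            if not data:
                break
            arrivals.append(time.monotonic() - start)
    return arrivals


def looks_streamed(arrivals: list[float], min_spread: float = 0.5) -> bool:
    """Heuristic: the body was streamed if its pieces arrived spread out over time."""
    if len(arrivals) < 2:
        return False
    return arrivals[-1] - arrivals[0] >= min_spread
```

Running `fetch_arrival_times` once against the port-forwarded application service and once against the port-forwarded ingress service, then comparing the two lists, would show whether the pieces arrive spread out or bunched together at the end.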


longwuyuan commented 4 months ago

Without knowing the app and the curl output of real use, it's hard to understand why you think streaming is broken.

allantatter commented 2 months ago

@Rainfarm How did you solve the problem?