kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

since controller 1.10.0 (chart 4.10.0): ingress rejects duplicate "Transfer-Encoding: chunked" and returns 502 #11162

Closed DRoppelt closed 4 months ago

DRoppelt commented 6 months ago

TLDR: controller 1.10.0 updated to a more recent nginx that started to reject certain behaviour from downstream systems, e.g. sending the `Transfer-Encoding: chunked` header twice.

as per this:

https://forum.nginx.org/read.php?2,297086,297090#msg-297090

... as the upstream response in question is obviously invalid and should never be accepted in the first place, and the change is more or less a minor cleanup work.

There is no setting in ingress-nginx that suppresses this behaviour

possible next steps:

1) fix the offending downstream systems so that they do not lead to nginx rejecting the response

2) raise an issue with the nginx project

3) possibly in conjunction with 2), add an option that may be able to suppress the behaviour when there is feedback from the nginx project

The issue has been closed as I, the submitter, went for option 1), as the stance from the nginx project has been rather clear.


What happened: when upgrading the controller from 1.9.6 (chart 4.9.1) to 1.10.0 (chart 4.10.0), we observe the ingress answering 502 on behalf of the service. When rolling back to 1.9.6, behaviour is restored to healthy.

What you expected to happen: the ingress behaving similarly to 1.9.6 and not answering 502 where it did not before

There appears to be a regression of some sort. We updated from 1.9.6 to 1.10.0 and observed that some requests started returning 502 right after the upgrade. We then downgraded and saw the 502s drop back to previous numbers.

One instance where we observe it is when setting Transfer-Encoding: chunked twice, once in code and once via Spring Boot.
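For illustration, a minimal Spring MVC sketch of how such a duplicate can arise (a hypothetical handler, not our actual service code; the `/retrieve` path is taken from the logs below). The servlet container already applies chunked transfer encoding when streaming a response body of unknown length, so setting the header again in code can end up as two `Transfer-Encoding: chunked` lines, depending on the container version:

```java
// Hypothetical handler for illustration only; not the actual service code.
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

@RestController
public class RetrievalController {

    @PostMapping("/retrieve")
    public StreamingResponseBody retrieve(HttpServletResponse response) {
        // Redundant: the container already decides on chunked transfer for a streamed
        // body, so this can yield a second "Transfer-Encoding: chunked" header line.
        response.setHeader("Transfer-Encoding", "chunked");
        return out -> out.write("payload".getBytes());
    }
}
```

nginx 1.25.x treats the duplicated header as an invalid upstream response and answers 502 instead of forwarding it.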

We also observe the following error message in the logs:

ingress-nginx-controller-8bf5b5f98-w8gbz 2024/03/22 11:01:54 [error] 575#575: *28680 upstream sent duplicate header line: "Transfer-Encoding: chunked", previous value: "Transfer-Encoding: chunked" while reading response header from upstream, client: 127.0.0.1, server: localhost, request: "POST /retrieve HTTP/1.1", upstream: "http://10.26.195.10:8080/retrieve", host: "localhost:8076"
ingress-nginx-controller-8bf5b5f98-w8gbz 127.0.0.1 - - [22/Mar/2024:11:01:54 +0000] "POST /retrieve HTTP/1.1" 502 150 "-" "PostmanRuntime/7.36.3" 3208 2.317 [default-ebilling-retrieval-http] [] 10.26.195.10:8080 0 2.318 502 94bc05d81342c91791fac0f02cb64434

NGINX Ingress controller version (exec into the pod and run `nginx-ingress-controller --version`): v1.10.0 (nginx/1.25.3)

/etc/nginx $ /nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.10.0
  Build:         71f78d49f0a496c31d4c19f095469f3f23900f8a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.3

-------------------------------------------------------------------------------
/etc/nginx $

Kubernetes version (use kubectl version): v1.28.5

Environment:

- **Kernel** (e.g. `uname -a`):

/etc/nginx $ uname -a
Linux ingress-nginx-controller-8bf5b5f98-w8gbz 5.15.0-1057-azure #65-Ubuntu SMP Fri Feb 9 18:39:24 UTC 2024 x86_64 Linux

- **Install tools**: AKS via terraform, ingress via helm (but also via terraform)
  - `Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc. `
- **Basic cluster related info**:
  - `kubectl version`

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:53:42Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"windows/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.5", GitCommit:"506050d61cf291218dfbd41ac93913945c9aa0da", GitTreeState:"clean", BuildDate:"2023-12-23T00:10:25Z", GoVersion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"}

  - `kubectl get nodes -o wide`

$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-default-15207550-vmss000000 Ready agent 7d2h v1.28.5 10.26.194.4 Ubuntu 22.04.4 LTS 5.15.0-1057-azure containerd://1.7.7-1
aks-pool001-49772321-vmss000000 Ready agent 7d1h v1.28.5 10.26.195.9 Ubuntu 22.04.4 LTS 5.15.0-1057-azure containerd://1.7.7-1
aks-pool001-49772321-vmss000001 Ready agent 7d1h v1.28.5 10.26.194.207 Ubuntu 22.04.4 LTS 5.15.0-1057-azure containerd://1.7.7-1
aks-pool001-49772321-vmss00000b Ready agent 7d1h v1.28.5 10.26.194.33 Ubuntu 22.04.4 LTS 5.15.0-1057-azure containerd://1.7.7-1
aks-pool002-37360131-vmss00000h Ready agent 7d v1.28.5 10.26.194.91 Ubuntu 22.04.4 LTS 5.15.0-1057-azure containerd://1.7.7-1
aks-pool002-37360131-vmss00000q Ready agent 7d v1.28.5 10.26.194.120 Ubuntu 22.04.4 LTS 5.15.0-1057-azure containerd://1.7.7-1
aks-pool002-37360131-vmss00000v Ready agent 7d v1.28.5 10.26.194.149 Ubuntu 22.04.4 LTS 5.15.0-1057-azure containerd://1.7.7-1


- **How was the ingress-nginx-controller installed**:
  - If helm was used then please show output of `helm ls -A | grep -i ingress`

$ helm ls -A | grep -i ingress
ingress-nginx   ap-system   1   2024-03-21 08:44:29.913568481 +0000 UTC   deployed   ingress-nginx-4.10.0


  - If helm was used then please show output of `helm -n <ingresscontrollernamespace> get values <helmreleasename>`

$ helm -n ap-system get values ingress-nginx
USER-SUPPLIED VALUES:
controller:
  ingressClassResource:
    name: nginx
  service:
    type: ClusterIP

  - If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used
  - if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances

- **Current State of the controller**:
  - `kubectl describe ingressclasses` 

$ kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.10.0
              helm.sh/chart=ingress-nginx-4.10.0
Annotations:  meta.helm.sh/release-name: ingress-nginx
              meta.helm.sh/release-namespace: ap-system
Controller:   k8s.io/ingress-nginx
Events:

  - `kubectl -n <ingresscontrollernamespace> get all -A -o wide`
  - `kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>`
  - `kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>`

- **Current state of ingress object, if applicable**:
  - `kubectl -n <appnamespace> get all,ing -o wide`
  - `kubectl -n <appnamespace> describe ing <ingressname>`
  - If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the reponse to the curl/grpcurl command with the -v flag

- **Others**:
  - Any other related information like ;
    - copy/paste of the snippet (if applicable)
    - `kubectl describe ...` of any custom configmap(s) created and in use
    - Any other related information that may help

**How to reproduce this issue**:
We have a service that sometimes answers with `Transfer-Encoding: chunked` set twice and sometimes only once. When it answers with the duplicated header, the response does not reach the client and the ingress answers with 502.

We hook into the ingress via port-forward to reproduce it, but have no better setup for reproducing locally.

1)

kubectl port-forward -n ap-system ingress-nginx-controller-8bf5b5f98-w8gbz 8076 8080


2) `<send requests via postman>`

3) observe a 502 and the log entry above when the pod answers with the duplicate `Transfer-Encoding: chunked` header, and a regular 200 when it answers with a single header

We have also observed this for other clients with other backends, but so far this particular endpoint is the only one we can reproduce reliably as "always errors when X, always ok when Y".
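For reference, step 2 can also be scripted instead of using Postman; a minimal client sketch, assuming the port-forward from step 1 and the `localhost:8076` host and `/retrieve` path seen in the log lines above:

```java
// Minimal sketch of step 2: POST through the port-forwarded controller and print the status.
// Host, port and path are taken from the log lines above; the body content is arbitrary.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReproduceDuplicateChunked {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8076/retrieve"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Expect 502 when the upstream duplicates Transfer-Encoding, 200 otherwise.
        System.out.println(response.statusCode());
    }
}
```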


**Anything else we need to know**:

We have now deployed a separate ingress at 1.10.0 (with a separate ingress class) and can observe this behaviour there, while our "hot" ingress that receives the traffic is back on 1.9.6. The 1.10.0 instance still breaks while the operational 1.9.6 ingress works. It sounds somewhat similar to https://github.com/spring-projects/spring-boot/issues/37646

github-actions[bot] commented 5 months ago

This is stale, but we won't close it automatically, just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach out on #ingress-nginx-dev on Kubernetes Slack.

SergeyLadutko commented 5 months ago

me too

DRoppelt commented 4 months ago

Update from our org: we identified all our affected services in the testing stages (by running the updated controller, observing test failures and looking into the ingress logs), fixed them, and then rolled out the ingress update afterwards. So we fixed the downstream systems with the odd header proxying.

The stance here was clear enough to acknowledge that we just need to fix the services. We were lucky enough that it was not vendor-provided software misbehaving, but deployables that we can patch ourselves.

https://forum.nginx.org/read.php?2,297086,297090#msg-297090

... as the upstream response in question is obviously invalid and should never be accepted in the first place, and the change is more or less a minor cleanup work.
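For completeness, the fixes were purely application-side; a hedged sketch of the pattern, mirroring the hypothetical handler earlier in this issue (not the actual service code): drop the manually set header and let the servlet container manage chunked transfer on its own, so only one `Transfer-Encoding: chunked` line reaches nginx.

```java
// Sketch of the application-side fix (hypothetical handler, for illustration only):
// no explicit Transfer-Encoding header; the servlet container adds a single
// "Transfer-Encoding: chunked" itself when it streams a body of unknown length.
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

@RestController
public class RetrievalController {

    @PostMapping("/retrieve")
    public StreamingResponseBody retrieve() {
        return out -> out.write("payload".getBytes());
    }
}
```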

longwuyuan commented 4 months ago

@DRoppelt thanks for the update. Seems in sync with the tests.

Glad it's not an issue anymore. I request that you close this issue if there are no more questions.

zeenmc commented 3 months ago

Hello Team,

Are there any updates on this issue?

longwuyuan commented 3 months ago

The related info is ALREADY present above, so there is no update pending from the project.

DRoppelt commented 3 months ago

TLDR: controller 1.10.0 updated to a more recent nginx that started to reject certain behaviour from downstream systems, e.g. sending the `Transfer-Encoding: chunked` header twice.

as per this:

https://forum.nginx.org/read.php?2,297086,297090#msg-297090

... as the upstream response in question is obviously invalid and should never be accepted in the first place, and the change is more or less a minor cleanup work.

There is no setting in ingress-nginx that suppresses this behaviour

possible next steps:

1) fix the offending downstream systems so that they do not lead to nginx rejecting the response

2) raise an issue with the nginx project

3) possibly in conjunction with 2), add an option that may be able to suppress the behaviour when there is feedback from the nginx project

The issue has been closed as I, the submitter, went for option 1), as the stance from the nginx project has been rather clear.