ssl-passthrough config lost for ingress on config reload when --disable-full-test is set (kubernetes/ingress-nginx #10386)

jfpucheu commented 1 year ago

What happened: An Ingress using ssl-passthrough stops working after some NGINX config reloads. The request is then sent to NGINX itself, and NGINX serves the fake certificate, because no certificate is configured on this Ingress (it relies on ssl-passthrough).

For example:

I0908 08:26:40.475440       6 tcp.go:74] "TLS Client Hello" host="auth.ccnp-mgt01.com.toto"
I0908 08:26:40.475550       6 tcp.go:84] "passing to" hostport="10.105.235.186:5556"   --> this is OK
#### config reload ######
I0908 08:26:50.674770       6 tcp.go:74] "TLS Client Hello" host="auth.ccnp-mgt01.com.toto"
I0908 08:26:50.674898       6 tcp.go:84] "passing to" hostport="127.0.0.1:442"   --> ssl-passthrough stopped working, redirected to NGINX

A full restart of NGINX solves the issue for a while.

What you expected to happen:

The NGINX Ingress controller should always pass the TCP connection through to 10.105.235.186:5556.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.8.1
  Build:         dc88dce9ea5e700f3301d16f971fa17c6cfe757d
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.8", GitCommit:"0ce7342c984110dfc93657d64df5dc3b2c0d1fe9", GitTreeState:"clean", BuildDate:"2023-03-15T13:39:54Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.8", GitCommit:"0ce7342c984110dfc93657d64df5dc3b2c0d1fe9", GitTreeState:"clean", BuildDate:"2023-03-15T13:33:02Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

Environment:

Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release): centos7, Ubuntu 22
Kernel (e.g. uname -a): 5.15.0-60-generic #66-Ubuntu
Install tools:
- clusterAPI
- fluxcd

Basic cluster related info:

v1.25.8

ccnp-mgt01-ingress-6dd82         Ready    <none>          44d   v1.25.8   X.X.X.X    <none>        Ubuntu 22.04.2 LTS   5.15.0-60-generic   containerd://1.6.21
ccnp-mgt01-ingress-hrzsn         Ready    <none>          44d   v1.25.8   X.X.X.X    <none>        Ubuntu 22.04.2 LTS   5.15.0-60-generic   containerd://1.6.21

How was the ingress-nginx-controller installed:
Applied https://github.com/kubernetes/ingress-nginx/blob/main/deploy/static/provider/cloud/deploy.yaml via Flux.

flags:

containers:
  - args:
    - /nginx-ingress-controller
    - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
    - --election-id=ingress-nginx-leader
    - --controller-class=k8s.io/ingress-nginx
    - --ingress-class=nginx
    - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
    - --validating-webhook=:8443
    - --validating-webhook-certificate=/usr/local/certificates/cert
    - --validating-webhook-key=/usr/local/certificates/key
    - --enable-ssl-passthrough
    - --watch-ingress-without-class=true
    - --disable-full-test
    - --v=4

instances:

NAME                                   READY   STATUS      RESTARTS   AGE     IP              NODE                       NOMINATED NODE   READINESS GATES
ingress-nginx-admission-create-v56th   0/1     Completed   0          7d6h    172.16.11.105   ccnp-mgt01-worker-fmvrf
ingress-nginx-admission-patch-dbg54    0/1     Completed   0          7d6h    172.16.21.161   ccnp-mgt01-worker-x64nk
ingress-nginx-controller-jlj7q         1/1     Running     0          7h20m   172.16.16.79    ccnp-mgt01-ingress-6dd82
ingress-nginx-controller-nbwkt         1/1     Running     0          7h20m   172.16.14.43    ccnp-mgt01-ingress-hrzsn

Current State of the controller:

kubectl describe ingressclasses

Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.8.1
              kustomize.toolkit.fluxcd.io/name=ingress-nginx
              kustomize.toolkit.fluxcd.io/namespace=ingress-nginx
Annotations:  ingressclass.kubernetes.io/is-default-class: true
Controller:   k8s.io/ingress-nginx
Events:       <none>

Current state of ingress object, if applicable:

Name:             ingress-nginx-controller-jlj7q
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             ccnp-mgt01-ingress-6dd82/10.235.85.6
Start Time:       Fri, 08 Sep 2023 08:41:59 +0000
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.8.1
                  controller-revision-hash=75c54bf864
                  pod-template-generation=8
Annotations:      <none>
Status:           Running
IP:               172.16.16.79
IPs:
  IP:           172.16.16.79
Controlled By:  DaemonSet/ingress-nginx-controller
Containers:
  controller:
    Container ID:  containerd://a082b11dd8bd7dc4ce8f93b527bd2cf5bb7beb64407c31942b6530b994257888
    Image:         proxy-docker.nexus.com.toto/ingress-nginx/controller:v1.8.1@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd
    Image ID:      proxy-docker.nexus.com.toto/ingress-nginx/controller@sha256:e5c4824e7375fcf2a393e1c03c293b69759af37a9ca6abdb91b13d78a93da8bd
    Ports:         80/TCP, 443/TCP, 8443/TCP, 10254/TCP
    Host Ports:    80/TCP, 443/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --enable-ssl-passthrough
      --watch-ingress-without-class=true
      --v=4
    State:          Running
      Started:      Fri, 08 Sep 2023 08:41:59 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  20Gi
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-jlj7q (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j79cn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-j79cn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
                             role=ingress
Tolerations:                 dedicated=ingress:NoSchedule
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>

kubectl -n dex describe ing dex

Name:             dex
Labels:           app=dex
                  kustomize.toolkit.fluxcd.io/name=dex
                  kustomize.toolkit.fluxcd.io/namespace=dex
Namespace:        dex
Address:          10.99.31.183
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host                      Path  Backends
  ----                      ----  --------
  auth.ccnp-mgt01.com.toto
                            /   dex:5556 (172.16.11.254:5556,172.16.18.29:5556,172.16.21.229:5556)
Annotations:      nginx.ingress.kubernetes.io/backend-protocol: HTTPS
                  nginx.ingress.kubernetes.io/ssl-passthrough: true
                  nginx.ingress.kubernetes.io/ssl-redirect: true
Events:           <none>

How to reproduce this issue:

Create an ingress with ssl-passthrough:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  name: dex
  namespace: dex
spec:
  ingressClassName: nginx
  rules:
  - host: auth.ccnp-mgt01.com.toto
    http:
      paths:
      - backend:
          service:
            name: dex
            port:
              number: 5556
        path: /
        pathType: ImplementationSpecific

Add the --disable-full-test flag to the ingress controller's arguments.

After some time and several config reloads, the issue appears.

Anything else we need to know:

The issue seems to occur only when the --disable-full-test flag is set. A restart of NGINX reloads the whole configuration and fixes the issue temporarily.
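As a purely hypothetical illustration (nothing in this thread confirms the mechanism), the symptom would match a passthrough table rebuilt from only a subset of Ingresses. The --disable-full-test flag makes the admission webhook test only the Ingress being created or updated instead of all merged Ingresses; if some reload path ever reused such a partial set, the hosts of other Ingresses would drop out of the table, while a restart rebuilds it from everything. In the Go sketch below, ingressInfo, buildTable, and the app.example.test host are invented for the example.

```go
package main

// ingressInfo is a hypothetical, trimmed-down view of an Ingress.
type ingressInfo struct {
	host        string
	backend     string // "ip:port" of the passthrough backend
	passthrough bool   // nginx.ingress.kubernetes.io/ssl-passthrough: "true"
}

// buildTable derives an SNI host -> backend map from whatever Ingresses it is
// given. Hosts of Ingresses that are not in the input are simply absent, so
// the result is only complete if the input is complete.
func buildTable(ings []ingressInfo) map[string]string {
	table := make(map[string]string)
	for _, ing := range ings {
		if ing.passthrough {
			table[ing.host] = ing.backend
		}
	}
	return table
}
```

A full restart corresponds to calling buildTable with every Ingress, which is consistent with the observation that a restart restores passthrough, but this remains a hypothesis.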

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 1 year ago

/remove-kind bug

This message

"passing to" hostport="127.0.0.1:442" ->

shows a loopback IP address, which sort of implies you want a TLS connection to the ingress-controller pod on its internal loopback interface. That does not make sense. Much more data is needed to understand this.

Please add information here such as:

- the exact and complete curl request sent, with -v
- the response to that curl request
- the output of kubectl describe for the related Ingress resource
- the output of kubectl get all -n ingress-nginx
- any other such information
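As a complement to curl -v, a small Go probe can show which certificate is presented for a given SNI host: the backend's own certificate means the connection was passed through, while the controller's default certificate means it fell through to NGINX. This is a sketch. It assumes the default certificate's common name is "Kubernetes Ingress Controller Fake Certificate" (check it against your own controller), and servedCommonName and isFakeCert are hypothetical helper names.

```go
package main

import (
	"crypto/tls"
	"errors"
	"net"
	"time"
)

// fakeCertCN is the common name assumed for the controller's default
// self-signed certificate; verify it against your own deployment.
const fakeCertCN = "Kubernetes Ingress Controller Fake Certificate"

// servedCommonName opens a TLS connection to addr ("host:port"), sends
// sniHost in the ClientHello, and returns the subject common name of the
// leaf certificate the server presented.
func servedCommonName(addr, sniHost string) (string, error) {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{
		ServerName: sniHost,
		// Diagnostic only: we want to see whichever certificate is served,
		// including an untrusted one, so verification is skipped.
		InsecureSkipVerify: true,
	})
	if err != nil {
		return "", err
	}
	defer conn.Close()
	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", errors.New("server presented no certificate")
	}
	return certs[0].Subject.CommonName, nil
}

// isFakeCert reports whether a served certificate looks like the controller's
// default certificate, i.e. the connection was not passed through.
func isFakeCert(commonName string) bool {
	return commonName == fakeCertCN
}
```

For example, calling servedCommonName("<controller-address>:443", "auth.ccnp-mgt01.com.toto") repeatedly across several reloads would show when passthrough stops; <controller-address> is a placeholder.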

/remove-kind bug
/kind support

jfpucheu commented 1 year ago

"shows a loopback ipaddress and it sort of implies you want a TLS connection to the ingress-controller pod on its internal loopback interface" ---> That is exactly the problem: it is not what we want. With ssl-passthrough set up, we should never see the request going to 127.0.0.1:442. But on some reloads it happens when --disable-full-test is set.

longwuyuan commented 1 year ago

There are two aspects. If the state was still being reconciled when a new tls-passthrough connection was attempted, then this would be expected.
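The transient case described here, a connection arriving while the routing state is still being reconciled, is commonly avoided by building the new table completely and then publishing it in one atomic step, so a lookup sees either the old or the new table but never a partial one. The Go sketch below shows that general pattern with sync/atomic's atomic.Pointer (Go 1.19 or later). router, routeTable, publish, and lookup are hypothetical names, not ingress-nginx code.

```go
package main

import "sync/atomic"

// routeTable is a hypothetical immutable snapshot: SNI host -> backend.
type routeTable struct {
	backends map[string]string
}

// router publishes routeTable snapshots; lookups never see a half-built one.
type router struct {
	current  atomic.Pointer[routeTable]
	fallback string
}

// publish builds a fresh snapshot from src and swaps it in atomically. The
// map is copied so later edits to src cannot change a published snapshot.
func (r *router) publish(src map[string]string) {
	next := &routeTable{backends: make(map[string]string, len(src))}
	for host, hostport := range src {
		next.backends[host] = hostport
	}
	r.current.Store(next)
}

// lookup reads one consistent snapshot and falls back when the host is
// missing or nothing has been published yet.
func (r *router) lookup(host string) string {
	if t := r.current.Load(); t != nil {
		if hostport, ok := t.backends[host]; ok {
			return hostport
		}
	}
	return r.fallback
}
```

This pattern only removes a short transition window; on its own it would not explain or fix the reported case, where passthrough stays broken until a restart.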

github-actions[bot] commented 11 months ago

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

rtikunov commented 9 months ago

I hit this bug too. In big clusters with a large number of Ingresses, it takes ages to reload the config without --disable-full-test, so this option is used often. And it really does break the nginx.ingress.kubernetes.io/ssl-passthrough=true annotation.

longwuyuan commented 1 week ago

It is possible that a large volume of changes made concurrently will cause delays and errors during reload.

But the description of this issue is not enough to take any practical action on. We already know about the reload performance problem with large changes. There is work in progress to mitigate that large-size reload and security problem.

Even if this issue is only about large reloads breaking ssl-passthrough, there is no action item here for the project, because when the controller process is stuck, many things break, not only ssl-passthrough. Having more CPU or memory available on the node where the reload is happening, at the time it happens, may be a temporary workaround. This behavior is inherited from NGINX, since vanilla NGINX will also be slow to reload if the number of changes is too big and they are all concurrent.

Plain ssl-passthrough works without problems; other software even documents using it: https://argo-cd.readthedocs.io/en/stable/operator-manual/ingress/#kubernetesingress-nginx

Since there is no action item to track in this issue, I will close it, because it adds to the tally of open issues.

/close

k8s-ci-robot commented 1 week ago

@longwuyuan: Closing this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/10386#issuecomment-2350884622). Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
