kubernetes / ingress-nginx

Ingress NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Controller resolves default-backend pod IP as DNS name if Ingress leads to ExternalName service #12173

Open meatuses opened 1 month ago

meatuses commented 1 month ago

What happened: If an Ingress resource points to a Service of type ExternalName but also carries the annotation nginx.ingress.kubernetes.io/default-backend set to a Service of type ClusterIP, the ingress-nginx controller tries to resolve the pod IP of that ClusterIP Service as a DNS name. I have attached the manifests in the Others section below.
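
For quick reference, here is a condensed sketch of the combination described above (the complete manifests are in the Others section below; the pathType value is an assumption, since it does not appear in the describe output):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: external-name-ingress
  namespace: app
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/custom-http-errors: "500"
    # default-backend points at a ClusterIP Service (nginx-errors)
    nginx.ingress.kubernetes.io/default-backend: nginx-errors
    nginx.ingress.kubernetes.io/preserve-host: "false"
spec:
  ingressClassName: nginx
  rules:
  - host: static.test.com
    http:
      paths:
      - path: /
        pathType: Prefix                       # assumed
        backend:
          service:
            name: external-name-svc-test       # Service of type ExternalName
            port:
              number: 443
```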

A lot of errors like the following are generated in the ingress-nginx-controller logs; 10.111.0.170 is the IP of the pod behind the default-backend service:

2024/10/14 11:55:01 [error] 908#908: *19967 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for 10.111.0.170:
server returned error code: 3: name error
server returned error code: 3: name error, context: ngx.timer

It seems that the ClusterIP service somehow matches this condition: https://github.com/kubernetes/ingress-nginx/blob/controller-v1.11.3/rootfs/etc/nginx/lua/tcp_udp_balancer.lua#L74-L78

What you expected to happen:

Ingress-nginx-controller does not try to resolve IP addresses as DNS names.

NGINX Ingress controller version v1.11.3

Kubernetes version: v1.27.16

Environment:

NAME                                          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
service/ingress-nginx-controller              LoadBalancer   10.222.34.99     10.128.0.40   80:30830/TCP,443:31937/TCP   30m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
service/ingress-nginx-controller-admission    ClusterIP      10.222.155.122   <none>        443/TCP                      30m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                                                                                     SELECTOR
deployment.apps/ingress-nginx-controller    1/1     1            1           30m   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx

NAME                                                   DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                                                                                                     SELECTOR
replicaset.apps/ingress-nginx-controller-5979bb57db    1         1         1       30m   controller   registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx,pod-template-hash=5979bb57db

  - `kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>`

kubectl -n ingress-nginx describe pod ingress-nginx-controller-5979bb57db-s7wzm

Name:                 ingress-nginx-controller-5979bb57db-s7wzm
Namespace:            ingress-nginx
Priority:             1000
Priority Class Name:  develop
Service Account:      ingress-nginx
Node:                 ob-ingress-nginx-test-0/10.128.0.31
Start Time:           Mon, 14 Oct 2024 11:33:58 +0000
Labels:               app.kubernetes.io/component=controller
                      app.kubernetes.io/instance=ingress-nginx
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=ingress-nginx
                      app.kubernetes.io/part-of=ingress-nginx
                      app.kubernetes.io/version=1.11.3
                      helm.sh/chart=ingress-nginx-4.11.3
                      pod-template-hash=5979bb57db
Annotations:          <none>
Status:               Running
IP:                   10.111.0.188
IPs:
  IP:  10.111.0.188
Controlled By:  ReplicaSet/ingress-nginx-controller-5979bb57db
Containers:
  controller:
    Container ID:    containerd://93e8c789a844dfb2257501727b92726d5f49ff1de7bb48b3d20a0ea3ea09992a
    Image:           registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7
    Ports:           80/TCP, 443/TCP, 8443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --enable-metrics=false
    State:          Running
      Started:      Mon, 14 Oct 2024 11:34:01 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-5979bb57db-s7wzm (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ftlkh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-ftlkh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age                  From                      Message
  Normal  Scheduled  23m                  default-scheduler         Successfully assigned ingress-nginx/ingress-nginx-controller-5979bb57db-s7wzm to ob-ingress-nginx-test-0
  Normal  Pulled     23m                  kubelet                   Container image "registry.k8s.io/ingress-nginx/controller:v1.11.3@sha256:d56f135b6462cfc476447cfe564b83a45e8bb7da2774963b00d12161112270b7" already present on machine
  Normal  Created    23m                  kubelet                   Created container controller
  Normal  Started    23m                  kubelet                   Started container controller
  Normal  RELOAD     4m32s (x7 over 23m)  nginx-ingress-controller  NGINX reload triggered due to a change in configuration

  - `kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>`

kubectl -n ingress-nginx describe svc ingress-nginx-controller

Name:                     ingress-nginx-controller
Namespace:                ingress-nginx
Labels:                   app.kubernetes.io/component=controller
                          app.kubernetes.io/instance=ingress-nginx
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=ingress-nginx
                          app.kubernetes.io/part-of=ingress-nginx
                          app.kubernetes.io/version=1.11.3
                          helm.sh/chart=ingress-nginx-4.11.3
Annotations:              meta.helm.sh/release-name: ingress-nginx
                          meta.helm.sh/release-namespace: ingress-nginx
                          metallb.universe.tf/ip-allocated-from-pool: frontend-pool
Selector:                 app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.222.34.99
IPs:                      10.222.34.99
LoadBalancer Ingress:     10.128.0.40
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  30830/TCP
Endpoints:                10.111.0.188:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  31937/TCP
Endpoints:                10.111.0.188:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason        Age                From                Message
  Normal  IPAllocated   33m                metallb-controller  Assigned IP ["10.128.0.40"]
  Normal  nodeAssigned  26m (x2 over 32m)  metallb-speaker     announcing from node "ob-ingress-nginx-test-0" with protocol "layer2"


- **Current state of ingress object, if applicable**:
  - `kubectl -n <appnamespace> get all,ing -o wide`

kubectl -n app get all,ing -owide

NAME               READY   STATUS    RESTARTS        AGE    IP             NODE                      NOMINATED NODE   READINESS GATES
pod/nginx-errors   1/1     Running   1 (2d21h ago)   7d1h   10.111.0.170   ob-ingress-nginx-test-0   <none>           <none>

NAME                             TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)   AGE    SELECTOR
service/external-name-svc-test   ExternalName   <none>         www.google.com   <none>    22m    <none>
service/nginx-errors             ClusterIP      10.222.4.219   <none>           80/TCP    7d1h   app=errors

NAME                                               CLASS   HOSTS             ADDRESS       PORTS   AGE
ingress.networking.k8s.io/external-name-ingress    nginx   static.test.com   10.128.0.40   80      7d1h

  - `kubectl -n <appnamespace> describe ing <ingressname>`

kubectl -n app describe ingress external-name-ingress

Name:             external-name-ingress
Labels:           <none>
Namespace:        app
Address:          10.128.0.40
Ingress Class:    nginx
Default backend:  <default>
Rules:
  Host             Path  Backends
  static.test.com
                   /   external-name-svc-test:443 (<error: endpoints "external-name-svc-test" not found>)
Annotations:       nginx.ingress.kubernetes.io/backend-protocol: HTTPS
                   nginx.ingress.kubernetes.io/custom-http-errors: 500
                   nginx.ingress.kubernetes.io/default-backend: nginx-errors
                   nginx.ingress.kubernetes.io/preserve-host: false
Events:
  Type    Reason  Age                  From                      Message
  Normal  Sync    30m (x3 over 33m)    nginx-ingress-controller  Scheduled for sync
  Normal  Sync    7m55s (x6 over 27m)  nginx-ingress-controller  Scheduled for sync

  - If applicable, then, your complete and exact curl/grpcurl command (redacted if required) and the response to the curl/grpcurl command with the -v flag

- **Others**:
  - Any other related information like:
    - copy/paste of the snippet (if applicable)
    - `kubectl describe ...` of any custom configmap(s) created and in use
    - Any other related information that may help

Ingress YAML:

kubectl -n app get ingress external-name-ingress -oyaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/custom-http-errors: "500"
    nginx.ingress.kubernetes.io/default-backend: nginx-errors
    nginx.ingress.kubernetes.io/preserve-host: "false"
  creationTimestamp: "2024-10-07T10:55:38Z"
  generation: 2
  name: external-name-ingress
  namespace: app
  resourceVersion: "3361207"
  uid: 172919a8-0407-4b98-a11b-542db1538814
spec:
  ingressClassName: nginx
  rules:

The ClusterIP default-backend Service in front of the pod whose IP ingress-nginx tries to resolve as a DNS name:

# kubectl -n app get svc nginx-errors -oyaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-10-07T10:54:25Z"
  labels:
    service: nginx-errors
  name: nginx-errors
  namespace: app
  resourceVersion: "74017"
  uid: 373a9a24-4c53-4ae2-b83c-ffc5ea25a9c3
spec:
  clusterIP: 10.222.4.219
  clusterIPs:
  - 10.222.4.219
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: errors
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

the pod:

# kubectl -n app get pod -owide --show-labels
NAME           READY   STATUS    RESTARTS        AGE    IP             NODE                      NOMINATED NODE   READINESS GATES   LABELS
nginx-errors   1/1     Running   1 (2d21h ago)   7d1h   10.111.0.170   ob-ingress-nginx-test-0   <none>           <none>            app=errors
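
For completeness, a hypothetical minimal manifest for this pod; the image is only a placeholder, any HTTP server that can serve error pages will do:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-errors
  namespace: app
  labels:
    app: errors                      # matched by the nginx-errors Service selector
spec:
  containers:
  - name: nginx-errors
    image: registry.k8s.io/ingress-nginx/nginx-errors:v20230505   # placeholder image
    ports:
    - containerPort: 80              # targetPort of the nginx-errors Service
```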

A curl to the ingress works (not sure why Google returns 404, though):

# curl static.test.com -v
*   Trying 10.128.0.40:80...
* Connected to static.test.com (10.128.0.40) port 80 (#0)
> GET / HTTP/1.1
> Host: static.test.com
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Mon, 14 Oct 2024 12:14:29 GMT
< Content-Type: text/html; charset=UTF-8
< Content-Length: 1561
< Connection: keep-alive
< Referrer-Policy: no-referrer
< Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
<
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/</code> was not found on this server.  <ins>That’s all we know.</ins>
* Connection #0 to host static.test.com left intact

How to reproduce this issue:

  1. Have a working Kubernetes cluster.
  2. Install ingress-nginx using the Quick Start Helm instructions.
  3. Deploy the Ingress, the ExternalName Service, and the ClusterIP Service + pod using the manifests from the Others section above (a minimal ExternalName Service sketch follows this list).
  4. Check the logs of ingress-nginx-controller and observe the `dns_lookup(): failed to query the DNS server for 10.111.0.170` errors.
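
Since the ExternalName Service manifest is not shown in full above, here is a minimal sketch for step 3 that is consistent with the `kubectl get svc` output; the ports stanza is an assumption, added only because the Ingress backend references port 443:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: external-name-svc-test
  namespace: app
spec:
  type: ExternalName
  externalName: www.google.com
  ports:                             # assumed; the Ingress references port 443
  - port: 443
    protocol: TCP
```
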
k8s-ci-robot commented 1 month ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 1 month ago

Hi,

Another user has raised an issue that has similarities to this one.

(screenshot attached)

/kind feature
/remove-kind bug

chessman commented 1 month ago

@longwuyuan There are two ways to define a default backend, the global backend and the annotation backend. They are not the same:

> But your curl command hostname is understood by the controller and that ExternalName Service will never ever have an endpoint. So there is no design/code to handle this use-case of default-backend. See the screenshot below.

If the nginx.ingress.kubernetes.io/custom-http-errors annotation is specified (it is specified in this case), then the annotation default-backend will handle HTTP errors coming from the service.
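
To illustrate the distinction with a hedged sketch (only the annotation form below is taken from this issue; everything else is illustrative):

```yaml
# (1) Global default backend: configured on the controller itself, for example via
#     the --default-backend-service=<namespace>/<service> controller flag or the
#     Helm chart's defaultBackend values; it serves traffic that no Ingress rule matches.
#
# (2) Annotation backend (the form used in this issue): set per Ingress, and when
#     custom-http-errors is also set it additionally serves the listed error codes
#     returned by that Ingress's upstream.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/custom-http-errors: "500"
    nginx.ingress.kubernetes.io/default-backend: nginx-errors
```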

I already hinted at why this case is not working: https://github.com/kubernetes/ingress-nginx/issues/12158#issuecomment-2407367484. This PR fixes it: https://github.com/kubernetes/ingress-nginx/pull/12160.

meatuses commented 1 month ago

> First, it's required to state that default-backend is for requests that the controller does not understand. But your curl command hostname is understood by the controller and that ExternalName Service will never ever have an endpoint.

The issue here is:

With my curl request I was not trying to trigger an error that would route me to the default-backend, because that is not needed to see the issue. These log entries are generated at a rate of around 4 per second, without any requests being made to the ingress:

2024-10-14T14:27:21.097969668Z 2024/10/14 14:27:21 [error] 908#908: *170252 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for 10.111.0.170:
2024-10-14T14:27:21.098230504Z server returned error code: 3: name error
2024-10-14T14:27:21.098256945Z server returned error code: 3: name error, context: ngx.timer
2024-10-14T14:27:21.146672207Z 2024/10/14 14:27:21 [error] 909#909: *170257 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for 10.111.0.170:
2024-10-14T14:27:21.146714162Z server returned error code: 3: name error
2024-10-14T14:27:21.146719921Z server returned error code: 3: name error, context: ngx.timer
2024-10-14T14:27:21.519411144Z 2024/10/14 14:27:21 [error] 911#911: *170262 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for 10.111.0.170:
2024-10-14T14:27:21.519443371Z server returned error code: 3: name error
2024-10-14T14:27:21.519449218Z server returned error code: 3: name error, context: ngx.timer
2024-10-14T14:27:21.547436939Z 2024/10/14 14:27:21 [error] 910#910: *170267 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for 10.111.0.170:
2024-10-14T14:27:21.547480708Z server returned error code: 3: name error
2024-10-14T14:27:21.547486514Z server returned error code: 3: name error, context: ngx.timer
longwuyuan commented 1 month ago

(screenshot attached)

meatuses commented 1 month ago

> DNS servers that resolve my ExternalName

@longwuyuan I think I have emphasized enough that the issue is not with the ExternalName service itself; it resolves fine. The issue is: if your ingress has a default-backend service together with custom-http-errors, the controller tries to resolve the IP of the pod behind that default-backend service as if it were a DNS name. I think that is clear from the log records I have provided.

Looking at the resources in your screenshots, I think that if you add the nginx.ingress.kubernetes.io/custom-http-errors: "500" annotation to your ingress (you can use any error codes you like), the errors will appear in your controller's logs, without any curl or other requests to your ingress and without trying to trigger those HTTP errors on the ExternalName backend.
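
A minimal sketch of the annotations to add to an otherwise working ExternalName Ingress; nginx-errors is the ClusterIP Service from my reproduction, substitute any ClusterIP Service that has a pod behind it:

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/custom-http-errors: "500"      # any error code list works
    nginx.ingress.kubernetes.io/default-backend: nginx-errors  # ClusterIP Service backed by a pod
```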

Also, disconnecting DNS servers or changing anything related to DNS settings has no bearing on this issue.

longwuyuan commented 1 month ago

@meatuses thank you for your update. It helps. I will try now and update

longwuyuan commented 1 month ago

2024/10/14 17:28:49 [error] 1644#1644: *157646 [lua] dns.lua:152: dns_lookup(): failed to query the DNS server for 10.244.0.26:
server returned error code: 3: name error
server returned error code: 3: name error, context: ngx.timer

The config looks like the screenshot below:

(screenshot attached)