Ingress status address is not updated when using "headless" service without hostNetwork

kevincox commented 2 years ago

What happened: When creating a headless service only one IP is assigned to managed Ingresses. See the example service:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: nginx-ingress
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  - name: https
    port: 8443
    targetPort: 8443

This assigns only a single IP to the instance.

However switching to a NodePort service (even though the NodePort is never actually used so this is really just a big hack) correctly assigns an IP for each nginx instance running.

apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: nginx-ingress
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  - name: https
    port: 8443
    targetPort: 8443

What you expected to happen:

When using a headless service nginx ingress shuold assign one IP for each nginx instance running.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
  Release:       v1.2.1
  Build:         08848d69e0c83992c89da18e70ea708752f21d7a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.7", GitCommit:"42c05a547468804b2053ecf60a3bd15560362fc2", GitTreeState:"archive", BuildDate:"1980-01-01T00:00:00Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8-gke.201", GitCommit:"2dca91e5224568a093c27d3589aa0a96fd3ddc9a", GitTreeState:"clean", BuildDate:"2022-05-11T18:39:02Z", GoVersion:"go1.16.14b7", Compiler:"gc", Platform:"linux/amd64"}

Environment:

Cloud provider or hardware configuration: DigitalOcean
OS (e.g. from /etc/os-release): Linux
Kernel (e.g. uname -a):
Install tools:
- Please mention how/where was the cluster created like kubeadm/kops/minikube/kind etc.
Basic cluster related info:
- kubectl version
- kubectl get nodes -o wide

NAME                         STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP       OS-IMAGE                       KERNEL-VERSION         CONTAINER-RUNTIME
pool-kube2-0ns3ff1ko-c2geg   Ready    <none>   5d23h   v1.22.8   10.137.0.5    138.197.148.239   Debian GNU/Linux 10 (buster)   5.10.0-0.bpo.9-amd64   containerd://1.4.13
pool-kube2-0ns3ff1ko-cw0yk   Ready    <none>   37d     v1.22.8   10.137.0.4    143.110.221.216   Debian GNU/Linux 10 (buster)   5.10.0-0.bpo.9-amd64   containerd://1.4.13

How was the ingress-nginx-controller installed:
- If helm was used then please show output of helm ls -A | grep -i ingress
- If helm was used then please show output of helm -n <ingresscontrollernamepspace> get values <helmreleasename>
- If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used

Creation

```yaml --- apiVersion: v1 items: - apiVersion: policy/v1beta1 kind: PodDisruptionBudget metadata: annotations: {} name: disruption-budget namespace: nginx-ingress spec: maxUnavailable: 1 selector: matchLabels: app: nginx - apiVersion: v1 kind: Namespace metadata: annotations: {} name: nginx-ingress namespace: nginx-ingress - apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: annotations: {} name: default namespace: nginx-ingress rules: - apiGroups: - "" resourceNames: [] resources: - namespaces verbs: - get - apiGroups: - "" resourceNames: [] resources: - pods - secrets - endpoints verbs: - get - list - watch - apiGroups: - "" resourceNames: [] resources: - services verbs: - get - list - watch - apiGroups: - networking.k8s.io resourceNames: [] resources: - ingresses - ingressclasses verbs: - get - list - watch - apiGroups: - networking.k8s.io resourceNames: [] resources: - ingresses/status verbs: - update - apiGroups: - "" resourceNames: [] resources: - configmaps verbs: - create - get - list - update - watch - apiGroups: - "" resourceNames: [] resources: - events verbs: - create - patch - apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: role-binding-nginx-ingress-default namespace: nginx-ingress roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: default subjects: - kind: ServiceAccount name: default namespace: nginx-ingress - apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: annotations: {} name: nginx-ingress-clusterrole-default namespace: nginx-ingress rules: - apiGroups: - "" resourceNames: [] resources: - configmaps - endpoints - nodes - pods - secrets verbs: - list - watch - apiGroups: - "" resourceNames: [] resources: - nodes verbs: - get - apiGroups: - "" resourceNames: [] resources: - services verbs: - get - list - watch - apiGroups: - "" resourceNames: [] resources: - events verbs: - create - patch - apiGroups: - networking.k8s.io resourceNames: [] resources: - ingresses - ingressclasses verbs: - get - list - watch - apiGroups: - networking.k8s.io resourceNames: [] resources: - ingresses/status verbs: - update - apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: role-binding-nginx-ingress-default namespace: nginx-ingress roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: nginx-ingress-clusterrole-default subjects: - kind: ServiceAccount name: default namespace: nginx-ingress - apiVersion: apps/v1 kind: DaemonSet metadata: annotations: {} name: nginx namespace: nginx-ingress spec: selector: matchLabels: app: nginx template: metadata: labels: app: nginx spec: containers: - args: - /nginx-ingress-controller - "--configmap=nginx-ingress/nginx-configuration" - "--controller-class=k8s.io/ingress-nginx" - "--election-id=ingress-controller-leader" - "--http-port=8080" - "--https-port=8443" - "--watch-ingress-without-class=true" env: - name: LD_PRELOAD value: /usr/local/lib/libmimalloc.so - name: POD_NAME valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.namespace - name: RELOAD value: "1" image: "k8s.gcr.io/ingress-nginx/controller:v1.2.1@sha256:5516d103a9c2ecc4f026efbd4b40662ce22dc1f824fb129ed121460aaa5c47f8" lifecycle: preStop: exec: command: - /wait-shutdown livenessProbe: failureThreshold: 2 httpGet: path: /healthz port: 10254 scheme: HTTP periodSeconds: 2 successThreshold: 1 timeoutSeconds: 1 name: nginx ports: - containerPort: 8080 hostPort: 80 - containerPort: 8443 hostPort: 443 resources: requests: cpu: 100m memory: 100Mi securityContext: allowPrivilegeEscalation: true capabilities: add: - NET_BIND_SERVICE drop: - ALL privileged: false readOnlyRootFilesystem: false runAsNonRoot: true runAsUser: 101 startupProbe: failureThreshold: 60 httpGet: path: /healthz port: 10254 scheme: HTTP periodSeconds: 2 successThreshold: 1 timeoutSeconds: 1 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: [] restartPolicy: Always tolerations: [] volumes: [] - apiVersion: v1 kind: Service metadata: labels: external-dns: "true" name: nginx namespace: nginx-ingress spec: ports: - name: http port: 8080 - name: https port: 8443 selector: app: nginx type: NodePort - apiVersion: v1 data: allow-snippet-annotations: "true" compute-full-forwarded-for: "true" use-forwarded-headers: "true" kind: ConfigMap metadata: annotations: {} name: nginx-configuration namespace: nginx-ingress kind: List metadata: {} ```

NAME            CLASS    HOSTS          ADDRESS                           PORTS     AGE
example-prod   <none>   example.org   138.197.148.239   80, 443   255d

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
  name: example-prod
  namespace: example
spec:
  defaultBackend:
    service:
      name: example-prod
      port:
        number: 80
  rules:
  - host: feedmail.org
    http:
      paths:
      - backend:
          service:
            name: example-prod
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - example.org
    secretName: tls
status:
  loadBalancer:
    ingress:
    - ip: 138.197.148.239

k8s-ci-robot commented 2 years ago

@kevincox: This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

longwuyuan commented 2 years ago

/remove-kind bug

Post information that is the live state of cluster and objects. For example ;

kubectl get po,svc,ing -A -o wide
kubectl describe ing -A
Show how you installed the ingress-NGINX-Controller exactly and complete command etc, along with values file if any
kubectl describe -n

kevincox commented 2 years ago

kubectl get po,svc,ing -A -o wide

That is sensitive. If there are any relevant resrouces that I missed please let me know.

kubectl describe ing -A

I included the relevant ingress. If you think other ingresses may be relevant please be more specific.

Show how you installed the ingress-NGINX-Controller exactly and complete command etc, along with values file if any

See the original message.

kubectl describe -n

error: flag needs an argument: 'n' in -n

kevincox commented 2 years ago

Actually it appears that this is slightly more complicated. It appears that when using a headless service that nginx-ingress simply doesn't update the endpoints at all. It appears as though in my previous testing there simply happened to only be one address before the change.

So maybe the issue here is just that headless services aren't supported in general.

longwuyuan commented 2 years ago

Since you can't provide that info, maybe you can write a step-by-step procedure from beginning till end, with complete helpful description, that someone can just copy/paste on a minikube/kind cluster and reproduce the problem.

Its not easy to comment on such issues, without having the info like the current live state

kevincox commented 2 years ago

A step-by-step procedure of what? Applying the provided kube resources and ingress should get you a working example of this problem.

kevincox commented 2 years ago

Following the instructions here https://kubernetes.github.io/ingress-nginx/deploy/baremetal/#via-the-host-network seems to work. However this also exposes the metrics url (/metrics) and maybe other debugging information to the world which isn't really acceptable.

kevincox commented 2 years ago

It is unclear why that works but using hostPort in the container doesn't. The latter seems ideal because it allows exposing only specific ports.

kevincox commented 2 years ago

So after investiagtion on what the options are here. The core of the problem appears to be that we have three similar setups and one doesn't work.

nginx-ingress running with hostPort: 80 and a "fake" NodePort service works as expected.
nginx-ingress running with hostNetwork seems to work with either a fake NodePort or a headless ClusterIP service.
nginx-ingress running with hostPort: 80 and a headless ClusterIP service doesn't work.

This is quite unfortunate because the two that do work have downsides:

This requires a fake NodePort service which both uses extra ports but also entails extra proxying for some cases.
This exposes unwanted ports such as metrcs, profiling and lua ports to the public.

So it would be nice if option 3 worked because it is the "most correct" and doesn't expose extra endpoints publicly.

IDK if this is a bug or a feature request but it would be nice to fix this either way. I have updated the title to match the new investigation.

gauravkghildiyal commented 2 years ago

@kevincox When you say options 1 and 2 work, but 3 doesn't, can you clarify what do you mean by "it works"? Does "it works" mean that you are able to send requests to some address and get a response? If yes, for all the three options, can you describe what is that "address" where you are making the requests to:

is it the ClusterIP of the ingress pod
is it the Node IP (in combination with a Node Port)
or is it the "NAME" of the service which exposes the ingress pods (which would mean that it internally uses the kube-dns to resolve that service name)

kevincox commented 2 years ago

"Works" means the Ingress has endpoints for the nginx pods.

I am making requests to the nginx service to the external IPs of the nodes specified in the Ingress status (via external-dns). So I think that corresponds to your second option. "the Node IP"

gauravkghildiyal commented 2 years ago

Thanks. Can you share the Service and Deployment yamls for the configuration that does not work.

Also, since you claim to be making request to the Node IPs directly, why do you need the extra k8s nginx-ingress resource to begin with? The way I see it, you can just:

Get rid of the service resource
Change the nginx deployment to a daemon set (which you seem to have already done)
Configure the nginx container specs to use hostPorts
And then simply make requests using NODE_IP:HOST_PORT combination.

kevincox commented 2 years ago

You mean the Service and Deployment of a service that isn't being given the right endpoints? See below, but I think it is really basic.

Application Configs

Slightly trimmed for privacy, but if anything important is lost I can include more. ```yaml apiVersion: v1 items: - apiVersion: v1 kind: Service metadata: labels: app-env: feedmail-prod app-ver: feedmail-a33e8c2b9a596373ed85f8cd707a26317bb77195 name: feedmail-prod namespace: feedmail spec: clusterIP: 10.245.181.111 clusterIPs: - 10.245.181.111 internalTrafficPolicy: Cluster ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - name: http port: 80 protocol: TCP targetPort: http selector: app-env: feedmail-prod sessionAffinity: None type: ClusterIP status: loadBalancer: {} - apiVersion: apps/v1 kind: Deployment metadata: generation: 311 labels: app-env: feedmail-prod app-ver: feedmail-a33e8c2b9a596373ed85f8cd707a26317bb77195 name: feedmail-prod namespace: feedmail resourceVersion: "138418484" uid: 2d0904ea-65d5-4e73-8405-3519d18731bd spec: progressDeadlineSeconds: 900 replicas: 2 revisionHistoryLimit: 10 selector: matchLabels: app-env: feedmail-prod strategy: rollingUpdate: maxSurge: 5% maxUnavailable: 0 type: RollingUpdate template: metadata: labels: app-env: feedmail-prod app-ver: feedmail-a33e8c2b9a596373ed85f8cd707a26317bb77195 namespace: feedmail spec: containers: - livenessProbe: failureThreshold: 2 httpGet: path: /ping port: http scheme: HTTP periodSeconds: 1 successThreshold: 1 timeoutSeconds: 1 name: main ports: - containerPort: 8000 name: http protocol: TCP resources: requests: cpu: 100m memory: 100Mi startupProbe: failureThreshold: 60 httpGet: path: /ping port: http scheme: HTTP periodSeconds: 1 successThreshold: 1 timeoutSeconds: 1 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File dnsPolicy: ClusterFirst restartPolicy: Always - apiVersion: networking.k8s.io/v1 kind: Ingress metadata: annotations: labels: app-env: feedmail-prod app-ver: feedmail-a33e8c2b9a596373ed85f8cd707a26317bb77195 external-dns: "true" name: feedmail-prod namespace: feedmail spec: rules: - host: feedmail.org http: paths: - backend: service: name: feedmail-prod port: name: http path: / pathType: Prefix tls: - hosts: - feedmail.org - '*.feedmail.org' secretName: tls status: loadBalancer: ingress: - ip: 138.197.148.239 - ip: 143.110.221.216 kind: List metadata: resourceVersion: "" selfLink: "" ``` Note this is with a working config, so it got the right IPs.

The way I see it, you can just:

That is a possible workaround but has a few downsides:

During updates of nginx they aren't removed from the endpoints. This means failed requests during deployments. This is especially bad if the new config is broken and they stay down.
This means that you need to run nginx on every node. I do this right now because my cluster is small but for larger clusters this is a waste of resources.

gauravkghildiyal commented 2 years ago

You mean the Service and Deployment of a service that isn't being given the right endpoints? See below, but I think it is really basic.

Thanks for sharing. One correction there though - I actually wanted to the ingress-controller deployment and not the actual backend deployment. You can skip that since I feel the difference in our understanding lies somewhere else which I hope to clarify with some questions below.

Can you elaborate the 1st point of the down side? What does "endpoints" mean here. Do you mean the list of ADDRESSESs that you when you do kubectl get ingress?
What purpose do you think having a headless ingress-nginx service should solve for you?

I did read "When using a headless service nginx ingress shuold assign one IP for each nginx instance running." but this is still a bit unclear to me. A headless service is basically useless when it comes to load balancing. So if you had two ingress-nginx pods - you can simply not use the service to automatically load balance between them. You could only use the headless service to identify the Cluster IPs of the two ingress pods and they you will have to load balance between them yourself.
I see you've mentioned multiple times about "assigning IP to Ingresses"
- When creating a headless service only one IP is assigned to managed Ingresses
- However switching to a NodePort service (even though the NodePort is never actually used so this is really just a big hack) correctly assigns an IP for each nginx instance running.
Which IP assignment are you talking about here. Again, lets say two IP addresses were "assigned" to whatever you want them to be assigned to. What do you hope to achieve with those two IP addresses being assigned to that single resource.

kevincox commented 2 years ago

The ingress-controller deployment is in the original post under the "How was the ingress-nginx-controller installed" question.

Do you mean the list of ADDRESSESs that you when you do kubectl get ingress?

Yes. To be precise I mean the status of the ingress.

status:
  loadBalancer:
    ingress:
    - ip: 138.197.148.239
    - ip: 143.110.221.216

What purpose do you think having a headless ingress-nginx service should solve for you?

It allows me to get nginx to update the statuses correctly without a useless cluster IP or node port.

A headless service is basically useless when it comes to load balancing.

In my case essentially random shuffling of the IPs is sufficient. I am not using the service directly but it affects how nginx assigns IPs to Ingress statuses.

Which IP assignment are you talking about here.

I think this is the same as the first question. The status.loadBalancer info.

gauravkghildiyal commented 2 years ago

Thanks. Now I have a clear picture. So you just want the ingress status to be updated with the external IPs of all the nodes where the nginx-ingress-controller pods are running.

Sadly, I wasn't able to reproduce this problem. Infact, even if I delete the ingress-nginx service, I still see the ingress status updated with the correct external IPs.

Although I don't see these issues in the yamls you shared, but, you should ensure that:

You DON'T pass the flag --publish-service <NAME_OF_SERVICE> to the ingress controller
The value of the --update-status flag, if passed, should be true (--update-status=true). Although the default value of this flag is already true, it doesn't hurt to give this a try.

Also, maybe you can take a look (and possible share - although I'm not sure how helpful it would be) the logs for the nginx-ingress-controller pod to possibly check if there were any issues there. While updating the status, it should log something like:

status.go:299] "updating Ingress status" namespace="default" ingress="ingress-2" currentValue=[] newValue=[{IP:<EXTERNAL_IP> Hostname: Ports:[]} {IP:<EXTERNAL_IP> Hostname: Ports:[]}]

Aside from this, I understand that you are trying to use the External IPs of the nodes in conjunction with something like hostNetwork or hostPort to access the nginx-ingress from outside your cluster. You might already know this, but still a word of caution regarding the usage of hostNetwork or hostPort - things tend to get weird with these.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/8707#issuecomment-1328101824): >The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. > >This bot triages issues according to the following rules: >- After 90d of inactivity, `lifecycle/stale` is applied >- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied >- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed > >You can: >- Reopen this issue with `/reopen` >- Mark this issue as fresh with `/remove-lifecycle rotten` >- Offer to help out with [Issue Triage][1] > >Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community). > >/close not-planned > >[1]: https://www.kubernetes.dev/docs/guide/issue-triage/ Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

kubernetes / ingress-nginx

Ingress status address is not updated when using "headless" service without hostNetwork #8707