kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Reoccurrence of Service does not have any active Endpoint [when it actually does] #9932

Open scott-kausler opened 1 year ago

scott-kausler commented 1 year ago

What happened: The ingress controller reported "Service does not have any active Endpoint" when in fact the service did have active endpoints.

I was able to verify the service was active by execing into the nginx pod and curling the health check endpoint of the service.
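
For reference, this is roughly the check that was run (a sketch; the pod name, service IP, port, and path are placeholders, not values from the actual cluster):

# exec into the running ingress-nginx controller pod (name is illustrative)
kubectl -n nginx exec -it <ingress-nginx-controller-pod> -- sh
# from inside the pod, curl the backend Service's health check endpoint directly
curl -v http://<service-cluster-ip>:<port>/<health-path>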

The only way I was able to recover was to reinstall the helm chart.

What you expected to happen:

The service to be added to the ingress controller.

NGINX Ingress controller version:

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.6.4
  Build:         69e8833858fb6bda12a44990f1d5eaa7b13f4b75
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"25+", GitVersion:"v1.25.6-eks-48e63af", GitCommit:"9f22d4ae876173884749c0701f01340879ab3f95", GitTreeState:"clean", BuildDate:"2023-01-24T19:19:02Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}

Environment: AWS EKS

How was the ingress-nginx-controller installed: nginx nginx 1 2023-05-06 16:52:09.643618809 +0000 UTC deployed ingress-nginx-4.5.2 1.6.4

Values:

  ingressClassResource:
    default: true
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
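
For completeness, with the release name and namespace shown in the helm output above, the install would have been along these lines (a sketch; the repo alias and values file name are assumptions):

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm upgrade --install nginx ingress-nginx/ingress-nginx \
  --namespace nginx --create-namespace \
  --version 4.5.2 \
  -f values.yaml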

How to reproduce this issue: Unknown. There was a single replica of the pod, and it was deployed for 42 days before exhibiting this problem.

However, others have recently reported this issue in https://github.com/kubernetes/ingress-nginx/issues/6135.

Anything else we need to know:

The problem was previously reported in https://github.com/kubernetes/ingress-nginx/issues/6135, but the defect was closed.

k8s-ci-robot commented 1 year ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
longwuyuan commented 1 year ago

/remove-kind bug

Hi, this has been reported twice and is related to the change where endpointslices are now used.

This issue, in its current state, does not contain enough data to hint at an action item. It would help a lot if you could write step-by-step instructions that can be copy/pasted to reproduce the problem on a minikube cluster or a kind cluster.

It is also possible that there is a reason, so far unknown, why the endpointslice does not get populated. Even for that, it becomes more important to know a way to reproduce the problem and debug it (because just creating a workload with something like the image nginx:alpine does not trigger this problem). Thanks

tombokombo commented 1 year ago

@scott-kausler please provide kubectl -n $ns get svc,ing,ep,endpointslice and kubectl -n $ns get svc,ing,ep,endpointslice -o yaml

rdb0101 commented 1 year ago

Hi, I am having the same issue as reported in this ticket. I initially created a ticket under Rancher (Issue 41584) as I wasn't sure whether it is a Rancher issue or specific to the kubernetes ingress-nginx controller. Is it possible to provide some insight as to why this can be happening?

rdb0101 commented 1 year ago

Every 25 to 45 minutes the service is available, but then during the next interval the Rancher GUI becomes unavailable with a "404 page not found" error and the controller reports that the "rancher" Service does not have any active Endpoint.

rdb0101 commented 1 year ago

Hi @tombokombo I ran the commands as recommended; please refer to the output below:

apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-coredns
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:29:46Z"
    labels:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-coredns
      helm.sh/chart: rke2-coredns-1.19.402
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
      kubernetes.io/name: CoreDNS
    name: rke2-coredns-rke2-coredns
    namespace: kube-system
    resourceVersion: "668"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: udp-53
      port: 53
      protocol: UDP
      targetPort: 53
    - name: tcp-53
      port: 53
      protocol: TCP
      targetPort: 53
    selector:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/name: rke2-coredns
      k8s-app: kube-dns
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-ingress-nginx
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:30:20Z"
    labels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-ingress-nginx
      app.kubernetes.io/part-of: rke2-ingress-nginx
      app.kubernetes.io/version: 1.6.4
      helm.sh/chart: rke2-ingress-nginx-4.5.201
    name: rke2-ingress-nginx-controller-admission
    namespace: kube-system
    resourceVersion: "1183"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - appProtocol: https
      name: https-webhook
      port: 443
      protocol: TCP
      targetPort: webhook
    selector:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/name: rke2-ingress-nginx
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-metrics-server
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:30:09Z"
    labels:
      app: rke2-metrics-server
      app.kubernetes.io/managed-by: Helm
      chart: rke2-metrics-server-2.11.100-build2022101107
      heritage: Helm
      release: rke2-metrics-server
    name: rke2-metrics-server
    namespace: kube-system
    resourceVersion: "5197581"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
    - name: metrics
      port: 10250
      protocol: TCP
      targetPort: 10250
    selector:
      app: rke2-metrics-server
      release: rke2-metrics-server
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-snapshot-validation-webhook
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:30:10Z"
    labels:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
      app.kubernetes.io/version: v6.2.1
      helm.sh/chart: rke2-snapshot-validation-webhook-1.7.100
    name: rke2-snapshot-validation-webhook
    namespace: kube-system
    resourceVersion: "980"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
    selector:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:33Z"
    creationTimestamp: "2023-05-02T17:29:46Z"
    labels:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-coredns
      helm.sh/chart: rke2-coredns-1.19.402
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
      kubernetes.io/name: CoreDNS
    name: rke2-coredns-rke2-coredns
    namespace: kube-system
    resourceVersion: "5534372"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-coredns-rke2-coredns-6b9548f79f-fg2th
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-coredns-rke2-coredns-6b9548f79f-n4p5l
        namespace: kube-system
        uid: REDACTED
    ports:
    - name: tcp-53
      port: 53
      protocol: TCP
    - name: udp-53
      port: 53
      protocol: UDP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:23Z"
    creationTimestamp: "2023-05-02T17:30:20Z"
    labels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-ingress-nginx
      app.kubernetes.io/part-of: rke2-ingress-nginx
      app.kubernetes.io/version: 1.6.4
      helm.sh/chart: rke2-ingress-nginx-4.5.201
    name: rke2-ingress-nginx-controller-admission
    namespace: kube-system
    resourceVersion: "5534140"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-2h95m
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-8hvtl
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-c8x24
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-df4lk
        namespace: kube-system
        uid: REDACTED
    ports:
    - appProtocol: https
      name: https-webhook
      port: 8443
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:09Z"
    creationTimestamp: "2023-05-02T17:30:09Z"
    labels:
      app: rke2-metrics-server
      app.kubernetes.io/managed-by: Helm
      chart: rke2-metrics-server-2.11.100-build2022101107
      heritage: Helm
      release: rke2-metrics-server
    name: rke2-metrics-server
    namespace: kube-system
    resourceVersion: "5533133"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-metrics-server-7d58bbc9c6-xvgg8
        namespace: kube-system
        uid: REDACTED
    ports:
    - name: metrics
      port: 10250
      protocol: TCP
    - name: https
      port: 10250
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:10Z"
    creationTimestamp: "2023-05-02T17:30:10Z"
    labels:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
      app.kubernetes.io/version: v6.2.1
      helm.sh/chart: rke2-snapshot-validation-webhook-1.7.100
    name: rke2-snapshot-validation-webhook
    namespace: kube-system
    resourceVersion: "5533131"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-snapshot-validation-webhook-7748dbf6ff-xdtm2
        namespace: kube-system
        uid: REDACTED
    ports:
    - name: https
      port: 8443
      protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-coredns-rke2-coredns-6b9548f79f-fg2th
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-coredns-rke2-coredns-6b9548f79f-n4p5l
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:33Z"
    creationTimestamp: "2023-05-02T17:29:46Z"
    generateName: rke2-coredns-rke2-coredns-
    generation: 78
    labels:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-coredns
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      helm.sh/chart: rke2-coredns-1.19.402
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
      kubernetes.io/name: CoreDNS
      kubernetes.io/service-name: rke2-coredns-rke2-coredns
    name: rke2-coredns-rke2-coredns-d7srf
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-coredns-rke2-coredns
      uid: REDACTED
    resourceVersion: "5534370"
    uid: REDACTED
  ports:
  - name: tcp-53
    port: 53
    protocol: TCP
  - name: udp-53
    port: 53
    protocol: UDP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-2h95m
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-c8x24
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-df4lk
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-8hvtl
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:23Z"
    creationTimestamp: "2023-05-02T17:30:20Z"
    generateName: rke2-ingress-nginx-controller-admission-
    generation: 265
    labels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-ingress-nginx
      app.kubernetes.io/part-of: rke2-ingress-nginx
      app.kubernetes.io/version: 1.6.4
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      helm.sh/chart: rke2-ingress-nginx-4.5.201
      kubernetes.io/service-name: rke2-ingress-nginx-controller-admission
    name: rke2-ingress-nginx-controller-admission-g25cm
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-ingress-nginx-controller-admission
      uid: REDACTED
    resourceVersion: "5534139"
    uid: REDACTED
  ports:
  - appProtocol: https
    name: https-webhook
    port: 8443
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-metrics-server-7d58bbc9c6-xvgg8
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:09Z"
    creationTimestamp: "2023-05-02T17:30:09Z"
    generateName: rke2-metrics-server-
    generation: 27
    labels:
      app: rke2-metrics-server
      app.kubernetes.io/managed-by: Helm
      chart: rke2-metrics-server-2.11.100-build2022101107
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      heritage: Helm
      kubernetes.io/service-name: rke2-metrics-server
      release: rke2-metrics-server
    name: rke2-metrics-server-wmz2b
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-metrics-server
      uid: REDACTED
    resourceVersion: "5533128"
    uid: REDACTED
  ports:
  - name: metrics
    port: 10250
    protocol: TCP
  - name: https
    port: 10250
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-snapshot-validation-webhook-7748dbf6ff-xdtm2
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:10Z"
    creationTimestamp: "2023-05-02T17:30:10Z"
    generateName: rke2-snapshot-validation-webhook-
    generation: 16
    labels:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
      app.kubernetes.io/version: v6.2.1
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      helm.sh/chart: rke2-snapshot-validation-webhook-1.7.100
      kubernetes.io/service-name: rke2-snapshot-validation-webhook
    name: rke2-snapshot-validation-webhook-mzc9v
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-snapshot-validation-webhook
      uid: REDACTED
    resourceVersion: "5533125"
    uid: REDACTED
  ports:
  - name: https
    port: 8443
    protocol: TCP
kind: List
metadata:
  resourceVersion: ""
---
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-coredns
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:29:46Z"
    labels:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-coredns
      helm.sh/chart: rke2-coredns-1.19.402
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
      kubernetes.io/name: CoreDNS
    name: rke2-coredns-rke2-coredns
    namespace: kube-system
    resourceVersion: "668"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: udp-53
      port: 53
      protocol: UDP
      targetPort: 53
    - name: tcp-53
      port: 53
      protocol: TCP
      targetPort: 53
    selector:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/name: rke2-coredns
      k8s-app: kube-dns
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-ingress-nginx
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:30:20Z"
    labels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-ingress-nginx
      app.kubernetes.io/part-of: rke2-ingress-nginx
      app.kubernetes.io/version: 1.6.4
      helm.sh/chart: rke2-ingress-nginx-4.5.201
    name: rke2-ingress-nginx-controller-admission
    namespace: kube-system
    resourceVersion: "1183"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - appProtocol: https
      name: https-webhook
      port: 443
      protocol: TCP
      targetPort: webhook
    selector:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/name: rke2-ingress-nginx
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-metrics-server
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:30:09Z"
    labels:
      app: rke2-metrics-server
      app.kubernetes.io/managed-by: Helm
      chart: rke2-metrics-server-2.11.100-build2022101107
      heritage: Helm
      release: rke2-metrics-server
    name: rke2-metrics-server
    namespace: kube-system
    resourceVersion: "5197581"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
    - name: metrics
      port: 10250
      protocol: TCP
      targetPort: 10250
    selector:
      app: rke2-metrics-server
      release: rke2-metrics-server
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rke2-snapshot-validation-webhook
      meta.helm.sh/release-namespace: kube-system
    creationTimestamp: "2023-05-02T17:30:10Z"
    labels:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
      app.kubernetes.io/version: v6.2.1
      helm.sh/chart: rke2-snapshot-validation-webhook-1.7.100
    name: rke2-snapshot-validation-webhook
    namespace: kube-system
    resourceVersion: "980"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
    selector:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:33Z"
    creationTimestamp: "2023-05-02T17:29:46Z"
    labels:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-coredns
      helm.sh/chart: rke2-coredns-1.19.402
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
      kubernetes.io/name: CoreDNS
    name: rke2-coredns-rke2-coredns
    namespace: kube-system
    resourceVersion: "5534372"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-coredns-rke2-coredns-6b9548f79f-fg2th
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-coredns-rke2-coredns-6b9548f79f-n4p5l
        namespace: kube-system
        uid: REDACTED
    ports:
    - name: tcp-53
      port: 53
      protocol: TCP
    - name: udp-53
      port: 53
      protocol: UDP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:23Z"
    creationTimestamp: "2023-05-02T17:30:20Z"
    labels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-ingress-nginx
      app.kubernetes.io/part-of: rke2-ingress-nginx
      app.kubernetes.io/version: 1.6.4
      helm.sh/chart: rke2-ingress-nginx-4.5.201
    name: rke2-ingress-nginx-controller-admission
    namespace: kube-system
    resourceVersion: "5534140"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-2h95m
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-8hvtl
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-c8x24
        namespace: kube-system
        uid: REDACTED
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-ingress-nginx-controller-df4lk
        namespace: kube-system
        uid: REDACTED
    ports:
    - appProtocol: https
      name: https-webhook
      port: 8443
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:09Z"
    creationTimestamp: "2023-05-02T17:30:09Z"
    labels:
      app: rke2-metrics-server
      app.kubernetes.io/managed-by: Helm
      chart: rke2-metrics-server-2.11.100-build2022101107
      heritage: Helm
      release: rke2-metrics-server
    name: rke2-metrics-server
    namespace: kube-system
    resourceVersion: "5533133"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-metrics-server-7d58bbc9c6-xvgg8
        namespace: kube-system
        uid: REDACTED
    ports:
    - name: metrics
      port: 10250
      protocol: TCP
    - name: https
      port: 10250
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:10Z"
    creationTimestamp: "2023-05-02T17:30:10Z"
    labels:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
      app.kubernetes.io/version: v6.2.1
      helm.sh/chart: rke2-snapshot-validation-webhook-1.7.100
    name: rke2-snapshot-validation-webhook
    namespace: kube-system
    resourceVersion: "5533131"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: REDACTED
      nodeName: REDACTED
      targetRef:
        kind: Pod
        name: rke2-snapshot-validation-webhook-7748dbf6ff-xdtm2
        namespace: kube-system
        uid: REDACTED
    ports:
    - name: https
      port: 8443
      protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-coredns-rke2-coredns-6b9548f79f-fg2th
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-coredns-rke2-coredns-6b9548f79f-n4p5l
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:33Z"
    creationTimestamp: "2023-05-02T17:29:46Z"
    generateName: rke2-coredns-rke2-coredns-
    generation: 78
    labels:
      app.kubernetes.io/instance: rke2-coredns
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-coredns
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      helm.sh/chart: rke2-coredns-1.19.402
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
      kubernetes.io/name: CoreDNS
      kubernetes.io/service-name: rke2-coredns-rke2-coredns
    name: rke2-coredns-rke2-coredns-d7srf
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-coredns-rke2-coredns
      uid: REDACTED
    resourceVersion: "5534370"
    uid: REDACTED
  ports:
  - name: tcp-53
    port: 53
    protocol: TCP
  - name: udp-53
    port: 53
    protocol: UDP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-2h95m
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-c8x24
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-df4lk
      namespace: kube-system
      uid: REDACTED
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-ingress-nginx-controller-8hvtl
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-19T10:05:23Z"
    creationTimestamp: "2023-05-02T17:30:20Z"
    generateName: rke2-ingress-nginx-controller-admission-
    generation: 265
    labels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: rke2-ingress-nginx
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-ingress-nginx
      app.kubernetes.io/part-of: rke2-ingress-nginx
      app.kubernetes.io/version: 1.6.4
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      helm.sh/chart: rke2-ingress-nginx-4.5.201
      kubernetes.io/service-name: rke2-ingress-nginx-controller-admission
    name: rke2-ingress-nginx-controller-admission-g25cm
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-ingress-nginx-controller-admission
      uid: REDACTED
    resourceVersion: "5534139"
    uid: REDACTED
  ports:
  - appProtocol: https
    name: https-webhook
    port: 8443
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-metrics-server-7d58bbc9c6-xvgg8
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:09Z"
    creationTimestamp: "2023-05-02T17:30:09Z"
    generateName: rke2-metrics-server-
    generation: 27
    labels:
      app: rke2-metrics-server
      app.kubernetes.io/managed-by: Helm
      chart: rke2-metrics-server-2.11.100-build2022101107
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      heritage: Helm
      kubernetes.io/service-name: rke2-metrics-server
      release: rke2-metrics-server
    name: rke2-metrics-server-wmz2b
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-metrics-server
      uid: REDACTED
    resourceVersion: "5533128"
    uid: REDACTED
  ports:
  - name: metrics
    port: 10250
    protocol: TCP
  - name: https
    port: 10250
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - REDACTED
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: REDACTED
    targetRef:
      kind: Pod
      name: rke2-snapshot-validation-webhook-7748dbf6ff-xdtm2
      namespace: kube-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-02T17:30:10Z"
    creationTimestamp: "2023-05-02T17:30:10Z"
    generateName: rke2-snapshot-validation-webhook-
    generation: 16
    labels:
      app.kubernetes.io/instance: rke2-snapshot-validation-webhook
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: rke2-snapshot-validation-webhook
      app.kubernetes.io/version: v6.2.1
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      helm.sh/chart: rke2-snapshot-validation-webhook-1.7.100
      kubernetes.io/service-name: rke2-snapshot-validation-webhook
    name: rke2-snapshot-validation-webhook-mzc9v
    namespace: kube-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rke2-snapshot-validation-webhook
      uid: REDACTED
    resourceVersion: "5533125"
    uid: REDACTED
  ports:
  - name: https
    port: 8443
    protocol: TCP
kind: List
metadata:
  resourceVersion: ""
mario-juarez commented 1 year ago

Hi, I am having the same problem reported in this issue. I noticed this only happens when the service name is too long, and it was introduced when migrating to endpointslices in this change: https://github.com/kubernetes/ingress-nginx/pull/8890.

This error didn't happen with Endpoints because the name of an Endpoints object is always the same as the service's. EndpointSlice names, however, are truncated when the service name is too long, and the controller tries to look up the endpointslices by the service name, which then doesn't match.

Example:

# kubectl get endpoints -n my-awesome-service | grep sensorgroup    
my-awesome-service-telemetry-online-processor-dlc-sensorgroup     10.0.0.21:8080 
# kubectl get EndpointSlice -n my-awesome-service | grep sensorgr   
my-awesome-service-telemetry-online-processor-dlc-sensorgrn4mvj   IPv4          8080      10.0.0.21                                         35d
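
As a side note, the slices can still be matched back to their Service via the kubernetes.io/service-name label rather than the (possibly truncated) object name, for example:

# list the EndpointSlices owned by the Service, independent of any name truncation
kubectl -n my-awesome-service get endpointslice \
  -l kubernetes.io/service-name=my-awesome-service-telemetry-online-processor-dlc-sensorgroup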

I think this issue is related and could point to the fix: https://github.com/kubernetes/ingress-nginx/issues/9908

longwuyuan commented 1 year ago

If its really about long names, then ;

rdb0101 commented 1 year ago

This would then indicate a fix has already been implemented? Also, if it relates to long service names, why would this be happening to the "rancher" service, which does not seem to be a long name?

mario-juarez commented 1 year ago

If its really about long names, then ;

Looks like the issue with long service names was fixed in this release: https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.5.1
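
For reference, the controller version that is actually running can be confirmed with something like this (namespace and deployment name are placeholders):

# print the version banner of the running controller binary
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- /nginx-ingress-controller --version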

Thanks @longwuyuan

rdb0101 commented 1 year ago

Thank you for the information, but if it was fixed then why are these issues still occurring? Do you have any idea why this is the case? Any feedback would be much appreciated!

rdb0101 commented 1 year ago

@longwuyuan is this issue due to long service names? Is that why the services are being reported to not have an active endpoint?

rdb0101 commented 1 year ago

Please see below the error logged for the rancher service, along with the endpointslice name (service name + generated suffix). Even with the suffix added to the service name, the rancher endpointslice name is well under the 63-character limit.

Service "cattle-system/rancher" does not have any active Endpoint. Endpointslice name (service + prefix) rancher-hkpgr

Are all services ignored because of the suffix that is added to the endpointslice name? Or are services only ignored when the endpointslice name exceeds 63 characters?

Does anyone have any thoughts on this?

Forgot to mention that the services' endpoints/endpointslices are periodically recognized and function as expected. Then, at random, one service will throw a 404 error and the controller reports "service does not have an active endpoint", even though the active endpoint exists.

longwuyuan commented 1 year ago

Hi,

The data posted in this issue does not look like something that a developer can use to reproduce the problem. Any help on reproducing the problem is welcome.

Any data that completely covers the bad state is also welcome: logs combined with the output of kubectl describe ... for all related objects and components (controller, application, ingress, pod, svc, endpoints, endpointslices, etc.) while this problem is actively in play.

rdb0101 commented 1 year ago

@longwuyuan please see the requested output below; the only logging found for this issue is "Service cattle-system/rancher does not have any active Endpoint".

# kubectl -n cattle-system get svc,ing,ep,endpointslice

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/rancher           ClusterIP   REDACTED     <none>        80/TCP,443/TCP   25h
service/rancher-webhook   ClusterIP   REDACTED    <none>        443/TCP          24h
service/webhook-service   ClusterIP   REDACTED   <none>        443/TCP          24h

NAME                                CLASS    HOSTS                          ADDRESS                                                     PORTS     AGE
ingress.networking.k8s.io/rancher   <none>   REDACTED                       REDACTED                                                    80, 443   25h

NAME                        ENDPOINTS                                          AGE
endpoints/rancher           HOST1:80,HOST2:80,HOST3:80 + 3 more...             25h
endpoints/rancher-webhook   HOST1:9443                                         24h
endpoints/webhook-service   HOST1:8777                                         24h

NAME                                                   ADDRESSTYPE   PORTS    ENDPOINTS                          AGE
endpointslice.discovery.k8s.io/rancher-hkpgr           IPv4          80,444   HOST2,HOST3,HOST1                  25h
endpointslice.discovery.k8s.io/rancher-webhook-sfgns   IPv4          9443     HOST1                              24h
endpointslice.discovery.k8s.io/webhook-service-b4s92   IPv4          8777     HOST1                              24h

--- 
 # kubectl -n cattle-system get svc,ing,ep,endpointslice -o yaml

apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rancher
      meta.helm.sh/release-namespace: cattle-system
    creationTimestamp: "2023-05-22T15:11:46Z"
    labels:
      app: rancher
      app.kubernetes.io/managed-by: Helm
      chart: rancher-2.7.3
      heritage: Helm
      release: rancher
    name: rancher
    namespace: cattle-system
    resourceVersion: "5250"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
    - name: https-internal
      port: 443
      protocol: TCP
      targetPort: 444
    selector:
      app: rancher
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rancher-webhook
      meta.helm.sh/release-namespace: cattle-system
    creationTimestamp: "2023-05-22T15:17:43Z"
    labels:
      app.kubernetes.io/managed-by: Helm
    name: rancher-webhook
    namespace: cattle-system
    resourceVersion: "9776"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 9443
    selector:
      app: rancher-webhook
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      meta.helm.sh/release-name: rancher-webhook
      meta.helm.sh/release-namespace: cattle-system
      need-a-cert.cattle.io/secret-name: rancher-webhook-tls
    creationTimestamp: "2023-05-22T15:17:43Z"
    labels:
      app.kubernetes.io/managed-by: Helm
    name: webhook-service
    namespace: cattle-system
    resourceVersion: "9772"
    uid: REDACTED
  spec:
    clusterIP: REDACTED
    clusterIPs:
    - REDACTED
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 8777
    selector:
      app: rancher-webhook
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    annotations:
      field.cattle.io/publicEndpoints: '[{"addresses":["WORKER1","CONTROLPLANE","WORKER3","WORKER2"],"port":443,"protocol":"HTTPS","serviceName":"cattle-system:rancher","ingressName":"cattle-system:rancher","hostname":"CONTROLPLANE-HOSTNAME","allNodes":false}]'
      meta.helm.sh/release-name: rancher
      meta.helm.sh/release-namespace: cattle-system
      nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
      nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
    creationTimestamp: "2023-05-22T15:11:46Z"
    generation: 1
    labels:
      app: rancher
      app.kubernetes.io/managed-by: Helm
      chart: rancher-2.7.3
      heritage: Helm
      release: rancher
    name: rancher
    namespace: cattle-system
    resourceVersion: "301991"
    uid: REDACTED
  spec:
    rules:
    - host: CONTROLPLANE-HOSTNAME
      http:
        paths:
        - backend:
            service:
              name: rancher
              port:
                number: 80
          pathType: ImplementationSpecific
    tls:
    - hosts:
      - CONTROLPLANE-HOSTNAME
      secretName: tls-rancher-ingress
  status:
    loadBalancer:
      ingress:
      - ip: WORKER1
      - ip: CONTROLPLANE
      - ip: WORKER3
      - ip: WORKER2
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-23T10:10:55Z"
    creationTimestamp: "2023-05-22T15:11:46Z"
    labels:
      app: rancher
      app.kubernetes.io/managed-by: Helm
      chart: rancher-2.7.3
      heritage: Helm
      release: rancher
    name: rancher
    namespace: cattle-system
    resourceVersion: "301212"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: HOST1
      nodeName: CONTROLPLANE
      targetRef:
        kind: Pod
        name: rancher-6b4977f897-jrzjx
        namespace: cattle-system
        uid: REDACTED
    - ip: HOST2
      nodeName: WORKER2
      targetRef:
        kind: Pod
        name: rancher-6b4977f897-6sf47
        namespace: cattle-system
        uid: REDACTED
    - ip: HOST3
      nodeName: WORKER3
      targetRef:
        kind: Pod
        name: rancher-6b4977f897-xx8gf
        namespace: cattle-system
        uid: REDACTED
    ports:
    - name: http
      port: 80
      protocol: TCP
    - name: https-internal
      port: 444
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-23T10:10:06Z"
    creationTimestamp: "2023-05-22T15:17:44Z"
    labels:
      app.kubernetes.io/managed-by: Helm
    name: rancher-webhook
    namespace: cattle-system
    resourceVersion: "300547"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: HOST1
      nodeName: WORKER3
      targetRef:
        kind: Pod
        name: rancher-webhook-656cd8b9f-cbjbw
        namespace: cattle-system
        uid: REDACTED
    ports:
    - name: https
      port: 9443
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-23T10:10:06Z"
    creationTimestamp: "2023-05-22T15:17:44Z"
    labels:
      app.kubernetes.io/managed-by: Helm
    name: webhook-service
    namespace: cattle-system
    resourceVersion: "300546"
    uid: REDACTED
  subsets:
  - addresses:
    - ip: HOST1
      nodeName: WORKER3
      targetRef:
        kind: Pod
        name: rancher-webhook-656cd8b9f-cbjbw
        namespace: cattle-system
        uid: REDACTED
    ports:
    - name: https
      port: 8777
      protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - HOST2
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: WORKER2
    targetRef:
      kind: Pod
      name: rancher-6b4977f897-6sf47
      namespace: cattle-system
      uid: REDACTED
  - addresses:
    - HOST3
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: WORKER3
    targetRef:
      kind: Pod
      name: rancher-6b4977f897-xx8gf
      namespace: cattle-system
      uid: REDACTED
  - addresses:
    - HOST1
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: CONTROLPLANE
    targetRef:
      kind: Pod
      name: rancher-6b4977f897-jrzjx
      namespace: cattle-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-23T10:10:55Z"
    creationTimestamp: "2023-05-22T15:11:46Z"
    generateName: rancher-
    generation: 20
    labels:
      app: rancher
      app.kubernetes.io/managed-by: Helm
      chart: rancher-2.7.3
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      heritage: Helm
      kubernetes.io/service-name: rancher
      release: rancher
    name: rancher-hkpgr
    namespace: cattle-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rancher
      uid: REDACTED
    resourceVersion: "301213"
    uid: REDACTED
  ports:
  - name: http
    port: 80
    protocol: TCP
  - name: https-internal
    port: 444
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - HOST1
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: WORKER3
    targetRef:
      kind: Pod
      name: rancher-webhook-656cd8b9f-cbjbw
      namespace: cattle-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    creationTimestamp: "2023-05-22T15:17:44Z"
    generateName: rancher-webhook-
    generation: 6
    labels:
      app.kubernetes.io/managed-by: Helm
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: rancher-webhook
    name: rancher-webhook-sfgns
    namespace: cattle-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: rancher-webhook
      uid: REDACTED
    resourceVersion: "300903"
    uid: REDACTED
  ports:
  - name: https
    port: 9443
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - HOST1
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: WORKER3
    targetRef:
      kind: Pod
      name: rancher-webhook-656cd8b9f-cbjbw
      namespace: cattle-system
      uid: REDACTED
  kind: EndpointSlice
  metadata:
    creationTimestamp: "2023-05-22T15:17:44Z"
    generateName: webhook-service-
    generation: 6
    labels:
      app.kubernetes.io/managed-by: Helm
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: webhook-service
    name: webhook-service-b4s92
    namespace: cattle-system
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: webhook-service
      uid: REDACTED
    resourceVersion: "300904"
    uid: REDACTED
  ports:
  - name: https
    port: 8777
    protocol: TCP
kind: List
metadata:
  resourceVersion: ""
rdb0101 commented 1 year ago

When it errors out with the 404 page not found the following is logged in the "rke2-ingress-nginx-controller-" logs:

I0523 10:03:33.790084       7 store.go:433] "Found valid IngressClass" ingress="cattle-system/rancher" ingressclass="_"
W0523 10:04:21.542696       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:24.876046       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:32.725971       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:36.060111       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:39.393421       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:42.726233       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:46.059925       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:49.392749       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:52.726522       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:56.059866       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:04:59.393704       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:05:02.726128       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:05:06.060042       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint

It logs the same error above for each service that periodically times out.
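
If it helps, one way to watch for these warnings as they occur (the label selector is an assumption for an rke2 install and may need adjusting):

# follow the controller logs and surface only the active-Endpoint warnings
kubectl -n kube-system logs -l app.kubernetes.io/component=controller -f --tail=100 \
  | grep "does not have any active Endpoint"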

longwuyuan commented 1 year ago

@rdb0101 your latest post above is one example of not having enough data to analyse or reproduce the problem.

To be precise: if someone can post the logs of the controller pod together with the output of kubectl get endpointslices -n cattle-system while the problem is live, then the timestamps in the log messages and the kubectl output can be correlated. Another source of useful information is kubectl -n cattle-system get events.

If you post kubectl get po -n cattle-system, you can see the restarts, if any.

If you look at the logs of the rancher pod, you can see rancher events and check whether any are related.
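
A rough capture along those lines, run while the problem is live so that timestamps can be correlated (namespaces and label selector are taken from this thread and may need adjusting):

# controller logs plus the state of the affected namespace at the same moment
kubectl -n kube-system logs -l app.kubernetes.io/component=controller --since=30m > controller.log
kubectl -n cattle-system get endpointslices -o yaml > endpointslices.yaml
kubectl -n cattle-system get events --sort-by=.lastTimestamp > events.txt
kubectl -n cattle-system get po -o wide > pods.txt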

In any case I don't think any developer can reproduce this problem, with the information currently posted in this issue.

rdb0101 commented 1 year ago

@longwuyuan Thank you for clarifying what data is needed in order to provide a reproducible problem. Please see below the errors that show what happens when the rancher service goes from having no active endpoint to the ingress being resynced:

controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
W0523 10:05:19.393530       7 controller.go:1163] Service "cattle-system/rancher" does not have any active Endpoint.
I0523 10:10:08.404944       7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"REDACTED", APIVersion:"networking.k8s.io/v1", ResourceVersion:"300576", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0523 10:12:09.170841       7 status.go:300] "updating Ingress status" namespace="cattle-system" ingress="rancher" currentValue=[{IP: CONTROLPLANE Hostname: Ports:[]} {IP:WORKER3 Hostname: Ports:[]} {IP:WORKER2 Hostname: Ports:[]}] newValue=[{IP:WORKER1 Hostname: Ports:[]} {IP: CONTROLPLANE Hostname: Ports:[]} {IP:WORKER3 Hostname: Ports:[]} {IP:WORKER2 Hostname: Ports:[]}]

Please note that this problem is reproducible by setting up rke2 with a helm install of rancher 2.7.3. This exact issue occurs even in a minimal install.
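
For anyone attempting that reproduction, the setup is roughly the following (a sketch; the hostname is a placeholder, and cert-manager or a pre-created TLS secret is also required but omitted here):

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm install rancher rancher-stable/rancher \
  --namespace cattle-system --create-namespace \
  --version 2.7.3 \
  --set hostname=rancher.example.com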

longwuyuan commented 1 year ago

@rdb0101 I am sorry you are having this issue and I hope it is resolved soon. Here are my thoughts, and I hope you see the practical side of an issue being created here in this project.

rdb0101 commented 1 year ago

Hi @longwuyuan, thanks very much for your feedback. This issue is not specific to rancher; it impacts all of the services I have deployed. I used rancher only as an example, because its endpointslice name (service name + generated suffix) is well under the 63-character limit, and I was trying to determine whether, and how, the nginx controller could be filtering out even the rancher service despite that. I apologize if my feedback was unclear. If this issue were specific to rancher, then it would likely only impact the rancher service, correct?

longwuyuan commented 1 year ago

Correct.

I am using v1.7.1 of the controller with TLS and I don't face this problem. Can you try to reproduce the problem in minikube, using the image nginx:alpine to create a deployment and exposing it with the ingress-nginx controller and MetalLB?
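
A minimal attempt along those lines might look like this (a sketch; names and host are illustrative, and the MetalLB address pool still needs to be configured separately):

minikube start
minikube addons enable metallb     # followed by: minikube addons configure metallb
minikube addons enable ingress
kubectl create deployment demo --image=nginx:alpine
kubectl expose deployment demo --port=80
kubectl create ingress demo --class=nginx --rule="demo.example.com/*=demo:80"
# watch the controller for the warning while exercising the host
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller -f | grep "active Endpoint"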

rdb0101 commented 1 year ago

@longwuyuan Thanks very much for the feedback. I will go ahead and stand up minikube with the version and image as recommended. I will provide the output once I have reproduced the issue.

rdb0101 commented 1 year ago

@longwuyuan Is your current environment multi-node as well?

longwuyuan commented 1 year ago

no

rdb0101 commented 1 year ago

okay thank you for verifying

ksingh7 commented 1 year ago

@rdb0101 @longwuyuan I can confirm that I have the exact same issue, which I am facing right now. I read this thread thoroughly, and it seems the community needs help capturing live logs. Here they are:

Kubernetes Cluster: DigitalOcean Managed, version v1.26.3

NGINX Ingress controller
  Release:       v1.7.1
  Build:         f48b03be54031491e78472bcf3aa026a81e1ffd3
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

ingress-nginx Chart Version 4.6.1
$ kubectl -n kratos-staging get svc,ing,ep,endpointslice
NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/kratos-service   ClusterIP   10.245.73.194   <none>        4433/TCP,4434/TCP   62m

NAME                                    CLASS    HOSTS                ADDRESS           PORTS   AGE
ingress.networking.k8s.io/api-ingress   <none>   accounts.example.in   1xx.13x.122.209   80      49m

NAME                       ENDPOINTS   AGE
endpoints/kratos-service   <none>      62m

NAME                                                  ADDRESSTYPE   PORTS     ENDPOINTS   AGE
endpointslice.discovery.k8s.io/kratos-service-jkb94   IPv4          <unset>   <unset>     62m
$ kubectl -n kratos-staging get svc,ing,ep,endpointslice -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2023-05-29T17:50:03Z"
    labels:
      app: kratos
    name: kratos-service
    namespace: kratos-staging
    resourceVersion: "107055"
    uid: ad9f6739-f132-4678-8a56-0d4ca3f679ff
  spec:
    clusterIP: 10.245.73.194
    clusterIPs:
    - 10.245.73.194
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: http-public
      port: 4433
      protocol: TCP
      targetPort: 4433
    - name: http-admin
      port: 4434
      protocol: TCP
      targetPort: 4434
    selector:
      app: kratos
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
- apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/enable-cors: "true"
      nginx.ingress.kubernetes.io/rewrite-target: /$2
      nginx.ingress.kubernetes.io/ssl-redirect: "false"
    creationTimestamp: "2023-05-29T18:02:32Z"
    generation: 1
    name: api-ingress
    namespace: kratos-staging
    resourceVersion: "110037"
    uid: 18832e6c-fc36-4440-a482-6217078d2c6a
  spec:
    rules:
    - host: accounts.example.in
      http:
        paths:
        - backend:
            service:
              name: kratos-service
              port:
                number: 4433
          path: /kratos(/|$)(.*)
          pathType: Prefix
  status:
    loadBalancer:
      ingress:
      - ip: 1xx.1xx.122.209
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-29T17:50:03Z"
    creationTimestamp: "2023-05-29T17:50:03Z"
    labels:
      app: kratos
    name: kratos-service
    namespace: kratos-staging
    resourceVersion: "107056"
    uid: a53b4461-b759-44e5-af3e-7a9322056eac
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints: null
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2023-05-29T17:50:03Z"
    creationTimestamp: "2023-05-29T17:50:03Z"
    generateName: kratos-service-
    generation: 1
    labels:
      app: kratos
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: kratos-service
    name: kratos-service-jkb94
    namespace: kratos-staging
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: kratos-service
      uid: ad9f6739-f132-4678-8a56-0d4ca3f679ff
    resourceVersion: "107057"
    uid: d2094f71-d8ea-4e20-a225-25f035f64b6b
  ports: null
kind: List
metadata:
  resourceVersion: ""

$  kubectl logs nginx-ingress-ingress-nginx-controller-fd49fcc58-zjb2j -n default -f
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.7.1
  Build:         f48b03be54031491e78472bcf3aa026a81e1ffd3
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

W0529 18:06:15.955460       8 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0529 18:06:15.955660       8 main.go:209] "Creating API client" host="https://10.245.0.1:443"
I0529 18:06:15.970169       8 main.go:253] "Running in Kubernetes cluster" major="1" minor="26" git="v1.26.3" state="clean" commit="9e644106593f3f4aa98f8a84b23db5fa378900bd" platform="linux/amd64"
I0529 18:06:16.113949       8 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0529 18:06:16.142720       8 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0529 18:06:16.157955       8 nginx.go:261] "Starting NGINX Ingress controller"
I0529 18:06:16.167843       8 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"default", Name:"nginx-ingress-ingress-nginx-controller", UID:"8c5564f0-82f6-4481-84f1-96deae0cf56c", APIVersion:"v1", ResourceVersion:"105147", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap default/nginx-ingress-ingress-nginx-controller
I0529 18:06:17.263997       8 store.go:433] "Found valid IngressClass" ingress="kratos-staging/api-ingress" ingressclass="nginx"
I0529 18:06:17.264673       8 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"kratos-staging", Name:"api-ingress", UID:"18832e6c-fc36-4440-a482-6217078d2c6a", APIVersion:"networking.k8s.io/v1", ResourceVersion:"110037", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0529 18:06:17.360198       8 nginx.go:304] "Starting NGINX process"
I0529 18:06:17.360289       8 leaderelection.go:248] attempting to acquire leader lease default/nginx-ingress-ingress-nginx-leader...
I0529 18:06:17.360770       8 nginx.go:324] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
W0529 18:06:17.361023       8 controller.go:1152] Service "kratos-staging/kratos-service" does not have any active Endpoint.
I0529 18:06:17.361349       8 controller.go:190] "Configuration changes detected, backend reload required"
I0529 18:06:17.372893       8 status.go:84] "New leader elected" identity="nginx-ingress-ingress-nginx-controller-fd49fcc58-qt6tt"
I0529 18:06:17.471508       8 controller.go:207] "Backend successfully reloaded"
I0529 18:06:17.471812       8 controller.go:218] "Initial sync, sleeping for 1 second"
I0529 18:06:17.471937       8 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"nginx-ingress-ingress-nginx-controller-fd49fcc58-zjb2j", UID:"55dbe193-a944-4cdd-acf5-07ffe64fac06", APIVersion:"v1", ResourceVersion:"110912", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0529 18:06:21.171598       8 controller.go:1152] Service "kratos-staging/kratos-service" does not have any active Endpoint.
W0529 18:06:25.542628       8 controller.go:1152] Service "kratos-staging/kratos-service" does not have any active Endpoint.
W0529 18:06:28.876887       8 controller.go:1152] Service "kratos-staging/kratos-service" does not have any active Endpoint.
W0529 18:06:32.209528       8 controller.go:1152] Service "kratos-staging/kratos-service" does not have any active Endpoint.
I0529 18:06:59.957867       8 status.go:84] "New leader elected" identity="nginx-ingress-ingress-nginx-controller-fd49fcc58-zjb2j"
I0529 18:06:59.957834       8 leaderelection.go:258] successfully acquired lease default/nginx-ingress-ingress-nginx-leader
10.244.0.50 - - [29/May/2023:18:07:49 +0000] "GET /ui/welcome HTTP/2.0" 404 146 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 287 0.001 [upstream-default-backend] [] 127.0.0.1:8181 146 0.001 404 20a53ca37e3a44a9942e79b5f7d09594
10.244.0.50 - - [29/May/2023:18:07:59 +0000] "GET / HTTP/2.0" 404 146 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 16 0.001 [upstream-default-backend] [] 127.0.0.1:8181 146 0.001 404 ef64dc8200a573ead9dd21852aca603b
10.244.0.50 - - [29/May/2023:18:08:40 +0000] "GET / HTTP/1.1" 404 146 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 356 0.001 [upstream-default-backend] [] 127.0.0.1:8181 146 0.000 404 d17b9a4226f85900de13a36efe5d3ba3
10.244.0.50 - - [29/May/2023:18:08:40 +0000] "GET /favicon.ico HTTP/1.1" 404 146 "http://accounts.example.in/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 314 0.001 [upstream-default-backend] [] 127.0.0.1:8181 146 0.000 404 ac6dd100945f51f6a71bcf531ed6809f
10.244.0.50 - - [29/May/2023:18:08:47 +0000] "GET /ui/welcome HTTP/2.0" 404 146 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 23 0.001 [upstream-default-backend] [] 127.0.0.1:8181 146 0.000 404 44a30be3e49357c7294fa9866aea1c98
10.244.0.50 - - [29/May/2023:18:09:50 +0000] "GET /ui/welcome HTTP/2.0" 404 146 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 23 0.000 [upstream-default-backend] [] 127.0.0.1:8181 146 0.001 404 f85294bcfd5a8d0c447d21aae0576c26
10.244.0.50 - - [29/May/2023:18:10:56 +0000] "m\xEB\xC7~0\xC1\xB3\xACtQ\xB6\xE0q\x9E\x19\xBA" 400 150 "-" "-" 0 0.010 [] [] - - - - 28285b6bd2b7568e6a538a1e496a1f5d
2023/05/29 18:11:00 [crit] 27#27: *2471 SSL_do_handshake() failed (SSL: error:0A00006C:SSL routines::bad key share) while SSL handshaking, client: 10.244.0.50, server: 0.0.0.0:443
10.244.0.50 - - [29/May/2023:18:11:01 +0000] "GET / HTTP/1.1" 400 650 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 206 0.000 [] [] - - - - ddfafb94bdf0ebc378187757913e58df
10.244.0.50 - - [29/May/2023:18:11:01 +0000] "GET /private/api/v1/service/premaster HTTP/1.1" 400 650 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 238 0.000 [] [] - - - - 3969bf23609b416674790b5922f9c070
10.244.0.50 - - [29/May/2023:18:22:05 +0000] "\x03\x00\x00/*\xE0\x00\x00\x00\x00\x00Cookie: mstshash=Administr" 400 150 "-" "-" 0 0.166 [] [] - - - - 02c371c8d08e63424878349b644715e7
10.244.0.50 - - [29/May/2023:18:25:41 +0000] "CONNECT checkip.amazonaws.com:443 HTTP/1.1" 400 150 "-" "-" 0 4.530 [] [] - - - - 4c21eccbb53aad215f16d7e79e2a8576
10.244.0.50 - - [29/May/2023:18:25:42 +0000] "\x04\x01\x00P\x22\xFF\xAD\xC20\x00" 400 150 "-" "-" 0 0.426 [] [] - - - - 52d309a9de55421b0363061d0ee36a0d
10.244.0.50 - - [29/May/2023:18:25:52 +0000] "\x05\x01\x00" 400 150 "-" "-" 0 0.407 [] [] - - - - 19b8631ce2a32f3cbe233241e73c88c4
10.244.0.50 - - [29/May/2023:18:30:52 +0000] "GET /ui/welcome HTTP/2.0" 404 146 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 287 0.001 [upstream-default-backend] [] 127.0.0.1:8181 146 0.001 404 c77db22e17b9876b2243bbf21ad10eb6
10.244.0.50 - - [29/May/2023:18:30:52 +0000] "GET /favicon.ico HTTP/2.0" 499 0 "https://accounts.example.in/ui/welcome" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0" 94 0.000 [upstream-default-backend] [] 127.0.0.1:8181 0 0.000 - 526324dfe16207dc99b228c930b641db
10.244.0.50 - - [29/May/2023:18:49:36 +0000] "CONNECT www.yahoo.com:443 HTTP/1.1" 400 150 "-" "-" 0 0.146 [] [] - - - - 9d18b470b8515b04e59649ff66790b22
10.244.0.50 - - [29/May/2023:19:05:19 +0000] "\x16\x03\x00\x00i\x01\x00\x00e\x03\x03U\x1C\xA7\xE4random1random2random3random4\x00\x00\x0C\x00/\x00" 400 150 "-" "-" 0 0.153 [] [] - - - - 6717ee7d921b5b078441421c7d18173f
ksingh7 commented 1 year ago

@longwuyuan Do you think I should try downgrading the Helm chart version? If yes, which version should I try? Any suggestions?

longwuyuan commented 1 year ago
ksingh7 commented 1 year ago

@longwuyuan This is a blocker for us, so I am trying to provide more of the information you requested.

 SSL_do_handshake() failed (SSL: error:0A00006C:SSL routines::bad key share) while SSL handshaking
$ kubectl -n kratos-staging describe svc kratos-service
Name:              kratos-service
Namespace:         kratos-staging
Labels:            app=kratos
Annotations:       <none>
Selector:          app=kratos
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.245.73.194
IPs:               10.245.73.194
Port:              http-public  4433/TCP
TargetPort:        4433/TCP
Endpoints:         10.244.0.98:4433
Port:              http-admin  4434/TCP
TargetPort:        4434/TCP
Endpoints:         10.244.0.98:4434
Session Affinity:  None
Events:            <none>

$ kubectl -n kratos-staging get po -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE     IP             NODE                   NOMINATED NODE   READINESS GATES
kratos-5bd7697c6b-w4srk                                  1/1     Running   0          10h     10.244.0.98    pool-h4t86o44q-fen7g   <none>           <none>
nginx-ingress-ingress-nginx-controller-5c6c7cfb8-7qrm4   1/1     Running   0          13m     10.244.0.32    pool-h4t86o44q-fen7g   <none>           <none>
$

$ kubectl -n kratos-staging describe po

Name:             kratos-5bd7697c6b-w4srk
Namespace:        kratos-staging
Priority:         0
Service Account:  default
Node:             pool-h4t86o44q-fen7g/10.122.0.2
Start Time:       Tue, 30 May 2023 01:12:10 +0530
Labels:           app=kratos
                  pod-template-hash=5bd7697c6b
Annotations:      <none>
Status:           Running
IP:               10.244.0.98
IPs:
  IP:           10.244.0.98
Controlled By:  ReplicaSet/kratos-5bd7697c6b
Containers:
  kratos:
    Container ID:  containerd://f008afa67b5c9952229c61e0812e2439c5a6ec215f5910619493c3936a2de6f6
    Image:         oryd/kratos
    Image ID:      docker.io/oryd/kratos@sha256:5ec9808accebd4826b15b21bc6bcaa4410d3dc451ebe2bf8da812042df046ceb
    Ports:         4433/TCP, 4434/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      kratos
      -c
      /etc/config/kratos/kratos.yml
      serve
    State:          Running
      Started:      Tue, 30 May 2023 01:12:11 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     500m
      memory:  128Mi
    Environment Variables from:
      kratos-env  Secret  Optional: false
    Environment:  <none>
    Mounts:
      /etc/config/identity.schema.json from kratos-identity-schema (rw,path="identity.schema.json")
      /etc/config/kratos/kratos.yml from kratos-config (rw,path="kratos.yml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bk8jr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kratos-identity-schema:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      identity-schema-config
    Optional:  false
  kratos-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kratos-config
    Optional:  false
  kube-api-access-bk8jr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

Name:             nginx-ingress-ingress-nginx-controller-5c6c7cfb8-7qrm4
Namespace:        kratos-staging
Priority:         0
Service Account:  nginx-ingress-ingress-nginx
Node:             pool-h4t86o44q-fen7g/10.122.0.2
Start Time:       Tue, 30 May 2023 11:30:26 +0530
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=nginx-ingress
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.7.1
                  helm.sh/chart=ingress-nginx-4.6.1
                  pod-template-hash=5c6c7cfb8
Annotations:      <none>
Status:           Running
IP:               10.244.0.32
IPs:
  IP:           10.244.0.32
Controlled By:  ReplicaSet/nginx-ingress-ingress-nginx-controller-5c6c7cfb8
Containers:
  controller:
    Container ID:  containerd://1b5bf2e1ffc5582cfc4e40f53401b7868cc081a3e8fdeb062b0cd3a92d51920b
    Image:         registry.k8s.io/ingress-nginx/controller:v1.7.1@sha256:7244b95ea47bddcb8267c1e625fb163fc183ef55448855e3ac52a7b260a60407
    Image ID:      registry.k8s.io/ingress-nginx/controller@sha256:7244b95ea47bddcb8267c1e625fb163fc183ef55448855e3ac52a7b260a60407
    Ports:         80/TCP, 443/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --publish-service=$(POD_NAMESPACE)/nginx-ingress-ingress-nginx-controller
      --election-id=nginx-ingress-ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/nginx-ingress-ingress-nginx-controller
      --watch-namespace=kratos-staging
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
    State:          Running
      Started:      Tue, 30 May 2023 11:30:27 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-ingress-nginx-controller-5c6c7cfb8-7qrm4 (v1:metadata.name)
      POD_NAMESPACE:  kratos-staging (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gmgbk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-ingress-ingress-nginx-admission
    Optional:    false
  kube-api-access-gmgbk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age                  From                      Message
  ----    ------     ----                 ----                      -------
  Normal  Scheduled  14m                  default-scheduler         Successfully assigned kratos-staging/nginx-ingress-ingress-nginx-controller-5c6c7cfb8-7qrm4 to pool-h4t86o44q-fen7g
  Normal  Pulled     14m                  kubelet                   Container image "registry.k8s.io/ingress-nginx/controller:v1.7.1@sha256:7244b95ea47bddcb8267c1e625fb163fc183ef55448855e3ac52a7b260a60407" already present on machine
  Normal  Created    14m                  kubelet                   Created container controller
  Normal  Started    14m                  kubelet                   Started container controller
  Normal  RELOAD     4m53s (x2 over 14m)  nginx-ingress-controller  NGINX reload triggered due to a change in configuration
$
rdb0101 commented 1 year ago

@ksingh7 Thank you for working towards finding the cause of this issue. However, I agree with @longwuyuan that the problem you have presented looks related to an invalid TLS certificate configuration. The error could come from not having a valid TLS certificate, although I am not entirely sure; you need to check which secret your service is using and which secret the nginx ingress controller is using.
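
For example, a couple of generic checks along those lines (the namespace, ingress and secret names below are placeholders):

kubectl -n <namespace> get ingress <ingress-name> -o jsonpath='{.spec.tls}'
kubectl -n <namespace> get secret <tls-secret-name> -o jsonpath='{.type}'   # should print kubernetes.io/tls
kubectl -n <namespace> get secret <tls-secret-name> -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -dates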

matthewbrumpton commented 1 year ago

I have been experiencing this issue for the last couple of weeks on multiple clusters. The environment does not have TLS configured; nginx was installed using Helm.

AKS 1.26.3, ingress-nginx chart 4.6.1

kubectl get svc,ing,ep,endpointslice -n bmdev-ne-linker

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/bmdev-ne-linker-service   ClusterIP   172.16.252.129   <none>        80/TCP    85m

NAME                                                 CLASS   HOSTS                ADDRESS    PORTS   AGE
ingress.networking.k8s.io/bmdev-ne-linker-ingress    nginx   linker-dev.***.com   10.1.0.5   80      85m

NAME                                ENDPOINTS      AGE
endpoints/bmdev-ne-linker-service   10.2.0.57:80   85m

NAME                                                           ADDRESSTYPE   PORTS   ENDPOINTS   AGE
endpointslice.discovery.k8s.io/bmdev-ne-linker-service-xmtlb   IPv4          80      10.2.0.57   85m

kubectl get svc,ing,ep,endpointslice -n bmdev-ne-linker -o yaml

apiVersion: v1
items:


W0601 08:40:02.319786 7 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0601 08:40:02.319902 7 main.go:209] "Creating API client" host="https://172.16.0.1:443"
I0601 08:40:02.344452 7 main.go:253] "Running in Kubernetes cluster" major="1" minor="26" git="v1.26.3" state="clean" commit="9e644106593f3f4aa98f8a84b23db5fa378900bd" platform="linux/amd64"
I0601 08:40:02.571681 7 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0601 08:40:02.590743 7 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0601 08:40:02.599475 7 nginx.go:261] "Starting NGINX Ingress controller"
I0601 08:40:02.607580 7 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"bmdev-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller", UID:"d3046098-f6e9-4864-a1cd-544f102e6c96", APIVersion:"v1", ResourceVersion:"13283", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap bmdev-ne-nginx/nginx-ingress-ingress-nginx-controller
I0601 08:40:03.801497 7 nginx.go:304] "Starting NGINX process"
I0601 08:40:03.801587 7 leaderelection.go:248] attempting to acquire leader lease bmdev-ne-nginx/nginx-ingress-ingress-nginx-leader...
I0601 08:40:03.801800 7 nginx.go:324] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0601 08:40:03.801915 7 controller.go:190] "Configuration changes detected, backend reload required"
I0601 08:40:03.815509 7 leaderelection.go:258] successfully acquired lease bmdev-ne-nginx/nginx-ingress-ingress-nginx-leader
I0601 08:40:03.815654 7 status.go:84] "New leader elected" identity="nginx-ingress-ingress-nginx-controller-6679b95c85-2zb6l"
I0601 08:40:03.842223 7 controller.go:207] "Backend successfully reloaded"
I0601 08:40:03.842397 7 controller.go:218] "Initial sync, sleeping for 1 second"
I0601 08:40:03.842459 7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"bmdev-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller-6679b95c85-2zb6l", UID:"8aa24f23-825a-4c36-ac01-664c471cdec9", APIVersion:"v1", ResourceVersion:"13317", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0601 08:47:34.889217 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.
I0601 08:47:34.911404 7 admission.go:149] processed ingress via admission controller {testedIngressLength:1 testedIngressTime:0.022s renderingIngressLength:1 renderingIngressTime:0s admissionTime:18.4kBs testedConfigurationSize:0.023}
I0601 08:47:34.911428 7 main.go:100] "successfully validated configuration, accepting" ingress="bmdev-ne-linker/bmdev-ne-linker-ingress"
I0601 08:47:34.915890 7 store.go:433] "Found valid IngressClass" ingress="bmdev-ne-linker/bmdev-ne-linker-ingress" ingressclass="nginx"
I0601 08:47:34.916047 7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"bmdev-ne-linker", Name:"bmdev-ne-linker-ingress", UID:"2dda1d8a-5d2c-4fc4-a29c-1d7c6484fc75", APIVersion:"networking.k8s.io/v1", ResourceVersion:"16531", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0601 08:47:37.969723 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.
I0601 08:47:37.969820 7 controller.go:190] "Configuration changes detected, backend reload required"
I0601 08:47:38.013711 7 controller.go:207] "Backend successfully reloaded"
I0601 08:47:38.013900 7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"bmdev-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller-6679b95c85-2zb6l", UID:"8aa24f23-825a-4c36-ac01-664c471cdec9", APIVersion:"v1", ResourceVersion:"13317", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0601 08:48:03.824349 7 status.go:300] "updating Ingress status" namespace="bmdev-ne-linker" ingress="bmdev-ne-linker-ingress" currentValue=[] newValue=[{IP:10.1.0.5 Hostname: Ports:[]}]
I0601 08:48:03.829844 7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"bmdev-ne-linker", Name:"bmdev-ne-linker-ingress", UID:"2dda1d8a-5d2c-4fc4-a29c-1d7c6484fc75", APIVersion:"networking.k8s.io/v1", ResourceVersion:"16769", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0601 10:02:01.269063 7 admission.go:149] processed ingress via admission controller {testedIngressLength:1 testedIngressTime:0.023s renderingIngressLength:1 renderingIngressTime:0s admissionTime:18.4kBs testedConfigurationSize:0.023}
I0601 10:02:01.269089 7 main.go:100] "successfully validated configuration, accepting" ingress="bmdev-ne-linker/bmdev-ne-linker-ingress"
I0601 10:02:01.274422 7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"bmdev-ne-linker", Name:"bmdev-ne-linker-ingress", UID:"2dda1d8a-5d2c-4fc4-a29c-1d7c6484fc75", APIVersion:"networking.k8s.io/v1", ResourceVersion:"47332", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0601 10:02:01.274632 7 controller.go:190] "Configuration changes detected, backend reload required"
I0601 10:02:01.318014 7 controller.go:207] "Backend successfully reloaded"
I0601 10:02:01.318192 7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"bmdev-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller-6679b95c85-2zb6l", UID:"8aa24f23-825a-4c36-ac01-664c471cdec9", APIVersion:"v1", ResourceVersion:"13317", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0601 10:05:49.745387 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.
W0601 10:05:53.079353 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.
W0601 10:05:56.413340 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.

longwuyuan commented 1 year ago

@matthewbrumpton From your data, it is helpful to know that the controller logged this error message:

W0601 10:05:49.745387 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.

but the tracking of the problem ends there, because the output of kubectl describe ing bmdev-ne-linker-ingress needs to be captured at the timestamp of the error message and posted here, so the error message can be correlated to the object state. The endpoint IP addresses are displayed as part of the ingress describe output.

Also, I am not sure whether routing breaks at the time this error message is logged in the controller pod. Do HTTP/HTTPS requests to the URL that the ingress's rules match still work (response code 200) while this error message is being logged?
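
Something like the loop below can capture the correlated state, if anyone hits the problem again (a rough sketch; the namespace, ingress name, controller deployment and URL are placeholders):

while true; do
  echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  curl -sk -o /dev/null -w 'frontend HTTP %{http_code}\n' https://<your-host>/
  kubectl -n <app-namespace> describe ingress <ingress-name> | grep -A5 'Rules:'
  kubectl -n <controller-namespace> logs deploy/<controller-deployment> --since=15s | grep 'active Endpoint'
  sleep 15
done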

matthewbrumpton commented 1 year ago

@longwuyuan The error message is logged in the controller when the bmdev-ne-linker pod starts up, not on an HTTP request.

W0601 10:54:49.605701 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.
W0601 10:54:52.939783 7 controller.go:1152] Service "bmdev-ne-linker/bmdev-ne-linker-service" does not have any active Endpoint.

longwuyuan commented 1 year ago

@matthewbrumpton It is not known whether the error message keeps repeating after reconciliation, or whether HTTP/HTTPS requests keep failing.

So is this issue about an error message during startup, before reconciliation?

matthewbrumpton commented 1 year ago

@longwuyuan The error occurs during startup and before reconciliation. I am unable to get any further because our Azure Front Door cannot reach the endpoint.

longwuyuan commented 1 year ago

@matthewbrumpton So, from the point of view of the ingress-nginx controller, do you mean that your HTTP/HTTPS requests fail? If yes, then obviously you need to show the supporting data;

As you can see above, repeated requests have been made for data that would help analyse the state at the time the problem occurs, but so far there is no way to reproduce the problem on a minikube or kind cluster, and there is no data such as the kubectl describe output of the ingress from when the problem happened.

matthewbrumpton commented 1 year ago

@longwuyuan , recreated on minikube

helm install nginx-ingress ingress-nginx/ingress-nginx --version 4.6.1 --create-namespace --namespace bmde-ne-nginx --set controller.replicaCount=1 --set controller.metrics.enabled=true --set controller.nodeSelector."kubernetes.io/os"=linux --set controller.admissionWebhooks.patch.nodeSelector."kubernetes\.io/os"=linux --set controller.service.annotations."service.beta.kubernetes.io/azure-load-balancer-internal"=true

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld-one
  namespace: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-helloworld-one
  template:
    metadata:
      labels:
        app: aks-helloworld-one
    spec:
      containers:
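
For completeness, a generic httpbin backend along these lines can be used for this repro (an illustrative manifest, not the exact one pasted above, which is cut off; the hostname is a placeholder):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  namespace: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - name: httpbin
        image: kennethreitz/httpbin
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: httpbin
spec:
  selector:
    app: httpbin
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin
  namespace: httpbin
spec:
  ingressClassName: nginx
  rules:
  - host: httpbin.example.test
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: httpbin
            port:
              number: 80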

rdb0101 commented 1 year ago

Hi @matthewbrumpton, thank you for recreating it in minikube. Did you experience the same error even when using minikube?

longwuyuan commented 1 year ago

Why use the Azure annotation on minikube?


matthewbrumpton commented 1 year ago

@longwuyuan Same error with a deployment I use for testing:

W0601 14:30:38.998226 7 controller.go:1152] Service "httpbin/httpbin" does not have any active Endpoint.


NGINX Ingress controller
  Release:       v1.7.1
  Build:         f48b03be54031491e78472bcf3aa026a81e1ffd3
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6


W0601 14:28:47.702631 7 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0601 14:28:47.702837 7 main.go:209] "Creating API client" host="https://10.96.0.1:443"
I0601 14:28:47.707842 7 main.go:253] "Running in Kubernetes cluster" major="1" minor="26" git="v1.26.3" state="clean" commit="9e644106593f3f4aa98f8a84b23db5fa378900bd" platform="linux/amd64"
I0601 14:28:47.842403 7 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0601 14:28:47.867948 7 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0601 14:28:47.876005 7 nginx.go:261] "Starting NGINX Ingress controller"
I0601 14:28:47.883008 7 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"bmde-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller", UID:"dca48c46-1ed7-431c-a283-cc4d9a8b5723", APIVersion:"v1", ResourceVersion:"1327", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap bmde-ne-nginx/nginx-ingress-ingress-nginx-controller
I0601 14:28:49.078015 7 nginx.go:304] "Starting NGINX process"
I0601 14:28:49.078190 7 leaderelection.go:248] attempting to acquire leader lease bmde-ne-nginx/nginx-ingress-ingress-nginx-leader...
I0601 14:28:49.078556 7 nginx.go:324] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0601 14:28:49.078875 7 controller.go:190] "Configuration changes detected, backend reload required"
I0601 14:28:49.092466 7 leaderelection.go:258] successfully acquired lease bmde-ne-nginx/nginx-ingress-ingress-nginx-leader
I0601 14:28:49.092562 7 status.go:84] "New leader elected" identity="nginx-ingress-ingress-nginx-controller-6679b95c85-wzf2q"
I0601 14:28:49.127279 7 controller.go:207] "Backend successfully reloaded"
I0601 14:28:49.127411 7 controller.go:218] "Initial sync, sleeping for 1 second"
I0601 14:28:49.127472 7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"bmde-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller-6679b95c85-wzf2q", UID:"3cc894ae-b6ae-40b4-8e06-370a03e3df9f", APIVersion:"v1", ResourceVersion:"1380", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0601 14:30:38.998226 7 controller.go:1152] Service "httpbin/httpbin" does not have any active Endpoint.
I0601 14:30:39.030518 7 admission.go:149] processed ingress via admission controller {testedIngressLength:1 testedIngressTime:0.032s renderingIngressLength:1 renderingIngressTime:0.001s admissionTime:18.1kBs testedConfigurationSize:0.033}
I0601 14:30:39.030584 7 main.go:100] "successfully validated configuration, accepting" ingress="httpbin/httpbin"
I0601 14:30:39.035382 7 store.go:433] "Found valid IngressClass" ingress="httpbin/httpbin" ingressclass="nginx"
I0601 14:30:39.035599 7 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"httpbin", Name:"httpbin", UID:"6a033dd2-7c89-40c3-bcb1-ec88db7cca86", APIVersion:"networking.k8s.io/v1", ResourceVersion:"1549", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0601 14:30:42.317416 7 controller.go:1152] Service "httpbin/httpbin" does not have any active Endpoint.
I0601 14:30:42.317602 7 controller.go:190] "Configuration changes detected, backend reload required"
I0601 14:30:42.437945 7 controller.go:207] "Backend successfully reloaded"
I0601 14:30:42.438185 7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"bmde-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller-6679b95c85-wzf2q", UID:"3cc894ae-b6ae-40b4-8e06-370a03e3df9f", APIVersion:"v1", ResourceVersion:"1380", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration

matthewbrumpton commented 1 year ago

Installed without Azure annotations, with same error:

I0601 17:37:38.119381 7 controller.go:207] "Backend successfully reloaded"
I0601 17:37:38.119590 7 controller.go:218] "Initial sync, sleeping for 1 second"
I0601 17:37:38.119733 7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"bmde-ne-nginx", Name:"nginx-ingress-ingress-nginx-controller-6679b95c85-tzwk9", UID:"d039bd8f-5901-492d-9767-7485515bf49d", APIVersion:"v1", ResourceVersion:"12177", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
W0601 17:37:41.937126 7 controller.go:1152] Service "httpbin/httpbin" does not have any active Endpoint.
W0601 17:37:46.303769 7 controller.go:1152] Service "httpbin/httpbin" does not have any active Endpoint.
W0601 17:37:49.640181 7 controller.go:1152] Service "httpbin/httpbin" does not have any active Endpoint.
W0601 17:37:52.974091 7 controller.go:1152] Service "httpbin/httpbin" does not have any active Endpoint.

helm install nginx-ingress ingress-nginx/ingress-nginx --version 4.6.1 --create-namespace --namespace bmde-ne-nginx --set controller.replicaCount=1 --set controller.metrics.enabled=true --set controller.nodeSelector."kubernetes.io/os"=linux --set controller.admissionWebhooks.patch.nodeSelector."kubernetes.io/os"=linux

longwuyuan commented 1 year ago

@matthewbrumpton thanks for the data. My comments below

HTTP/2 200
date: Thu, 01 Jun 2023 18:18:01 GMT
content-type: text/html; charset=UTF-8
cache-control: no-store
x-content-type-options: nosniff
x-frame-options: deny
x-xss-protection: 1; mode=block
strict-transport-security: max-age=15724800; includeSubDomains


- So we need a clear, accurate and detailed description of the problem that has to be solved in the controller
- I already typed out the commands https://github.com/kubernetes/ingress-nginx/issues/9932#issuecomment-1572027445 that hint at the information that would help make progress here, but your reports so far do not contain any information that helps either describe the problem accurately or analyse and understand the live state of the ingress resource and the ingress controller
- It is still not clear whether you are reporting an error message that happens at one point in time, or a broken ingress where the error message timestamp matches an HTTP/HTTPS request sent to the ingress-nginx controller
rdb0101 commented 1 year ago

@longwuyuan Will this be enough information in conjunction with the description you have mentioned? Do you think this would be considered a bug?

longwuyuan commented 1 year ago

@rdb0101 If it is a bug, then usually an issue will be tagged as a bug, even without a proper description (in the beginning).

I am not able to reproduce this on minikube. As I mentioned, I see the error message only with an old timestamp, from the startup of the cluster before state and config were reconciled, and not with a timestamp matching when I access my app. The error is not logged again after that early timestamp.

longwuyuan commented 1 year ago

cc @strongjz for any comments

rdb0101 commented 1 year ago

@longwuyuan So the error messages don't necessarily occur at the exact time one accesses the app. As mentioned before, when I am able to access my app(s) there is no error message, because the ingress controller "sees" the corresponding endpoints of the apps' services. It is when the app(s) become inaccessible that the error message appears, despite there being an active and valid endpoint. At any point are you unable to access your app at all?

longwuyuan commented 1 year ago

I am able to access my app all the time, and I do not get any new error message after the early pre-reconciliation message.

And so I have asked for that clear state information, captured at one given timestamp (plus or minus a few seconds);

This info will be proof of a bug.

Or write a step-by-step instruction on how anyone can reproduce the problem of a failing HTTP/HTTPS request with that error message in the controller logs.

If you have pods restarting or the network breaking intermittently, then that may also cause failed requests.

I think that if the controller were broken, this would happen to every user, because EndpointSlices are used for every single ingress, since the last 3-4 releases.
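
If anyone wants to check what the controller would see via EndpointSlices at the moment of the error, something like this works (the namespace and service name are placeholders):

kubectl -n <namespace> get endpointslices -l kubernetes.io/service-name=<service-name> -o jsonpath='{range .items[*].endpoints[*]}{.addresses[0]} ready={.conditions.ready}{"\n"}{end}'

If this prints nothing, or prints ready=false, the warning is expected; if it prints ready addresses while the controller still logs the warning, that output captured at the error timestamp is exactly the proof needed here.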

rdb0101 commented 1 year ago

Hi @longwuyuan, I can work on step-by-step instructions on how to reproduce the problem. The app only breaks from the frontend; on the backend the apps still work and respond as expected. Hopefully others will be able to provide some additional feedback as well.

longwuyuan commented 1 year ago

@rdb0101 I think it is important to establish whether the controller has a problem that causes the routing to the app to fail. If your HTTP/HTTPS request to your frontend breaks, that does not directly mean the controller is causing the problem.

So first we need the kubectl describe ingress ... output from when your HTTP/HTTPS request to the frontend fails. Then we need to match the timestamp at which you sent the request to the timestamp of the log messages related to that request. Then we need to check whether the pod had temporary problems such as networking, CPU or memory pressure. Or some other proof that, without any other problems in the cluster, only the ingress-nginx controller was the cause of the broken routing and the error message.
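
For the last point, a few generic checks that can be captured at the same timestamp (the namespace and pod names are placeholders; kubectl top requires metrics-server):

kubectl -n <app-namespace> get pods -o wide
kubectl -n <app-namespace> get events --sort-by=.lastTimestamp | tail -20
kubectl -n <app-namespace> top pod <app-pod>
kubectl -n <controller-namespace> get pods -o wide
kubectl -n <controller-namespace> top pod <controller-pod>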

RoyAtanu commented 1 year ago

I just started facing this issue after enabling TLS on the ingress. The exact same ingress config (plus the service and everything down the line) still works flawlessly when TLS is disabled. Using AKS and the ingress-nginx controller.