kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Version 1.11.2 does not work on RKE #11911

Closed · rafaelarcanjo closed this issue 1 week ago

rafaelarcanjo commented 2 weeks ago

Hello,

I have ingress-nginx version 1.8.4 working perfectly. Because of CVE-2024-7646 I would like to update to the latest version; however, it is not working.

My environment is RKE v1.26.11+rke2r1 running on bare metal. I used the bare-metal-specific deployment, but the pods do not come online.

I rolled back to version 1.8.4 and it's running without problems.

Could you help?

Thank you.
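For reference, the bare-metal install mentioned above, as a minimal sketch assuming the standard provider manifest published for v1.11.2:

```
# Sketch: apply the bare-metal provider manifest for controller v1.11.2
# (path assumed from the project's standard deploy layout).
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.11.2/deploy/static/provider/baremetal/deploy.yaml
```

The `kubectl describe` output for the failing controller pod: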


```
Namespace:        ingress-nginx
Priority:         0
Service Account:  ingress-nginx
Node:             rke-node-2/10.3.0.31
Start Time:       Thu, 29 Aug 2024 09:02:38 -0300
Labels:           app.kubernetes.io/component=controller
                  app.kubernetes.io/instance=ingress-nginx
                  app.kubernetes.io/name=ingress-nginx
                  app.kubernetes.io/part-of=ingress-nginx
                  app.kubernetes.io/version=1.11.2
                  pod-template-hash=5b5cbd7b99
Annotations:      cni.projectcalico.org/containerID: 49c996021f183ad7a71ddab9c4e66265b2b03890f7cc24830f0aa283583f46ae
                  cni.projectcalico.org/podIP: 10.42.1.246/32
                  cni.projectcalico.org/podIPs: 10.42.1.246/32
Status:           Running
IP:               10.42.1.246
IPs:
  IP:           10.42.1.246
Controlled By:  ReplicaSet/ingress-nginx-controller-5b5cbd7b99
Containers:
  controller:
    Container ID:    containerd://03590079e77b282f96719ed81d229d083bb0e57382d929de7888ce814cc52d68
    Image:           registry.k8s.io/ingress-nginx/controller:v1.11.2@sha256:d5f8217feeac4887cb1ed21f27c2674e58be06bd8f5184cacea2a69abaf78dce
    Image ID:        registry.k8s.io/ingress-nginx/controller@sha256:d5f8217feeac4887cb1ed21f27c2674e58be06bd8f5184cacea2a69abaf78dce
    Ports:           80/TCP, 443/TCP, 8443/TCP
    Host Ports:      0/TCP, 0/TCP, 0/TCP
    SeccompProfile:  RuntimeDefault
    Args:
      /nginx-ingress-controller
      --election-id=ingress-nginx-leader
      --controller-class=k8s.io/ingress-nginx
      --ingress-class=nginx
      --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
      --validating-webhook=:8443
      --validating-webhook-certificate=/usr/local/certificates/cert
      --validating-webhook-key=/usr/local/certificates/key
      --tcp-services-configmap=ingress-nginx/tcp-services
    State:          Running
      Started:      Thu, 29 Aug 2024 09:02:43 -0300
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   90Mi
    Liveness:   http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=5
    Readiness:  http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       ingress-nginx-controller-5b5cbd7b99-5655q (v1:metadata.name)
      POD_NAMESPACE:  ingress-nginx (v1:metadata.namespace)
      LD_PRELOAD:     /usr/local/lib/libmimalloc.so
    Mounts:
      /usr/local/certificates/ from webhook-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7zq6r (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  webhook-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-nginx-admission
    Optional:    false
  kube-api-access-7zq6r:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                  From                      Message
  ----     ------       ----                 ----                      -------
  Normal   Scheduled    105s                 default-scheduler         Successfully assigned ingress-nginx/ingress-nginx-controller-5b5cbd7b99-5655q to rke-node-2
  Warning  FailedMount  103s (x3 over 105s)  kubelet                   MountVolume.SetUp failed for volume "webhook-cert" : secret "ingress-nginx-admission" not found
  Normal   Pulled       101s                 kubelet                   Container image "registry.k8s.io/ingress-nginx/controller:v1.11.2@sha256:d5f8217feeac4887cb1ed21f27c2674e58be06bd8f5184cacea2a69abaf78dce" already present on machine
  Normal   Created      100s                 kubelet                   Created container controller
  Normal   Started      100s                 kubelet                   Started container controller
  Normal   RELOAD       55s                  nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  Unhealthy    41s (x5 over 81s)    kubelet                   Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing      41s                  kubelet                   Container controller failed liveness probe, will be restarted
  Warning  Unhealthy    1s (x11 over 81s)    kubelet                   Readiness probe failed: HTTP probe failed with statuscode: 500

0s          Warning   Unhealthy           pod/ingress-nginx-controller-5b5cbd7b99-5655q    Readiness probe failed: HTTP probe failed with statuscode: 500
0s          Warning   Unhealthy           pod/ingress-nginx-controller-5b5cbd7b99-5655q    Readiness probe failed: HTTP probe failed with statuscode: 500
0s          Warning   Unhealthy           pod/ingress-nginx-controller-5b5cbd7b99-5655q    Readiness probe failed: HTTP probe failed with statuscode: 500
0s          Warning   Unhealthy           pod/ingress-nginx-controller-5b5cbd7b99-5655q    Readiness probe failed: HTTP probe failed with statuscode: 500
0s          Warning   Unhealthy           pod/ingress-nginx-controller-5b5cbd7b99-5655q    Readiness probe failed: HTTP probe failed with statuscode: 500
```
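The `FailedMount ... secret "ingress-nginx-admission" not found` events above are worth ruling out first: they suggest checking whether the admission-webhook certificate jobs ran. A minimal sketch, assuming the job and secret names used by the standard deploy manifest:

```
# Verify the admission webhook secret and the jobs that create/patch it
# (object names assumed from the standard deploy manifest).
kubectl -n ingress-nginx get secret ingress-nginx-admission
kubectl -n ingress-nginx get jobs
kubectl -n ingress-nginx logs job/ingress-nginx-admission-create
```

If the secret exists and the jobs completed, the mount warnings were transient and the probe failures need a different explanation.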
k8s-ci-robot commented 2 weeks ago

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
strongjz commented 2 weeks ago

We have made several changes to defaults; some security-related settings that previously defaulted to false are now true.

Can you post the Helm chart values you used for the upgrade? There were also major changes to the Helm chart.

1.8 to 1.11 is a major jump; I would go back to the release notes and see if there are major differences that would break an RKE cluster. Also, try upgrading one minor release at a time.

I would also check that all ports are open for the controller; the health-check port is 10254.
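A quick way to verify that is to hit the health endpoint on the pod IP from a different node. A sketch using the pod IP from the describe output above:

```
# From a node other than rke-node-2, probe the controller's health endpoint
# directly (10.42.1.246 is the pod IP from the describe output above).
curl -v --max-time 3 http://10.42.1.246:10254/healthz
```

A timeout here (but not from the pod's own node) would point at inter-node pod networking rather than at the controller.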

longwuyuan commented 2 weeks ago

/remove-kind bug
/kind support
/triage needs-information

mruoss commented 1 week ago

Same here. I have updated from 1.10.0. I'm deploying with these Helm values:


```
controller:
  kind: Deployment
  replicaCount: 3
  revisionHistoryLimit: 3
  resources:
    requests:
      cpu: 100m
      memory: 500Mi
    limits:
      cpu: 500m
      memory: 500Mi
  allowSnippetAnnotations: true
  config:
    proxy-body-size: "10m"
  service:
    externalTrafficPolicy: Local
  ingressClass: nginx
  minAvailable: 2
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/component: controller
              app.kubernetes.io/name: ingress-nginx
          topologyKey: kubernetes.io/hostname
```
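For context, a sketch of how such values would typically be applied; the release name, namespace, repo alias, and chart version are assumptions (chart 4.11.2 is the one that ships controller v1.11.2):

```
# Hypothetical upgrade using the values above saved as values.yaml;
# release name, namespace, and repo alias are assumptions.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --version 4.11.2 \
  -f values.yaml
```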
longwuyuan commented 1 week ago

For this error:

Readiness probe failed: HTTP probe failed with statuscode: 500

please post proof that all requirements are met, e.g. that port 10254 is open between the nodes.

For the other errors, please post details that someone can analyze. The required details are asked for in the new bug report template.
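One way to produce that proof is to compare an in-pod probe against a cross-node probe. A sketch using the pod name from the describe output above (and assuming curl is available in the controller image):

```
# Probe the health endpoint from inside the controller pod itself to
# separate "controller is unhealthy" from "network path is blocked"
# (assumes curl is present in the controller image).
kubectl -n ingress-nginx exec ingress-nginx-controller-5b5cbd7b99-5655q -- \
  curl -sS http://127.0.0.1:10254/healthz
```

If this already returns 500, the problem is inside the controller rather than on the network path between nodes.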

mruoss commented 1 week ago

I see this, too: Liveness probe failed: Get "http://10.24.9.43:10254/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

mruoss commented 1 week ago

Maybe just not enough CPU to respond in time...?
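If CPU starvation is the suspect, one low-risk experiment is to relax the probe timings through the chart's probe settings; a sketch, with the value names taken from the chart's controller probe block (treat them as an assumption):

```
# Hypothetical: lengthen probe timeouts to distinguish "slow under CPU
# pressure" from "actually failing" (chart value names assumed).
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.livenessProbe.timeoutSeconds=5 \
  --set controller.readinessProbe.timeoutSeconds=5
```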

longwuyuan commented 1 week ago

The controller does not have any RKE-specific code, so the error message has to be used to hunt for the root cause. I suspect that the health-check port (10254) is not open between the nodes. Also, as @mruoss suggested, just the right amount of CPU/memory/network starvation for a related process at the observed timestamp could also be the root cause.

Please check the error message and accordingly hunt down the status details from the logs etc.
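A minimal sketch of gathering those details, using the pod name from the describe output above:

```
# Logs from the controller container before the liveness-probe restart,
# plus recent events in the namespace.
kubectl -n ingress-nginx logs ingress-nginx-controller-5b5cbd7b99-5655q --previous
kubectl -n ingress-nginx get events --sort-by=.lastTimestamp
```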

Once you have an action item for the project, such as detailed step-by-step instructions to reproduce the problem at will, please re-open the issue. I will close it for now, as it adds to the tally of open issues without tracking an action item for anyone.

/close

k8s-ci-robot commented 1 week ago

@longwuyuan: Closing this issue.

In response to [this](https://github.com/kubernetes/ingress-nginx/issues/11911#issuecomment-2334994137).