kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

num of worker_processes set to max num of cores of cluster node with cgroups-v2 #11518

Open figaw opened 5 days ago

figaw commented 5 days ago

Am I holding it wrong?

I'm reading a comment here that says to set worker processes to no more than 24. Mine is automatically set to 128, which causes worker processes to fail to spawn (see the logs below).

https://github.com/kubernetes/ingress-nginx/issues/3574#issuecomment-448229118

The problem goes away when I set worker-processes in the Helm chart values:

controller:
  config:
    worker-processes: 24

Where is this documented? I've searched for comments on ulimits and ingress-nginx, but I'm not finding much.

What happened:

The ingress-nginx-controller logs show:

2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)
2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)
2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)
2024/06/29 20:31:34 [alert] 42#42: socketpair() failed while spawning "worker process" (24: No file descriptors available)
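
The `24` in these alerts is the errno value, which on Linux is `EMFILE`: the process hit its open-file-descriptor limit. A minimal Python sketch to decode it, for illustration only (the helper name `describe_errno` is made up here, and the message wording depends on the libc, so the text printed on your machine may differ from the one in the nginx log):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value, e.g. the 24 in the nginx alert."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The symbolic name is stable on Linux; the message text varies by libc.
    print(describe_errno(24))
```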

This all went away when I set worker-processes to 24 in the Helm chart.

Maybe this is related to https://github.com/kubernetes/ingress-nginx/pull/7107?

What you expected to happen:

NGINX automagically configures a proper number of worker processes. I expect this has something to do with the 128 cores.
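
For context on where a number like 128 can come from: under cgroups v2 a container's CPU quota is exposed in the `cpu.max` file as `<quota> <period>`, and it reads `max <period>` when no quota is set. In that case the only figure left to fall back on is the host's CPU count. The sketch below illustrates that fallback; it is not the controller's actual code, the function names are hypothetical, the path assumes the usual cgroup v2 mount point, and rounding up is an assumption of this example.

```python
import math
import os
from typing import Optional


def cpus_from_cpu_max(content: str) -> Optional[int]:
    """Parse cgroup v2 cpu.max content ('<quota> <period>' or 'max <period>').

    Returns the CPU count implied by the quota (rounded up, an assumption of
    this sketch), or None when there is no quota.
    """
    fields = content.split()
    if len(fields) != 2 or fields[0] == "max":
        return None
    quota, period = int(fields[0]), int(fields[1])
    if quota <= 0 or period <= 0:
        return None
    return max(1, math.ceil(quota / period))


def effective_workers(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Pick a worker count: the cgroup quota if present, else the host CPU count."""
    try:
        with open(cpu_max_path) as f:
            limit = cpus_from_cpu_max(f.read())
    except OSError:
        limit = None  # file missing (e.g. cgroups v1 or not Linux)
    host = os.cpu_count() or 1
    return min(limit, host) if limit is not None else host


if __name__ == "__main__":
    print(effective_workers())
```

With a pod that has no CPU limit, `cpus_from_cpu_max` returns `None` and the result is the node's full CPU count, which matches the behaviour described in this issue.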

When I run ulimit inside the container, the soft limit is quite low:

ingress-nginx-private-controller-f56b88476-b8tpq:/etc/nginx$ ulimit -Hn
524288
ingress-nginx-private-controller-f56b88476-b8tpq:/etc/nginx$ ulimit -Sn
1024
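
Note that `ulimit` reports the limits of the shell it runs in. To see what a specific process (for example the nginx master) actually runs with, the kernel exposes them in `/proc/<pid>/limits`. A small sketch for reading the "Max open files" row, assuming the standard Linux layout of that file; the function names are hypothetical:

```python
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Extract (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    'unlimited' is returned as None.
    """
    def to_int(value: str) -> Optional[int]:
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: 'Max' 'open' 'files' <soft> <hard> 'files'
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def read_nofile(pid: str = "self") -> Tuple[Optional[int], Optional[int]]:
    """Read the open-file limits of a process by PID (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


if __name__ == "__main__":
    print(read_nofile())
```

The soft value is the one enforced at runtime; the hard value is only the ceiling up to which a process may raise its own soft limit.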

This is despite having configured the host:

$ cat /etc/security/limits.conf
# /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
$ ulimit -Hn
65535
$ ulimit -Sn
65535

And also having configured containerd:

$ cat /etc/containerd/config.toml | grep runc.options -A 20
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            .....
            Ulimits = [
              { Name = "nofile", Hard = 65535, Soft = 65535 }
            ]

I also tried using an initContainer with the Helm chart, to no avail:

  extraInitContainers:
    - name: init-myservice
      image: busybox
      command: ["sh", "-c", "ulimit -n 65535"]
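
A likely reason the initContainer has no effect: resource limits are per-process attributes that are inherited by child processes, so `ulimit -n` in the init container's shell changes only that shell, which then exits. It never touches the separate nginx container. The Python sketch below demonstrates the scoping (it lowers the limit rather than raising it, simply because lowering always succeeds; the function name is hypothetical):

```python
import resource
import subprocess
import sys


def child_soft_limit_after_lowering(new_soft: int) -> int:
    """Spawn a child that lowers its own soft nofile limit and reports it."""
    code = (
        "import resource;"
        "s, h = resource.getrlimit(resource.RLIMIT_NOFILE);"
        f"resource.setrlimit(resource.RLIMIT_NOFILE, ({new_soft}, h));"
        "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    )
    out = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_NOFILE)
    child = child_soft_limit_after_lowering(64)
    after = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The child saw its new limit; the parent's limits are untouched.
    print("child:", child, "parent before:", before, "parent after:", after)
```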

I'm "pretty sure" all of the machines in our cluster will have at least 24 cores, so this is "probably" not a problem to configure statically.

NGINX Ingress controller version (exec ...):

NGINX Ingress controller
  Release:       v1.10.1
  Build:         4fb5aac1dd3669daa3a14d9de3e3cdb371b4c518
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.25.3


Kubernetes version (use kubectl version):

Client Version: v1.26.0
Kustomize Version: v4.5.7
Server Version: v1.29.0

Environment:

Bare metal, Supermicro, AMD EPYC 7763 64-Core Processor, 256G RAM

PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.5", GitCommit:"59755ff595fa4526236b0cc03aa2242d941a5171", GitTreeState:"clean", BuildDate:"2024-05-14T10:44:51Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
b-w-1   Ready    control-plane   16h   v1.29.5   172.17.90.1   <none>        Ubuntu 22.04.4 LTS   5.15.0-112-generic   containerd://1.7.13
b-w-2   Ready    control-plane   16h   v1.29.5   172.17.90.3   <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.13
b-w-3   Ready    control-plane   16h   v1.29.5   172.17.90.5   <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.13
b-w-4   Ready    <none>          16h   v1.29.5   172.17.90.7   <none>        Ubuntu 22.04.4 LTS   5.15.0-113-generic   containerd://1.7.13

longwuyuan commented 5 days ago

Duplicate of https://github.com/kubernetes/ingress-nginx/issues/9665

/triage accepted

longwuyuan commented 5 days ago

/retitle num of worker_processes set to max num of cores of cluster node with cgroups-v2