k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

High CPU usage on control plane using CPLB #5087

Closed by jafnhaar 1 month ago

jafnhaar commented 1 month ago

Platform

Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:    12
Codename:   bookworm

Version

v1.31.1+k0s.1

Sysinfo

`k0s sysinfo`
Total memory: 3.8 GiB (pass)
File system of /var/lib/k0s: ext4 (pass)
Disk space available for /var/lib/k0s: 22.1 GiB (pass)
Relative disk space available for /var/lib/k0s: 75% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-26-amd64 (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

After bootstrapping the cluster with CPLB using VRRP, I noticed that all control plane nodes started using about 70-80% CPU. By comparison, my old cluster with the same config, but behind an external LB, used less than 10% CPU. I used the following config:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: test
spec:
  hosts:
  - role: controller
    ssh:
      address: 10.141.0.10
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: controller
    ssh:
      address: 10.141.0.11
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: controller
    ssh:
      address: 10.141.0.12
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: worker
    ssh:
      address: 10.141.0.13
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: worker
    ssh:
      address: 10.141.0.14
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  - role: worker
    ssh:
      address: 10.141.0.15
      user: root
      keyPath: ~/.ssh/id_rsa
    k0sBinaryPath: /opt/k0s
    uploadBinary: true
  k0s:
    version: v1.31.1+k0s.0
    config:
      spec:
        api:
          sans:
          - 10.141.0.253
        network:
          controlPlaneLoadBalancing:
            enabled: true
            type: Keepalived
            keepalived:
              vrrpInstances:
              - virtualIPs: ["10.141.0.253/24"]
                authPass: Example
              virtualServers:
              - ipAddress: "10.141.0.253"

Steps to reproduce

  1. Set up 6 VMs (3 controllers, 3 workers)
  2. Use k0sctl to bootstrap the cluster with CPLB enabled
  3. Observe the high CPU usage on the control plane nodes

Expected behavior

CPU usage should be about the same as with an external LB.

Actual behavior

CPU usage on all control plane nodes is about 70-80%.

twz123 commented 1 month ago

CPLB is for external traffic only. For cluster-internal load balancing, you need to enable node-local load balancing (NLLB) as well. I guess that konnectivity is consuming most of the CPU because it tries to get connections to all three controllers but only gets a connection to one of them. This should go away after NLLB is enabled.

twz123 commented 1 month ago

/xref k0sproject/k0sctl#475

jafnhaar commented 1 month ago

Yes, enabling NLLB fixed it. Thanks.
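
For readers landing here with the same symptom, the fix discussed above might look roughly like the fragment below. This is a sketch based on the k0sctl config from the original report, not the exact config the reporter ended up with: it keeps the CPLB settings unchanged and adds a `nodeLocalLoadBalancing` section next to them. The `type: EnvoyProxy` value is an assumption about the desired setup; check the k0s documentation for the options your version supports.

```yaml
# Fragment of the k0sctl config from the report; only the network section is shown.
spec:
  k0s:
    config:
      spec:
        network:
          # Existing CPLB settings stay as they are (virtual IP for external traffic).
          controlPlaneLoadBalancing:
            enabled: true
            type: Keepalived
            keepalived:
              vrrpInstances:
              - virtualIPs: ["10.141.0.253/24"]
                authPass: Example
              virtualServers:
              - ipAddress: "10.141.0.253"
          # Added: node-local load balancing (NLLB) for cluster-internal traffic.
          # EnvoyProxy is assumed here; verify against the docs for your k0s version.
          nodeLocalLoadBalancing:
            enabled: true
            type: EnvoyProxy
```

With NLLB enabled, in-cluster components should be able to reach every controller rather than only the one behind the virtual IP, which is what the explanation above suggests konnectivity was missing.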