k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

Single Node with NLLB leads to broken kube-proxy #4056

Closed pschichtel closed 9 months ago

pschichtel commented 9 months ago


Platform

Linux 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Version

v1.29.1+k0s.1

Sysinfo

`k0s sysinfo`
Machine ID: "ffcdad372b9befda1530d90a1ef7ff774889af28ce380d2898c9783c54ad399a" (from machine) (pass)
Total memory: 7.8 GiB (pass)
Disk space available for /var/lib/k0s: 22.4 GiB (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.1.0-18-amd64 (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connection tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connection tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

I deployed a single-node k0s cluster using k0sctl, based on a config for a multi-node deployment. I adjusted the config but completely forgot that NLLB doesn't make sense on a single-node deployment.

The k0s deployment went through, but the network plugin (Calico) didn't become ready. After inspecting it and some googling I ended up at kube-proxy, which was trying to access the API server via the NLLB port, even though no NLLB pods had been started. Disabling NLLB and restarting kube-proxy fixed the issue.
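
In config terms, the workaround amounts to the following fragment of the k0s cluster config (a minimal sketch using the field names from the full configs shared later in this thread; explicitly setting enabled: false should be equivalent to dropping the section entirely, which is what the fixed config does):

spec:
  network:
    nodeLocalLoadBalancing:
      # disable node-local load balancing on the single node
      enabled: false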

So there is an inconsistency here:

  1. either deploy the NLLB pod even on a single-node setup when it is enabled,
  2. or configure kube-proxy to account for the fact that NLLB is not deployed on single-node setups

Steps to reproduce

  1. Deploy a single-node k0s cluster with NLLB enabled (a minimal config sketch follows this list)
  2. See that kube-proxy can't access the API server
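
A minimal reproduction sketch, assuming the same k0sctl v1beta1 layout as the full configs shared below (single-node host, NLLB enabled, everything else left at its defaults):

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
spec:
  hosts:
  - role: single
    openSSH:
      address: some-host
  k0s:
    version: v1.29.1+k0s.1
    config:
      spec:
        network:
          nodeLocalLoadBalancing:
            enabled: true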

Expected behavior

  1. either deploy the NLLB pod even on a single-node setup when it is enabled,
  2. or configure kube-proxy to account for the fact that NLLB is not deployed on single-node setups

Actual behavior

No NLLB pod is deployed, but kube-proxy is still configured to access the API server via Envoy.
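
Illustratively, the kube-proxy client config on the node ends up pointing at the node-local Envoy listener instead of the real API server port. This is a sketch using the ports from the config in this issue, not the verbatim file k0s generates (the exact address and file layout may differ):

apiVersion: v1
kind: Config
clusters:
- name: default
  cluster:
    # NLLB/Envoy apiServerBindPort: nothing listens here, since no NLLB pod was started
    server: https://localhost:7443
    # a single-node setup needs the real kube-apiserver instead, e.g. https://<node>:6443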

Screenshots and logs

No response

Additional context

No response

pschichtel commented 9 months ago

I'm not entirely sure if this is a k0sctl or k0s issue.

jnummelin commented 9 months ago

Could you share the k0sctl YAML you used, so we can have a look and test the same config?

pschichtel commented 9 months ago

Yes:

This is the fixed version:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - openSSH:
      address: some-host
    role: single
    noTaints: true
    dataDir: /var/kubernetes/k0s
  k0s:
    version: v1.29.1+k0s.1
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: some-cluster
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        extensions:
          storage:
            create_default_storage_class: false
            type: external_storage
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          provider: calico
          calico:
            mode: ipip
            overlay: Always
          clusterDomain: cluster.local
          dualStack: {}
          kubeProxy:
            iptables:
              masqueradeAll: true
              minSyncPeriod: 0s
              syncPeriod: 0s
            ipvs:
              minSyncPeriod: 0s
              syncPeriod: 0s
              tcpFinTimeout: 0s
              tcpTimeout: 0s
              udpTimeout: 0s
            metricsBindAddress: 0.0.0.0:10249
            mode: iptables
          kuberouter:
            autoMTU: true
            hairpin: Enabled
            ipMasq: false
            mtu: 0
          podCIDR: 10.244.0.0/16
          serviceCIDR: 10.96.0.0/12
        scheduler: {}
        storage:
          type: etcd
        telemetry:
          enabled: true

This is the broken version:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - openSSH:
      address: some-host
    role: single
    noTaints: true
    dataDir: /var/kubernetes/k0s
  k0s:
    version: v1.29.1+k0s.1
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: some-cluster
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
        extensions:
          storage:
            create_default_storage_class: false
            type: external_storage
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          provider: calico
          calico:
            mode: ipip
            overlay: Always
          clusterDomain: cluster.local
          dualStack: {}
          kubeProxy:
            iptables:
              masqueradeAll: true
              minSyncPeriod: 0s
              syncPeriod: 0s
            ipvs:
              minSyncPeriod: 0s
              syncPeriod: 0s
              tcpFinTimeout: 0s
              tcpTimeout: 0s
              udpTimeout: 0s
            metricsBindAddress: 0.0.0.0:10249
            mode: iptables
          kuberouter:
            autoMTU: true
            hairpin: Enabled
            ipMasq: false
            mtu: 0
          nodeLocalLoadBalancing:
            enabled: true
            envoyProxy:
              apiServerBindPort: 7443
              image:
                image: docker.io/envoyproxy/envoy-distroless
              konnectivityServerBindPort: 7132
            type: EnvoyProxy
          podCIDR: 10.244.0.0/16
          serviceCIDR: 10.96.0.0/12
        scheduler: {}
        storage:
          type: etcd
        telemetry:
          enabled: true

The only difference is the nodeLocalLoadBalancing section.

twz123 commented 9 months ago

Right. This is actually documented. However, in contrast to the conflict with an external API address, this case is only checked, not reported as an error. K0s should probably error out in this case.

twz123 commented 9 months ago

> I'm not entirely sure if this is a k0sctl or k0s issue.

Despite k0s not being fail-fast here, this is kinda also another instance in which k0sproject/k0sctl#475 would have helped.