k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

Calico nodes try to assign themselves the same IP address #2208

Closed. OrvilleQ closed this issue 1 year ago.

OrvilleQ commented 1 year ago

Platform

Fedora CoreOS 36.20220906.3.2

Version

v1.25.2+k0s.0

Sysinfo

Machine ID: "ea421e0cb88cbf826fe8912606e89f1d3ba5a0f57e912e93506c4857b0033bed" (from machine) (pass)
Total memory: 7.8 GiB (pass)
Disk space available for /var/lib/k0s: 44.8 GiB (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.19.6-200.fc36.x86_64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  Executable in path: modprobe: /usr/sbin/modprobe (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (assumed) (pass)
    cgroup controller "freezer": available (assumed) (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

The calico-node pods on different nodes try to assign themselves the same IP address, so only one of them can start.

Steps to reproduce

  1. Create a cluster with k0sctl, with dualStack enabled (see k0sctl.yaml below).
  2. Run kubectl get pod -A -o wide.
  3. Sometimes all calico-node pods fail, and their logs cannot be retrieved because port 10250 (the kubelet API) is unreachable.
  4. Sometimes only one calico-node pod starts, and the other fails because it tries to assign itself the same IP address.

Expected behavior

All calico-node pods should start and become ready.

Actual behavior

The calico-node pods fail to start, as described above.

Screenshots and logs

k0sctl.yaml:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: cell
spec:
  hosts:
  - ssh: [MASK]
    role: controller+worker
    noTaints: true
    installFlags:
    - --enable-k0s-cloud-provider=true
    - --enable-cloud-provider=true
  - ssh: [MASK]
    role: controller+worker
    noTaints: true
    installFlags:
    - --enable-k0s-cloud-provider=true
    - --enable-cloud-provider=true
  k0s:
    version: v1.25.2+k0s.0
    dynamicConfig: true
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: cell
      spec:
        network:
          podCIDR: "10.244.0.0/16"
          serviceCIDR: "10.96.0.0/12"
          provider: calico
          calico:
            mode: "bird"
            wireguard: true
          dualStack:
            enabled: true
            IPv6podCIDR: "fd00::/108"
            IPv6serviceCIDR: "fd01::/108"

log from the failed calico-node:

$ kubectl logs -p -n kube-system calico-node-kxxj9
Defaulted container "calico-node" out of: calico-node, install-cni (init)
2022-09-28 06:33:18.380 [INFO][9] startup/startup.go 425: Early log level set to info
2022-09-28 06:33:18.381 [INFO][9] startup/utils.go 127: Using NODENAME environment for node name 02.[MASK]
2022-09-28 06:33:18.381 [INFO][9] startup/utils.go 139: Determined node name: 02.[MASK]
2022-09-28 06:33:18.381 [INFO][9] startup/startup.go 94: Starting node 02.[MASK] with version v3.24.1
2022-09-28 06:33:18.382 [INFO][9] startup/startup.go 430: Checking datastore connection
2022-09-28 06:33:18.403 [INFO][9] startup/startup.go 454: Datastore connection verified
2022-09-28 06:33:18.403 [INFO][9] startup/startup.go 104: Datastore is ready
2022-09-28 06:33:18.443 [INFO][9] startup/startup.go 483: Initialize BGP data
2022-09-28 06:33:18.444 [INFO][9] startup/autodetection_methods.go 103: Using autodetected IPv4 address on interface podman0: 10.88.0.1/16
2022-09-28 06:33:18.444 [INFO][9] startup/startup.go 559: Node IPv4 changed, will check for conflicts
2022-09-28 06:33:18.460 [WARNING][9] startup/startup.go 988: Calico node '01.[MASK]' is already using the IPv4 address 10.88.0.1.
2022-09-28 06:33:18.460 [INFO][9] startup/startup.go 389: Clearing out-of-date IPv4 address from this node IP="10.88.0.1/16"
2022-09-28 06:33:18.490 [WARNING][9] startup/utils.go 49: Terminating
Calico node failed to start

Additional context

This issue started a few days ago; everything was working fine before that.

I tried switching the host OS (CentOS, openSUSE) and the k0s version (1.24), but the same problem persists.

makhov commented 1 year ago

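By default, Calico uses the first-found method to detect the node IP, which returns the first valid IP address on the first valid interface. https://projectcalico.docs.tigera.io/reference/node/configuration#ip-autodetection-methods

In your log, that interface is podman0 (10.88.0.1/16). Podman's default bridge uses the same address on every host, so both nodes detect 10.88.0.1, and the second calico-node terminates because another node already uses that address.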
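To make the failure mode concrete, here is a minimal Python sketch of first-found selection followed by a duplicate-address check like the one behind the "is already using the IPv4 address" warning in the log. It is a simplified model, not Calico's code. EXCLUDED_IFACE_PATTERNS is a hypothetical subset of Calico's default exclusions, and the host names, interface order, and eth0 addresses are assumptions chosen to reproduce the log.

```python
"""Simplified model of Calico's first-found IPv4 autodetection (not Calico code)."""
import fnmatch
import ipaddress

# Hypothetical subset of interface-name patterns that first-found skips.
# podman0 does not match any of them, so it is treated as a valid interface.
EXCLUDED_IFACE_PATTERNS = ["lo", "docker*", "cali*", "veth*", "tunl*", "vxlan*", "wireguard*"]


def first_found(interfaces):
    """Return the first usable IPv4 'addr/prefix' on the first usable interface.

    `interfaces` is an ordered list of (name, [cidr, ...]) tuples standing in
    for what the host enumerates.
    """
    for name, cidrs in interfaces:
        if any(fnmatch.fnmatchcase(name, pat) for pat in EXCLUDED_IFACE_PATTERNS):
            continue
        for cidr in cidrs:
            iface = ipaddress.ip_interface(cidr)
            if iface.version == 4 and not iface.ip.is_loopback and not iface.ip.is_link_local:
                return cidr
    return None


def find_conflicts(node_ips):
    """Map each detected IP to the nodes claiming it, keeping only duplicates."""
    owners = {}
    for node, cidr in node_ips.items():
        ip = str(ipaddress.ip_interface(cidr).ip)
        owners.setdefault(ip, []).append(node)
    return {ip: nodes for ip, nodes in owners.items() if len(nodes) > 1}


# Hypothetical hosts: podman0 has the same default address on both machines
# and, by assumption, is enumerated before the real NIC (eth0).
hosts = {
    "01": [("lo", ["127.0.0.1/8"]), ("podman0", ["10.88.0.1/16"]), ("eth0", ["192.0.2.11/24"])],
    "02": [("lo", ["127.0.0.1/8"]), ("podman0", ["10.88.0.1/16"]), ("eth0", ["192.0.2.12/24"])],
}

detected = {node: first_found(ifaces) for node, ifaces in hosts.items()}
print(detected)                  # {'01': '10.88.0.1/16', '02': '10.88.0.1/16'}
print(find_conflicts(detected))  # {'10.88.0.1': ['01', '02']}
```

Because podman0 is not excluded and has the same address on both hosts, both nodes resolve to 10.88.0.1, and whichever calico-node registers second hits the conflict. Switching to a detection method that ignores podman0 removes the duplicate.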

You can change the autodetection method by setting spec.network.calico.ipAutodetectionMethod in your k0s config, e.g.:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: cell
spec:
  hosts:
  - ssh: [MASK]
    role: controller+worker
    noTaints: true
    installFlags:
    - --enable-k0s-cloud-provider=true
    - --enable-cloud-provider=true
  - ssh: [MASK]
    role: controller+worker
    noTaints: true
    installFlags:
    - --enable-k0s-cloud-provider=true
    - --enable-cloud-provider=true
  k0s:
    version: v1.25.2+k0s.0
    dynamicConfig: true
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: Cluster
      metadata:
        name: cell
      spec:
        network:
          podCIDR: "10.244.0.0/16"
          serviceCIDR: "10.96.0.0/12"
          provider: calico
          calico:
            mode: "bird"
            wireguard: true
            ipAutodetectionMethod: "kubernetes-internal-ip" # Just an example, use whatever works for you
          dualStack:
            enabled: true
            IPv6podCIDR: "fd00::/108"
            IPv6serviceCIDR: "fd01::/108"
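The following minimal Python sketch shows why kubernetes-internal-ip avoids the clash: it takes the node IP from the Kubernetes Node object rather than from host interfaces. It rests on two assumptions. First, the method picks the first InternalIP in status.addresses whose IP family matches the one being detected. Second, the Node objects are hypothetical and trimmed to the fields used here.

```python
"""Simplified model of Calico's kubernetes-internal-ip autodetection (not Calico code)."""
import ipaddress


def kubernetes_internal_ip(node, version=4):
    """Return the first InternalIP of the requested IP family, or None.

    Assumption: addresses are filtered by IP family (4 or 6).
    """
    for addr in node["status"]["addresses"]:
        if addr["type"] != "InternalIP":
            continue
        if ipaddress.ip_address(addr["address"]).version == version:
            return addr["address"]
    return None


# Hypothetical dual-stack nodes whose kubelets reported the primary NIC addresses.
nodes = [
    {"metadata": {"name": "01"}, "status": {"addresses": [
        {"type": "InternalIP", "address": "192.0.2.11"},
        {"type": "InternalIP", "address": "2001:db8::11"},
        {"type": "Hostname", "address": "01"},
    ]}},
    {"metadata": {"name": "02"}, "status": {"addresses": [
        {"type": "InternalIP", "address": "192.0.2.12"},
        {"type": "InternalIP", "address": "2001:db8::12"},
        {"type": "Hostname", "address": "02"},
    ]}},
]

ipv4 = {n["metadata"]["name"]: kubernetes_internal_ip(n) for n in nodes}
print(ipv4)  # {'01': '192.0.2.11', '02': '192.0.2.12'}
# Each node now reports a distinct address, so no conflict check fails.
assert len(set(ipv4.values())) == len(ipv4), "duplicate node IPs"
```

Whether this method suits a given cluster depends on which addresses the kubelet reports as InternalIP. As the config comment says, use whichever method selects the right interface on your hosts.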

OrvilleQ commented 1 year ago

This seems to be related to Calico itself. I have opened an issue there. Thanks for your help.