projectcalico / calico

Calico unable to run due to ipset errors. #6953

Closed issmirnov closed 11 months ago

issmirnov commented 2 years ago

I am attempting to install Calico on 3x k3s nodes. The

Expected Behavior

Calico nodes should come up and work, as per https://projectcalico.docs.tigera.io/getting-started/kubernetes/k3s/multi-node-install.

Current Behavior

Calico does not come up.

Output of kubectl get -n calico-system all

$ k get -n calico-system all
NAME                                          READY   STATUS    RESTARTS      AGE
pod/calico-kube-controllers-8fd8bbb4b-64cxn   1/1     Running   2 (30h ago)   2d8h
pod/calico-node-r86sb                         0/1     Running   3 (30h ago)   2d8h
pod/calico-node-s2n5m                         0/1     Running   2 (30h ago)   2d8h
pod/calico-node-vxgrl                         0/1     Running   2 (30h ago)   2d8h
pod/calico-typha-6cdbf894dd-fldgp             1/1     Running   2 (30h ago)   2d8h
pod/calico-typha-6cdbf894dd-lflvk             1/1     Running   2 (30h ago)   2d8h

NAME                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/calico-kube-controllers-metrics   ClusterIP   10.43.79.158   <none>        9094/TCP   2d8h
service/calico-typha                      ClusterIP   10.43.62.46    <none>        5473/TCP   2d8h

NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/calico-node       3         3         0       3            0           kubernetes.io/os=linux   2d8h
daemonset.apps/csi-node-driver   0         0         0       0            0           kubernetes.io/os=linux   2d8h

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/calico-kube-controllers   1/1     1            1           2d8h
deployment.apps/calico-typha              2/2     2            2           2d8h

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/calico-kube-controllers-8fd8bbb4b   1         1         1       2d8h
replicaset.apps/calico-typha-6cdbf894dd             2         2         2       2d8h

Logs of calico pod:

bird: Mesh_10_42_112_128: Socket error: bind: Cannot assign requested address
2022-11-05 02:30:46.466 [INFO][2619] felix/route_table.go 1200: Failed to access interface because it doesn't exist. error=Link not found ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 tableIndex=0
2022-11-05 02:30:46.466 [INFO][2619] felix/route_table.go 1268: Failed to get interface; it's down/gone. error=Link not found ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 tableIndex=0
2022-11-05 02:30:46.466 [ERROR][2619] felix/route_table.go 1035: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4 tableIndex=0
2022-11-05 02:30:46.466 [INFO][2619] felix/route_table.go 618: Interface missing, will retry if it appears. ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4 tableIndex=0

Output of sudo kubectl calico node diags

Collecting diagnostics
Using temp dir: /tmp/calico3518383960
Dumping netstat
Dumping routes (IPv4)
Dumping routes (IPv6)
Dumping interface info (IPv4)
Dumping interface info (IPv6)
Dumping iptables (IPv4)
Dumping iptables (IPv6)
Dumping ipsets
Failed to run command: ipset list
Error: ipset v7.6: Kernel and userspace incompatible: settype hash:net with revision 7 not supported by userspace.

Dumping ipsets (container)
Failed to run command: docker run --rm --privileged --net=host calico/node ipset list
Error: 
Copying journal for calico-node.service
Dumping felix stats
Failed to run command: pkill -SIGUSR1 felix
Error: 
Copying Calico logs
Error creating log directory: mkdir /tmp/calico3518383960/diagnostics/logs: file exists
Error compressing the diagnostics: exit status 1

Diags saved to /tmp/calico3518383960/diags-20221104_203112.tar.gz
If required, you can upload the diagnostics bundle to a file sharing service.

Output of ifconfig on the master node:

bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 192.168.4.21  netmask 255.255.255.0  broadcast 192.168.4.255
        inet6 fe80::946e:d4ff:fef7:6db  prefixlen 64  scopeid 0x20<link>
        ether 96:6e:d4:f7:06:db  txqueuelen 1000  (Ethernet)
        RX packets 15948497  bytes 4017120977 (4.0 GB)
        RX errors 0  dropped 58  overruns 0  frame 0
        TX packets 17795375  bytes 3793871278 (3.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

cali34f2f224243: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 42183  bytes 4209654 (4.2 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 2853  bytes 121322 (121.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

cali80a9d1cbc96: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 64977  bytes 10716974 (10.7 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 65923  bytes 5553296 (5.5 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calia69ba97dc3b: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 42279  bytes 4174414 (4.1 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 2948  bytes 125312 (125.3 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calib5fe444ad9c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 73409  bytes 6230428 (6.2 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 77013  bytes 74917643 (74.9 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calib753f4ed86c: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 45  bytes 3246 (3.2 KB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 13  bytes 2042 (2.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calicfc031e617a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 175973  bytes 14888480 (14.8 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 2951  bytes 125438 (125.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calid19cf1bc772: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 1038082  bytes 95874149 (95.8 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 1141500  bytes 912643133 (912.6 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calid2092b922bd: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 3059  bytes 266486 (266.4 KB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 348  bytes 16112 (16.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calid8f0d427ac4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 44  bytes 3176 (3.1 KB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 13  bytes 2042 (2.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calidd8799b5ac8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 175916  bytes 15001322 (15.0 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 2897  bytes 123170 (123.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

calif179886f244: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 175916  bytes 15577998 (15.5 MB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 2896  bytes 123128 (123.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Ipset version:

$ ipset -v
ipset v7.15, protocol version: 7
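One thing worth noting: the diags output above reports `ipset v7.6` in its error, while `ipset -v` on the host reports v7.15. The following is a minimal Python sketch for pulling the relevant fields out of both lines so they can be compared side by side. The function names and regular expressions are my own and only cover the two message formats quoted in this issue; a mismatch may just mean that a different ipset binary ran, not that this is the root cause.

```python
import re

# Patterns written against the two lines quoted in this issue only.
ERROR_RE = re.compile(
    r"ipset v(?P<userspace>[\d.]+): Kernel and userspace incompatible: "
    r"settype (?P<settype>\S+) with revision (?P<revision>\d+) "
    r"not supported by userspace"
)
VERSION_RE = re.compile(r"ipset v(?P<version>[\d.]+), protocol version: \d+")


def parse_error(line):
    """Return (userspace_version, set_type, revision), or None if no match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    return m["userspace"], m["settype"], int(m["revision"])


def parse_version(line):
    """Return the version string from `ipset -v` output, or None if no match."""
    m = VERSION_RE.search(line)
    return None if m is None else m["version"]


diags_line = (
    "Error: ipset v7.6: Kernel and userspace incompatible: "
    "settype hash:net with revision 7 not supported by userspace."
)
host_line = "ipset v7.15, protocol version: 7"

failing_version, set_type, revision = parse_error(diags_line)
host_version = parse_version(host_line)
if failing_version != host_version:
    print(
        f"ipset {failing_version} rejected {set_type} revision {revision}, "
        f"but the host reports ipset {host_version}"
    )
```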

Output of ipset list

Name: cali40all-ipam-pools
Type: hash:net
Revision: 7
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0x0835a594
Size in memory: 504
References: 1
Number of entries: 1
Members:
10.42.0.0/16

Name: cali40masq-ipam-pools
Type: hash:net
Revision: 7
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0x3a77d41c
Size in memory: 504
References: 1
Number of entries: 1
Members:
10.42.0.0/16

Name: cali40this-host
Type: hash:ip
Revision: 5
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0xaa5894fe
Size in memory: 360
References: 0
Number of entries: 4
Members:
127.0.0.1
100.92.246.125
192.168.4.22
127.0.0.0

Name: cali40all-vxlan-net
Type: hash:net
Revision: 7
Header: family inet hashsize 1024 maxelem 1048576 bucketsize 12 initval 0x64e76198
Size in memory: 552
References: 2
Number of entries: 2
Members:
10.42.112.128
10.42.254.129

Possible Solution

It seems that calico can't create the network interfaces. Per https://github.com/projectcalico/calico/issues/5717 this was fixed in v3.22.4, and my version of calico is newer. I'm not sure what's wrong.

Perhaps it's related to https://github.com/rancher/rancher/issues/38017 ?

Steps to Reproduce (for bugs)

  1. Deploy a k3s Ubuntu 22.04.1 server.
  2. Follow guide at https://projectcalico.docs.tigera.io/getting-started/kubernetes/k3s/multi-node-install and launch the tigera-operator.yaml and custom-resources.yaml files.
  3. Run kubectl get -n calico-system all and observe the failed pods.

Context

I am attempting to deploy k3s, replacing flannel with calico for our workloads.

Your Environment

cureforoptimism commented 2 years ago

Seeing this, too, in a similar environment (Ubuntu/rke2/rancher/hardened-calico:v3.24.1-build20221011)

caseydavenport commented 2 years ago

I thought it might be this bug which was introduced in v3.24.3 and fixed in v3.24.4: https://github.com/projectcalico/calico/issues/6927

But since @cureforoptimism is seeing this in v3.24.1, sounds like it might be something different.

2022-11-05 02:30:46.466 [ERROR][2619] felix/route_table.go 1035: Failed to get link attributes error=interface not present ifaceRegex="^vxlan.calico$" ipVersion=0x4 tableIndex=0

Does the vxlan.calico interface exist on your node if you run ip link show?
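If you want to run the same check on all three nodes from a script, here is a small Python sketch that looks for the interface under `/sys/class/net`, which is where Linux exposes network devices. The helper name `missing_interfaces` is made up for this example; it is not part of Calico or calicoctl.

```python
from pathlib import Path


def missing_interfaces(names, sysfs_root="/sys/class/net"):
    """Return the interface names that have no entry under sysfs_root."""
    root = Path(sysfs_root)
    return [name for name in names if not (root / name).exists()]


if __name__ == "__main__":
    # vxlan.calico is the interface named in the felix log line above.
    for name in missing_interfaces(["vxlan.calico"]):
        print(f"{name}: not present on this node")
```

The `sysfs_root` parameter is only there so the function can be pointed at a test directory; on a real node the default should be fine.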

issmirnov commented 2 years ago

Not on my node, just the cali* interfaces.

issmirnov commented 2 years ago

How can we best assist with debugging?

We're eager to get this launched, and will happily deploy some eng resources to generate more logs or run dev builds.

Happy to help in any way possible once this reaches the top of your queue.

Thank you in advance!

lwr20 commented 1 year ago

Setting spec.calicoNetwork.nodeAddressAutodetectionV4 in the Installation resource to something other than firstFound should rule out #6927 for certain. https://projectcalico.docs.tigera.io/reference/installation/api#operator.tigera.io/v1.NodeAddressAutodetection
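As a rough sketch of what that change could look like, the Python below builds a JSON merge patch that clears `firstFound` and selects an interface by name instead. Using `bond0` is an assumption based on the ifconfig output earlier in this issue, and the helper name is invented for the example; check the API reference linked above for the other autodetection methods before applying anything.

```python
import json


def autodetection_patch(interface_regex):
    """Build a merge patch that switches IPv4 autodetection to an interface match.

    Setting firstFound to None (JSON null) removes that field under
    merge-patch semantics, so only one detection method stays set.
    """
    return {
        "spec": {
            "calicoNetwork": {
                "nodeAddressAutodetectionV4": {
                    "firstFound": None,
                    "interface": interface_regex,
                }
            }
        }
    }


# bond0 is assumed here; it is the uplink shown in the ifconfig output.
print(json.dumps(autodetection_patch("bond0")))
```

Assuming the script is saved as `make_patch.py` (a hypothetical name) and the Installation resource is called `default`, the output could be applied with something like `kubectl patch installation default --type=merge -p "$(python3 make_patch.py)"`.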

lwr20 commented 1 year ago

Is there anything else running that might be trying to manage interfaces? e.g. network manager, etc
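One quick way to answer this on each node is to scan the process list for daemons that are known to manage interfaces. The Python sketch below does that by reading `/proc/<pid>/comm`. The shortlist of daemon names is my assumption, not an exhaustive or official list, so extend it for your distribution.

```python
from pathlib import Path

# Assumed shortlist of daemons that may manage host interfaces.
CANDIDATES = ("NetworkManager", "systemd-networkd")

# /proc/<pid>/comm holds at most 15 characters, so match on truncated names.
BY_COMM = {name[:15]: name for name in CANDIDATES}


def running_interface_managers(proc_root="/proc"):
    """Return the sorted candidate daemons that currently have a process."""
    found = set()
    for comm_file in Path(proc_root).glob("[0-9]*/comm"):
        try:
            comm = comm_file.read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm in BY_COMM:
            found.add(BY_COMM[comm])
    return sorted(found)


if __name__ == "__main__":
    print(running_interface_managers())
```

An empty list only means none of the listed daemons were found; something else could still be touching the interfaces.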

orihomie commented 1 year ago

I have the same problem. My hosts are arm64, and calicoctl also fails to gather the ipset list when I run:

$ sudo -E calicoctl node diags
Collecting diagnostics
Using temp dir: /tmp/calico3738869463
Dumping netstat
Dumping routes (IPv4)
Dumping routes (IPv6)
Dumping interface info (IPv4)
Dumping interface info (IPv6)
Dumping iptables (IPv4)
Dumping iptables (IPv6)
Dumping ipsets
Failed to run command: ipset list
Error: 
Dumping ipsets (container)
Failed to run command: docker run --rm --privileged --net=host calico/node ipset list
Error: ipset v7.1: Kernel and userspace incompatible: settype hash:net with revision 7 not supported by userspace.

Copying journal for calico-node.service
Dumping felix stats
Failed to run command: pkill -SIGUSR1 felix
Error: 
Copying Calico logs
Error creating log directory: mkdir /tmp/calico3738869463/diagnostics/logs: file exists

Diags saved to /tmp/calico3738869463/diags-20221205_064915.tar.gz
If required, you can upload the diagnostics bundle to a file sharing service.

As we can see here, it runs

docker run --rm --privileged --net=host calico/node ipset list

But if you run

sudo docker run --rm --privileged --platform=aarch64 --net=host calico/node:v3.24.5-arm64 ipset list

you'll see proper output. So is this somehow related to an internal calicoctl bug?

That's a bit strange, because latest and v3.24.5-arm64 differ only in the Variant property (the versioned tag doesn't have it).

UPD: the full difference between the versions is here. There are some differences in the ContainerConfig and Config props.
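For anyone who wants to repeat the comparison, here is a small Python sketch that lists the key paths where two `docker image inspect` results differ. The two dictionaries are made-up stand-ins, not the real inspect output for these tags; in practice you would load the JSON produced by `docker image inspect` for each image.

```python
def diff_paths(a, b, prefix=""):
    """Return dotted key paths whose values differ between two nested dicts."""
    paths = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else key
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse so nested sections such as Config are compared per field.
            paths.extend(diff_paths(left, right, path))
        elif left != right:
            paths.append(path)
    return paths


# Hypothetical sample data shaped like `docker image inspect` output.
latest = {"Architecture": "arm64", "Variant": "v8", "Config": {"Env": ["A=1"]}}
pinned = {"Architecture": "arm64", "Config": {"Env": ["A=2"]}}

print(diff_paths(latest, pinned))  # ['Config.Env', 'Variant']
```

A key that is missing on one side is reported the same way as a changed value, which is enough to spot something like the Variant difference mentioned above.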