@RichardSufliarsky if the underlying infrastructure provides connectivity, which seems to be the case for you, you can disable any sort of tunnelling such as VXLAN.
Regarding the MTU, a lower value does not hurt; the only downside is that you won't get the extra bandwidth that a 9000-byte MTU would allow. You can raise the tunnel MTU via the Felix configuration fields vxlanMTU and vxlanMTUV6.
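For example, a minimal sketch of raising the tunnel MTU (assuming the default FelixConfiguration and the usual VXLAN overheads of 50 bytes for IPv4 and 70 bytes for IPv6 on top of a 9000-byte interface MTU; adjust the values for your environment):
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanMTU":8950,"vxlanMTUV6":8930}}'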
It would be great to understand why some of the nodes do not have the interface. Is there anything different on those nodes? Can you also share the Felix configuration and available IP pools?
@mazdakn all the nodes were installed the same way. They run different OS versions, but that does not seem to be the differentiator. The network interfaces used for Kubernetes also have different names on different machines, but on node001-node008 they are identical: those are the exact same hardware, installed with the exact same steps, yet only two of them have the vxlan.calico interface.
kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
api001.lab.company1.io Ready <none> 286d v1.24.2 172.16.2.30 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.13.1.el8_7.x86_64 cri-o://1.24.4
gpu001.lab.company1.io Ready <none> 308d v1.24.2 172.16.2.60 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.13.1.el8_7.x86_64 cri-o://1.24.4
gpu002.lab.company1.io Ready <none> 348d v1.24.2 172.16.2.80 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.19.2.el8_7.x86_64 cri-o://1.24.5
gpu003.lab.company1.io Ready <none> 320d v1.24.2 172.16.2.90 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.19.2.el8_7.x86_64 cri-o://1.24.5
gpu004.lab.company1.io Ready <none> 148d v1.24.2 172.16.2.91 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.13.1.el8_7.x86_64 cri-o://1.24.4
gpu005.lab.company1.io Ready <none> 171d v1.24.2 172.16.2.92 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.3.1.el8.x86_64 cri-o://1.24.3
gpu006.lab.company1.io Ready <none> 99d v1.24.2 172.16.2.93 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.4
k8s1.lab.company1.io Ready control-plane 412d v1.25.10 172.16.2.11 <none> Red Hat Enterprise Linux 9.2 (Plow) 5.14.0-284.11.1.el9_2.x86_64 cri-o://1.25.3
k8s2.lab.company1.io Ready control-plane 412d v1.25.10 172.16.2.12 <none> Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-425.13.1.el8_7.x86_64 cri-o://1.25.3
k8s3.lab.company1.io Ready control-plane 412d v1.25.10 172.16.2.13 <none> Red Hat Enterprise Linux 8.8 (Ootpa) 4.18.0-477.10.1.el8_8.x86_64 cri-o://1.25.3
nas001.lab.company1.io Ready <none> 333d v1.24.2 172.16.2.10 <none> Red Hat Enterprise Linux 8.7 (Ootpa) 4.18.0-425.13.1.el8_7.x86_64 cri-o://1.24.4
node001.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.21 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.5
node002.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.22 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.5
node003.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.23 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.5
node004.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.24 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.5
node005.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.25 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.4
node006.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.26 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.5
node007.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.27 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.22.2.el9_1.x86_64 cri-o://1.24.5
node008.lab.company1.io Ready <none> 89d v1.24.2 172.16.2.28 <none> Red Hat Enterprise Linux 9.1 (Plow) 5.14.0-162.18.1.el9_1.x86_64 cri-o://1.24.5
redis002.lab.company1.io Ready <none> 295d v1.24.2 172.16.2.50 <none> Red Hat Enterprise Linux 8.5 (Ootpa) 4.18.0-348.12.2.el8_5.x86_64 cri-o://1.24.2
kubectl get felixconfiguration default -oyaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  creationTimestamp: "2022-04-27T18:26:27Z"
  name: default
  resourceVersion: "18992585"
  uid: b3db7f7e-4e5a-4845-92f7-9cd0bf21d325
spec:
  bpfLogLevel: ""
  floatingIPs: Disabled
  healthPort: 9099
  logSeverityScreen: Info
  reportingInterval: 0s
  vxlanEnabled: true
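(As an aside, a one-liner to check just that field, assuming the same default resource: kubectl get felixconfiguration default -o jsonpath='{.spec.vxlanEnabled}')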
For the IPPool, I think that I have changed vxlanMode to Never, but I am not sure.
kubectl get ippool default-ipv4-ippool -oyaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2022-04-27T18:26:27Z"
  name: default-ipv4-ippool
  resourceVersion: "276576540"
  uid: 490ff4c8-2e67-4d68-9195-c600684d7111
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 172.18.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
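To see the encapsulation settings of all available pools at a glance, something like this should work (a sketch using plain kubectl custom-columns; the column names are arbitrary):
kubectl get ippools -o custom-columns=NAME:.metadata.name,CIDR:.spec.cidr,VXLAN:.spec.vxlanMode,IPIP:.spec.ipipMode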
kubectl get installation default -oyaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  creationTimestamp: "2022-04-27T18:26:23Z"
  finalizers:
  - tigera.io/operator-cleanup
  generation: 5
  name: default
  resourceVersion: "406630617"
  uid: 037439ea-6989-48a6-87ff-662543500474
spec:
  calicoNetwork:
    bgp: Enabled
    hostPorts: Enabled
    ipPools:
    - blockSize: 26
      cidr: 172.18.0.0/16
      disableBGPExport: false
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    linuxDataplane: Iptables
    mtu: 9000
    multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      kubernetes: NodeInternalIP
  cni:
    ipam:
      type: Calico
    type: Calico
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  kubeletVolumePluginPath: /var/lib/kubelet
  nodeUpdateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Disabled
  variant: Calico
status:
  computed:
    calicoNetwork:
      bgp: Enabled
      hostPorts: Enabled
      ipPools:
      - blockSize: 26
        cidr: 172.18.0.0/16
        disableBGPExport: false
        encapsulation: VXLANCrossSubnet
        natOutgoing: Enabled
        nodeSelector: all()
      linuxDataplane: Iptables
      mtu: 9000
      multiInterfaceMode: None
      nodeAddressAutodetectionV4:
        kubernetes: NodeInternalIP
    cni:
      ipam:
        type: Calico
      type: Calico
    controlPlaneReplicas: 2
    flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    kubeletVolumePluginPath: /var/lib/kubelet
    nodeUpdateStrategy:
      rollingUpdate:
        maxUnavailable: 1
      type: RollingUpdate
    nonPrivileged: Disabled
    variant: Calico
  conditions:
  - lastTransitionTime: "2023-06-13T20:35:24Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Progressing
  - lastTransitionTime: "2023-06-13T20:35:24Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-06-13T20:35:24Z"
    message: All objects available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "True"
    type: Ready
  mtu: 9000
  variant: Calico
@RichardSufliarsky could you delete the vxlanEnabled: true line from your FelixConfiguration? Calico supports autodetecting the encapsulation from the IPPools (since v3.23, I believe), and any value in the FelixConfiguration overrides that. (I'm guessing you upgraded from a prior version that used to need those fields?)
Basically, what's happening now is that Felix thinks VXLAN should be enabled, but there are no IP pools with VXLAN. Removing that field from the FelixConfiguration will make Felix stop expecting a vxlan.calico interface, which should fix the issue.
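A minimal sketch of removing that field, assuming the default FelixConfiguration (if the API server does not accept the JSON patch, running kubectl edit felixconfiguration default and deleting the line by hand does the same):
kubectl patch felixconfiguration default --type json -p '[{"op":"remove","path":"/spec/vxlanEnabled"}]'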
@coutinhop thanks for the response. I changed the CNI plugin together with a k8s version upgrade last week, so I can't try that anymore. Closing the issue.
Expected Behavior
All the nodes of the k8s cluster should have the vxlan.calico interface.
Current Behavior
We have a 20-node bare-metal Kubernetes cluster running Calico v3.25.0, and some of the nodes are missing the vxlan.calico interface. Logs for the calico-node pods contain these errors, repeated every second:
Can I try to set vxlanEnabled: false in FelixConfiguration, since we have all the nodes on the same subnet (172.16.2.0/24), to get rid of these messages?
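A sketch of what that change would look like, assuming the default FelixConfiguration resource:
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanEnabled":false}}'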
Context
IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+--------------+-------------------+-------+------------+-------------+
| 172.16.2.30  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.60  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.80  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.90  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.91  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.92  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.93  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.12  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.13  | node-to-node mesh | up    | 23:32:59   | Established |
| 172.16.2.10  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.21  | node-to-node mesh | up    | 22:05:51   | Established |
| 172.16.2.22  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.23  | node-to-node mesh | up    | 2023-05-24 | Established |
| 172.16.2.24  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.25  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.26  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.27  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.28  | node-to-node mesh | up    | 2023-05-19 | Established |
| 172.16.2.50  | node-to-node mesh | up    | 2023-05-19 | Established |
+--------------+-------------------+-------+------------+-------------+
IPv6 BGP status
No IPv6 peers found.
for s in k8s1 k8s2 k8s3 nas001 redis002 api001 gpu001 gpu002 gpu003 gpu004 gpu005 gpu006 node001 node002 node003 node004 node005 node006 node007 node008; do echo "Server ${s}: $(ssh ${s} 'ip a|grep vxlan')"; done
Server k8s1:
Server k8s2: 25: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.142.64/32 scope global vxlan.calico
Server k8s3:
Server nas001: 39: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.83.64/32 scope global vxlan.calico
Server redis002:
Server api001: 20: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.243.64/32 scope global vxlan.calico
Server gpu001:
Server gpu002: 19: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.170.128/32 scope global vxlan.calico
Server gpu003:
Server gpu004: 31: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.168.128/32 scope global vxlan.calico
Server gpu005: 39: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.110.192/32 scope global vxlan.calico
Server gpu006: 24: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.87.64/32 scope global vxlan.calico
Server node001:
Server node002:
Server node003:
Server node004:
Server node005: 63: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.200.0/32 scope global vxlan.calico
Server node006: 210: vxlan.calico: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default inet 172.18.167.64/32 scope global vxlan.calico
Server node007:
Server node008: