kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

[BUG] MTU mismatch issue with jumbo enabled !!! #4254

Open aravindgpd opened 2 days ago

aravindgpd commented 2 days ago

Kube-OVN Version

1.12.3

Kubernetes Version

1.24.17

Operation-system/Kernel Version

awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release
"Ubuntu 20.04.6 LTS"

uname -r

5.4.0-187-generic

Description

  1. Facing an issue on worker nodes in a multi-node Kubernetes cluster (3 master and N worker nodes) with Kube-OVN 1.12.3 installed and jumbo frames enabled on all physical interfaces (MTU size: 9000).
  2. Our observation is that after setting MTU 9000 on the physical and bond interfaces, the CNI correctly sets the MTU to 8900 on the ovn0 and veth interfaces. However, the OVN pinger pods on a few worker nodes fail with the errors below:
kubectl logs kube-ovn-pinger-58qq5 -n kube-system
I0702 18:38:12.608346 1452014 pinger.go:20]
-----------------------------------------------------
Kube-OVN:
version:  v1.12.3
Build:  2023-11-06_07:31:19
Commit: git-b0efd5a
Go version: go1.21.3
Arch:  amd64
--------------------------------------------------
E0702 18:38:22.618191 1452014 config.go:160] failed to get self pod kube-system/kube-ovn-pinger-58qq5. Get "https://10.233.0.1:443/api/v1/namespaces/kube-system/pods/kube-ovn-pinger-58qq5?timeout=15s": net/http: TLS handshake timeout
E0702 18:38:22.619948 1452014 klog.go:10] "failed to parse config" err="Get "https://10.233.0.1:443/api/v1/namespaces/kube-system/pods/kube-ovn-pinger-58qq5?timeout=15s" net/http: TLS handshake timeout"
...
3. We troubleshot this further: the pinger pod and a few other pods on the worker nodes are unable to reach the API server purely because of an MTU issue, since pinging the master node IP with a packet size below 1400 works but larger sizes do not work at all (a small probe script is sketched after this list):

# ping 100.64.0.8 -M do -s 8000
PING 100.64.0.8 (100.64.0.8) 8000(8028) bytes of data.
--- 100.64.0.8 ping statistics ---
88 packets transmitted, 0 received, 100% packet loss, time 89074ms

From the master to the worker's node IP:
ping 10.36.0.15 -M do -s 8000
PING 10.36.0.15 (10.36.0.15) 8000(8028) bytes of data.
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500

4. We are quite sure the issue is not at the switch or OS level, since when we removed the worker node from the cluster the MTU issue did not reproduce at all, but after adding the worker node back the issue reappeared within a few hours.
5. We also tried changing the tunnel type from 'geneve' to 'vxlan', since we thought the current kernel version might not support fragmentation/defragmentation.
6. We tried setting the MTU to a slightly lower value (MTU size: 8500) in the kube-ovn-cni DaemonSet, but it didn't help.
7. Most of our physical servers have 'Mellanox ConnectX-5 EN 25GbE Dual Port Adapter', 'Mellanox ConnectX-6 EN 100GbE Dual Port Adapter', and 'NVIDIA ConnectX-6 LX 2x 25GbE SFP28' interfaces.
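
A small probe script along the lines of the checks above can help narrow down the largest payload that still passes between two nodes. This is a generic sketch, nothing Kube-OVN specific; the default target below is just the example address used in this issue:

#!/bin/bash
# Probe which ICMP payload sizes pass with "don't fragment" set, using the same
# ping -M do technique as above. Payload sizes exclude the 28-byte IP/ICMP header.
TARGET=${1:-100.64.0.8}   # example master-node address from this issue; replace with your peer node
for size in 1372 1472 4000 8000 8872; do
  if ping -c 1 -W 2 -M do -s "$size" "$TARGET" > /dev/null 2>&1; then
    echo "payload $size bytes: OK"
  else
    echo "payload $size bytes: FAILED (fragmentation needed or packet dropped)"
  fi
done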

Steps To Reproduce

Details are given in the description above.

Current Behavior

  1. Pods on worker nodes are unable to reach kube-apiserver.
  2. Ping between a worker node and a master node on the ovn0 interface with a packet size > 1500 does not work, even though the MTU is set to 9000 at every level.
  3. Ping between a worker node and a master node on the management interface (on which the node IP is assigned) does not work with large packets either.

Expected Behavior

  1. Communication between the pods should work fine.
  2. Ping between master and worker nodes should work fine.

zhangzujian commented 2 days ago

Did you change MTU to 9000 after installing Kube-OVN?
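
If you are not sure what MTU Kube-OVN was installed with, one way to check is to look for an explicit --mtu argument on the kube-ovn-cni DaemonSet (this assumes the default install manifests, where that is how a custom MTU is passed):

kubectl -n kube-system get ds kube-ovn-cni -o yaml | grep -- '--mtu'

If nothing is printed, Kube-OVN is most likely deriving the MTU from the tunnel interface automatically.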

nics90 commented 2 days ago

@zhangzujian : No, it was set before installing CNI only.

zhangzujian commented 2 days ago

I cannot reproduce it. Please check MTU settings.

nics90 commented 2 days ago

@zhangzujian : We have already checked the MTU values at the interface/bond/CNI level. Is there anything else that can be checked, or any temporary workaround to fix it?

zhangzujian commented 2 days ago

Check MTU settings of all the interfaces on the node:

ip link | grep mtu

Check MTU of the pods:

kubectl -n kube-system exec -ti kube-ovn-pinger-7c9nj -- ip link
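
To spot mismatches quickly, the same output can be filtered for anything below the expected overlay MTU (8900 in this setup); this is just a convenience filter, nothing Kube-OVN specific:

ip -o link show | awk '{for (i=1; i<=NF; i++) if ($i=="mtu" && $(i+1)+0 < 8900) print $2, $(i+1)}'
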
zhangzujian commented 2 days ago

On the worker nodes, check communication between the worker node and master nodes:

# replace 192.168.0.101 with the master node ip address
ping 192.168.0.101 -M do -s 8000
zhangzujian commented 2 days ago

And check communication between worker nodes:

# replace 192.168.0.102 with the ip address of another worker node
ping 192.168.0.102 -M do -s 8000
nics90 commented 2 days ago

ip link | grep mtu

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: eno8303: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
3: eno8403: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
4: eno12399: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
5: eno12409: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
6: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
7: ens3f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
8: ens6f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
9: ens6f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
10: ens5f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
11: ens5f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
12: ens4f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
13: ens4f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
14: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
15: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
16: bond1.1702@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
17: bond0.101@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
18: bond0.36@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
19: bond0.900@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
20: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
21: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
22: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
23: mirror0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
24: ovn0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
25: br-int: <BROADCAST,MULTICAST> mtu 8900 qdisc noop state DOWN mode DEFAULT group default qlen 1000
27: 0ff5594a8d55_h@if26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
31: 7247f4135347_h@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
33: 9e0e4f7f568e_h@if32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
37: f0c772f38b20_h@if36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
39: 45731834c4d5_h@if38: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
43: b93ad2e4e064_h@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
45: ea23917090f6_h@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
55: 44de039fbc47_h@if54: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
95: fac465e4358c_h@if94: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
97: d72f3005c754_h@if96: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000

The kube-ovn-pinger pod on the impacted worker node is crashing, so we are unable to get into its shell; pasting output from another worker node where it is working:

kube-ovn-pinger-58qq5                              0/1     CrashLoopBackOff   190 (4m15s ago)   16h   10.233.64.209  

kubectl exec -it pod/kube-ovn-pinger-58qq5 -n kube-system -- bash
error: unable to upgrade connection: container not found ("pinger")
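
Even when exec fails because the container keeps crashing, the last crash's logs can usually still be retrieved (a generic kubectl note, not specific to Kube-OVN):

kubectl -n kube-system logs kube-ovn-pinger-58qq5 --previous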

kubectl exec -it pod/kube-ovn-pinger-dmbw2 -n kube-system -- ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
116: eth0@if117: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UP mode DEFAULT group default
    link/ether 00:00:00:68:45:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    alias aa42f11ce211_c

ping output:

ping 10.36.0.15 -M do -s 8000
PING 10.36.0.15 (10.36.0.15) 8000(8028) bytes of data.
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
ping: local error: message too long, mtu=1500
^C
--- 10.36.0.15 ping statistics ---
5 packets transmitted, 0 received, +5 errors, 100% packet loss, time 4085ms
zhangzujian commented 2 days ago

from master to worker's node IP: ping 10.36.0.15 -M do -s 8000 PING 10.36.0.15 (10.36.0.15) 8000(8028) bytes of data. ping: local error: message too long, mtu=1500

It seems the MTU settings on your master nodes are incorrect.

nics90 commented 2 days ago

Below is the MTU from master node:

ip link | grep mtu
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: eno8303: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
3: eno8403: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
4: eno12399: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
5: eno12409: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
6: ens3f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
7: ens3f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
8: ens6f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
9: ens6f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
10: ens5f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
11: ens5f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
12: ens4f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
13: ens4f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP mode DEFAULT group default qlen 1000
14: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
15: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
16: bond1.1702@bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
17: bond0.36@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
18: bond0.900@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
19: bond0.101@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
20: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
21: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
22: br-external: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
23: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
24: mirror0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
25: ovn0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
26: br-int: <BROADCAST,MULTICAST> mtu 8900 qdisc noop state DOWN mode DEFAULT group default qlen 1000
28: 56b3d90bd3a8_h@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
32: 29d92282cf2b_h@if31: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
36: 627c1696083e_h@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
38: b0215e1887ab_h@if37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
42: e95b3b8c2f67_h@if41: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
48: a3c012ebef96_h@if47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
50: 9ad68e24cc95_h@if49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
52: c79bf07a4b28_h@if51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
56: 72a975bfd99c_h@if55: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
58: 14b5bf2df05e_h@if57: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
68: bfea2bdc2841_h@if67: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
72: 23e67c193f29_h@if71: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
76: 1de99cb8674f_h@if75: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
106: 0e701d5c1454_h@if105: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
114: 2f3902c748dd_h@if113: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
116: cf148ab19def_h@if115: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
118: 2a26dba96266_h@if117: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
nics90 commented 17 hours ago

@zhangzujian : You asked earlier: "Did you change MTU to 9000 after installing Kube-OVN?"

Let's say we change the MTU after installing Kube-OVN; what would the impact be?

Could you please let us know, since this is impacting our production workloads.

On another cluster with 3 master nodes, today we changed netplan and added VLAN 100 on bond0 (on which the CNI is running), and the MTU issue got replicated.

vlans:
    bond0.94:
      id: 94
      link: bond0
      mtu: 9000
      addresses:
      - 10.31.26.15/24
      gateway4: 10.31.26.245
      nameservers:
        addresses:
        - 8.8.8.8
        - 8.8.4.4
    bond0.400:
      id: 400
      link: bond0
      mtu: 9000
      addresses:
      - 192.168.65.1/24
    bond0.118:
      id: 118
      link: bond0
    bond0.119:
      id: 119
      link: bond0
    bond0.100:
      id: 100
      link: bond0
      mtu: 9000
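
For reference, a quick way to confirm the MTU that actually took effect after netplan apply on bond0 and each of these VLANs (interface names taken from the snippet above; adjust to your environment):

for dev in bond0 bond0.94 bond0.100 bond0.118 bond0.119 bond0.400; do
  ip -o link show "$dev" | awk '{for (i=1; i<=NF; i++) if ($i=="mtu") print $2, $(i+1)}'
done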
zhangzujian commented 17 hours ago

from master to worker's node IP: ping 10.36.0.15 -M do -s 8000 PING 10.36.0.15 (10.36.0.15) 8000(8028) bytes of data. ping: local error: message too long, mtu=1500

You need to fix it.

nics90 commented 16 hours ago

@zhangzujian : This gets replicated only when Kube-OVN is present; we earlier removed the worker node from the cluster and the ping with larger packets worked smoothly, which means the issue comes in only when Kube-OVN takes over.

zhangzujian commented 16 hours ago

Are you using underlay or VPC NAT gateway? Please use tracepath to find out which interface has the incorrect MTU.
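
A minimal example of the suggested tracepath check, run from an affected worker node (and from inside an affected pod) towards the master; the address below is the example from earlier in this thread:

# tracepath reports the path MTU (pmtu) at each hop; the hop where it drops below 8900/9000 points at the misconfigured interface
tracepath -n 100.64.0.8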