kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

release-1.12 linux vm live migration ping lost more than 10 seconds #3472

Closed: bobz965 closed this 2 months ago

bobz965 commented 9 months ago

Bug Report

Expected Behavior

On release-1.12, ping loss during VM live migration should last less than 3 seconds.

Actual Behavior

On release-1.12, ping loss during VM live migration lasts more than 10 seconds.

Steps to Reproduce the Problem


Additional Info

We have fixed this; the PR will follow later.

github-actions[bot] commented 7 months ago

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

bobz965 commented 6 months ago

Ref: "Live migration: reducing downtime with OVN multi-chassis bindings" (OVS+OVN conference 2022 slides): https://www.openvswitch.org/support/ovscon2022/slides/Live-migration-with-OVN.pdf
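A minimal sketch of what the multi-chassis approach in these slides means for a VM's Logical_Switch_Port options, assuming OVN 22.06 or later: during migration `requested-chassis` lists the source (main) chassis first and the target (additional) chassis second, and `activation-strategy=rarp` keeps the target location blocked until a RARP is seen from it. The empty options before and after migration reflect the reset to `{}` mentioned later in the test transcript. This is not the kube-ovn implementation; `Phase`, `lsp_options`, and the chassis names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    BEFORE = "before"        # VM runs only on the source node
    MIGRATING = "migrating"  # port bound on both source and target chassis
    AFTER = "after"          # migration finished, VM runs only on the target


def lsp_options(phase: Phase, src: str, dst: str) -> dict[str, str]:
    """Return the LSP options a controller might program in each phase (sketch)."""
    if phase is Phase.MIGRATING:
        # First entry = main chassis, remaining entries = additional chassis.
        # With activation-strategy=rarp, the additional chassis stays blocked
        # for traffic until a RARP packet arrives from that location.
        return {"requested-chassis": f"{src},{dst}", "activation-strategy": "rarp"}
    # Outside the migration window no migration options are set.
    return {}


if __name__ == "__main__":
    for p in Phase:
        print(p.value, lsp_options(p, "node-src", "node-dst"))
```

Keeping the options empty outside the migration window means a stale target chassis cannot linger in `requested-chassis` once the source pod is gone.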

bobz965 commented 6 months ago

On the KubeVirt side, some annotations need to be exposed during live migration so that kube-ovn can set the migrator options on the LSP and clean them up afterwards.
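A minimal sketch of the annotation-driven decision described here, combined with the rule noted later in the thread that the options are only programmed when both MigrationSourceAnnotation and MigrationTargetAnnotation are present. The annotation keys, the `migration_action` helper, and the `pod_finished` flag are hypothetical; this is not kube-ovn's code.

```python
# Hypothetical annotation keys standing in for MigrationSourceAnnotation and
# MigrationTargetAnnotation; the real constants are defined in kube-ovn.
SOURCE_KEY = "example.io/migration-source"
TARGET_KEY = "example.io/migration-target"


def migration_action(annotations: dict[str, str], pod_finished: bool) -> str:
    """Decide what to do with one pod's LSP migration options (sketch).

    "clear": the pod is gone or finished, so reset the options to {}
    "set":   both source and target are known, so program the options
    "none":  an annotation is missing, so leave the options untouched
    """
    if pod_finished:
        return "clear"
    if annotations.get(SOURCE_KEY) and annotations.get(TARGET_KEY):
        return "set"
    return "none"


if __name__ == "__main__":
    both = {SOURCE_KEY: "euler-x86-70", TARGET_KEY: "euler-x86-71"}
    print(migration_action(both, pod_finished=False))             # set
    print(migration_action({TARGET_KEY: "euler-x86-71"}, False))  # none
    print(migration_action(both, pod_finished=True))              # clear
```

Returning "none" when only one annotation is present mirrors the limitation raised later in the thread: if KubeVirt never sets the source annotation, nothing gets programmed.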

On the kube-ovn side:

bobz965 commented 6 months ago

Test results: default VPC scenario


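Create three pinger pods on different nodes, have all of them ping the migrating VM at a 0.1 s interval, live-migrate the VM three times, and check for packet loss.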

[root@euler-x86-70 pods]# kgp | grep ping
kube-system    euler-x86-70-pinger                                     1/1     Running             0          5s      10.222.0.43    euler-x86-70   <none>           <none>
kube-system    euler-x86-71-pinger                                     1/1     Running             0          5s      10.222.0.44    euler-x86-70   <none>           <none>
kube-system    euler-x86-73-pinger                                     1/1     Running             0          5s      10.222.0.45    euler-x86-70   <none>           <none>

[root@euler-x86-70 pods]# kgp | grep vm-m
zal            virt-launcher-zal-vm-m-b6pf6                            1/1     Running             0          138m    10.222.0.130   euler-x86-70   <none>           1/1

[root@euler-x86-70 ~]# kgp | grep vm-m
zal            virt-launcher-zal-vm-m-b6pf6                            0/1     Completed           0          147m    10.222.0.130   euler-x86-70   <none>           1/1
zal            virt-launcher-zal-vm-m-rp67c                            1/1     Running             0          2m15s   10.222.0.130   euler-x86-71   <none>           1/1
[root@euler-x86-70 ~]# k delete po -n zal virt-launcher-zal-vm-m-b6pf6
pod "virt-launcher-zal-vm-m-b6pf6" deleted        # Deleting the Completed virt-launcher pod also resets the LSP options to {}. This is normally done by hand, so it has no impact.
[root@euler-x86-70 ~]# kgp | grep vm-m
zal            virt-launcher-zal-vm-m-rp67c                            1/1     Running             0          2m35s   10.222.0.130   euler-x86-71   <none>           1/1

#### Test 1: packet-loss window <= 0.5 s

3755 packets transmitted, 3750 packets received, 0% packet loss  # 5 packets lost
round-trip min/avg/max/stddev = 0.094/0.322/2.159/0.171 ms
root@euler-x86-73-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
4710 packets transmitted, 4706 packets received, 0% packet loss # 4 packets lost
round-trip min/avg/max/stddev = 0.104/0.311/21.394/0.346 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
4462 packets transmitted, 4458 packets received, 0% packet loss # 4 packets lost
round-trip min/avg/max/stddev = 0.117/0.347/21.338/0.361 ms
root@euler-x86-71-pinger:/kube-ovn#
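The loss-window figures in these headings follow from the packet counts: at the 0.1 s ping interval used here, each missing reply stands for roughly 0.1 s without connectivity, so 5 lost packets is about 0.5 s. A small helper that parses a summary line in the format shown above and applies that estimate; the `outage` function and the one-probe-per-interval assumption are illustrative, not part of the thread.

```python
import re

# Matches summary lines such as
# "735 packets transmitted, 728 packets received, +22 duplicates, 0% packet loss"
SUMMARY = re.compile(
    r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) packets received, "
    r"(?:\+(?P<dup>\d+) duplicates, )?(?P<pct>\d+)% packet loss"
)


def outage(summary: str, interval_s: float = 0.1) -> tuple[int, int, float]:
    """Return (lost packets, duplicate replies, estimated loss window in s).

    Assumes one probe per interval, so each missing reply approximates one
    interval without connectivity. Duplicates are reported separately and
    are not counted as lost.
    """
    m = SUMMARY.search(summary)
    if m is None:
        raise ValueError(f"not a ping summary line: {summary!r}")
    lost = int(m["tx"]) - int(m["rx"])
    dup = int(m["dup"] or 0)
    return lost, dup, round(lost * interval_s, 3)


if __name__ == "__main__":
    print(outage("3755 packets transmitted, 3750 packets received, 0% packet loss"))
    # (5, 0, 0.5)
    print(outage("735 packets transmitted, 728 packets received, +22 duplicates, 0% packet loss"))
    # (7, 22, 0.7)
```

The estimate is an upper-bound style approximation: the true outage can be shorter than lost × interval if the switchover happens partway through a probe interval.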

#### Test 2: packet-loss window <= 0.8 s

^C--- 10.222.0.130 ping statistics ---
735 packets transmitted, 728 packets received, +22 duplicates, 0% packet loss # 7 packets lost
round-trip min/avg/max/stddev = 0.037/0.463/7.571/1.040 ms
root@euler-x86-73-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
699 packets transmitted, 691 packets received, +21 duplicates, 1% packet loss # 8 packets lost
round-trip min/avg/max/stddev = 0.038/0.497/7.467/1.059 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
664 packets transmitted, 656 packets received, +21 duplicates, 1% packet loss # 8 packets lost
round-trip min/avg/max/stddev = 0.038/0.488/7.227/1.076 ms
root@euler-x86-71-pinger:/kube-ovn#
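In this test the summary reports 0% for 7 lost packets out of 735 but 1% for 8 out of 699. That is consistent with the percentage being truncated to an integer rather than rounded (7/735 is about 0.95%), so a "0% packet loss" line does not mean nothing was lost; the explicit packet counts are what matter. A small check of that reading (the helper names are illustrative):

```python
def loss_percent_truncated(tx: int, rx: int) -> int:
    """Loss percentage truncated to an integer, consistent with the summaries above."""
    return (tx - rx) * 100 // tx


def loss_percent_exact(tx: int, rx: int) -> float:
    """Unrounded loss percentage."""
    return (tx - rx) * 100 / tx


if __name__ == "__main__":
    for tx, rx in [(735, 728), (699, 691), (3755, 3750)]:
        print(tx, rx, loss_percent_truncated(tx, rx), f"{loss_percent_exact(tx, rx):.2f}")
    # 735 728 0 0.95
    # 699 691 1 1.14
    # 3755 3750 0 0.13
```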

#### Test 3: packet-loss window <= 0.5 s

^C--- 10.222.0.130 ping statistics ---
1917 packets transmitted, 1912 packets received, 0% packet loss # 5 packets lost
round-trip min/avg/max/stddev = 0.105/0.350/2.434/0.180 ms
root@euler-x86-73-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
1960 packets transmitted, 1955 packets received, 0% packet loss # 5 packets lost
round-trip min/avg/max/stddev = 0.095/0.249/1.495/0.133 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
1920 packets transmitted, 1915 packets received, 0% packet loss # 5 packets lost
round-trip min/avg/max/stddev = 0.066/0.287/2.390/0.201 ms
root@euler-x86-71-pinger:/kube-ovn#

#### 5 consecutive migrations: average packet-loss window <= 0.5 s

^C--- 10.222.0.130 ping statistics ---
4917 packets transmitted, 4892 packets received, 0% packet loss # 25 packets lost
round-trip min/avg/max/stddev = 0.086/0.438/109.496/2.169 ms
root@euler-x86-73-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
4908 packets transmitted, 4883 packets received, 0% packet loss # 25 packets lost
round-trip min/avg/max/stddev = 0.085/0.450/109.385/2.166 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 10.222.0.130 ping statistics ---
4899 packets transmitted, 4874 packets received, 0% packet loss # 25 packets lost
round-trip min/avg/max/stddev = 0.082/0.460/109.474/2.172 ms
root@euler-x86-71-pinger:/kube-ovn#

bobz965 commented 6 months ago

Test results: VLAN scenario


#### VM IP address

[root@zal-vm-m ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:00:00:86:a7:52 brd ff:ff:ff:ff:ff:ff
    inet 100.71.45.70/26 brd 100.71.45.127 scope global dynamic noprefixroute eth0
       valid_lft 86313330sec preferred_lft 86313330sec
    inet6 fe80::200:ff:fe86:a752/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
[root@zal-vm-m ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
100.71.45.64    0.0.0.0         255.255.255.192 U     100    0        0 eth0
[root@zal-vm-m ~]# ip route add default via 100.71.45.126
[root@zal-vm-m ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         100.71.45.126   0.0.0.0         UG    0      0        0 eth0
100.71.45.64    0.0.0.0         255.255.255.192 U     100    0        0 eth0
[root@euler-x86-70 vlan]# ping 100.71.45.70
PING 100.71.45.70 (100.71.45.70) 56(84) bytes of data.
64 bytes from 100.71.45.70: icmp_seq=1 ttl=63 time=0.863 ms

#### Pinger pods

[root@euler-x86-70 vlan]# kgp| grep pinger
kube-system    euler-x86-70-pinger                                     1/1     Running             0          25m     100.71.45.65   euler-x86-70   <none>           <none>
kube-system    euler-x86-71-pinger                                     1/1     Running             0          25m     100.71.45.66   euler-x86-70   <none>           <none>
kube-system    euler-x86-73-pinger                                     1/1     Running             0          25m     100.71.45.67   euler-x86-70   <none>           <none>

#### Test 1: 0.1 s ping interval, 0 packets lost

^C--- 100.71.45.71 ping statistics ---
2178 packets transmitted, 2178 packets received, +1 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.117/2.328/0.109 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
2137 packets transmitted, 2137 packets received, +1 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.117/2.432/0.109 ms
root@euler-x86-71-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
1958 packets transmitted, 1958 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.113/2.082/0.108 ms
root@euler-x86-73-pinger:/kube-ovn#

#### Test 2: 0.1 s ping interval, 0 packets lost

^C--- 100.71.45.71 ping statistics ---
995 packets transmitted, 995 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.038/0.208/5.550/0.502 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
1035 packets transmitted, 1035 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.038/0.222/5.539/0.533 ms
root@euler-x86-71-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
1093 packets transmitted, 1093 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.039/0.217/5.589/0.522 ms
root@euler-x86-73-pinger:/kube-ovn#

#### Test 3: 0.1 s ping interval, 0 packets lost

^C--- 100.71.45.71 ping statistics ---
1535 packets transmitted, 1535 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.268/85.838/2.559 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
1527 packets transmitted, 1527 packets received, +1 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.023/0.298/80.868/2.585 ms
root@euler-x86-71-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
1535 packets transmitted, 1535 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.022/0.182/32.376/0.830 ms
root@euler-x86-73-pinger:/kube-ovn#

#### Fresh run of 5 consecutive migrations: 0.1 s ping interval, 0 packets lost, average duplicate window <= 0.5 s

^C--- 100.71.45.71 ping statistics ---
4866 packets transmitted, 4866 packets received, +25 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.036/0.230/5.397/0.375 ms
root@euler-x86-70-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
4861 packets transmitted, 4861 packets received, +24 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.037/0.243/9.979/0.414 ms
root@euler-x86-71-pinger:/kube-ovn#

^C--- 100.71.45.71 ping statistics ---
4859 packets transmitted, 4859 packets received, +25 duplicates, 0% packet loss
round-trip min/avg/max/stddev = 0.035/0.231/4.682/0.371 ms
root@euler-x86-73-pinger:/kube-ovn#

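Packet loss is lower in the VLAN scenario: none of the VLAN runs lost a packet, while the default VPC runs lost 4 to 8 packets per migration.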
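A quick check of this comparison using one pinger's summary from each 5-migration run (figures copied from above). The per-migration averages assume 5 migrations per run and one probe per 0.1 s; expressing duplicates as a "duplicate window" the same way is an assumption, and `per_migration` is an illustrative helper.

```python
INTERVAL_S = 0.1  # ping interval used in the tests
MIGRATIONS = 5    # migrations per run

# (transmitted, received, duplicates) copied from the 5-migration summaries
RUNS = {
    "default VPC": (4917, 4892, 0),  # euler-x86-73-pinger
    "VLAN": (4866, 4866, 25),        # euler-x86-70-pinger
}


def per_migration(tx: int, rx: int, dup: int) -> tuple[float, float]:
    """Return (loss window, duplicate window) per migration, in seconds."""
    lost = tx - rx
    return (round(lost / MIGRATIONS * INTERVAL_S, 3),
            round(dup / MIGRATIONS * INTERVAL_S, 3))


if __name__ == "__main__":
    for name, run in RUNS.items():
        loss_s, dup_s = per_migration(*run)
        print(f"{name}: ~{loss_s} s lost, ~{dup_s} s of duplicate replies per migration")
```

Both runs show about 5 affected probes per migration; the difference is that in the default VPC they are lost, while in the VLAN scenario they arrive as duplicates.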

anyfeel commented 4 months ago

PR 3767 added support for setting the OVN LSP migration options, but it only works for pods that have both MigrationSourceAnnotation and MigrationTargetAnnotation set, which is KubeVirt's responsibility. However, I have not found MigrationSourceAnnotation being set anywhere in the latest KubeVirt source code (main branch at 055c6e0491fa93befa6372ca4d367916cabcb5af). How was the test above done?

bobz965 commented 4 months ago

@Longchuanzheng could you please commit the code to the kubevirt repository under kube-ovn? Thanks!

Longchuanzheng commented 4 months ago

> @Longchuanzheng could you please commit the code to the kubevirt repository under kube-ovn? Thanks!

OK, I will upload the functional code first, although some unit tests are not finished yet. I will finish the rest as soon as possible.

Longchuanzheng commented 4 months ago

@bobz965, @anyfeel https://github.com/kubeovn/kubevirt-dpdk/pull/1