openvswitch / ovs-issues

Issue tracker repo for Open vSwitch

kube-ovn: mixed x86 and aarch64 architecture; the ovs-ovn pod restarts more frequently on aarch64 than on x86 #293

Closed zhaiyj closed 11 months ago

zhaiyj commented 11 months ago

restart log ovsdb.log:

```
2023-08-01T18:00:45.709Z|00003|vlog(monitor)|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2023-08-01T18:00:45.709Z|00004|daemon_unix(monitor)|INFO|pid 19057 died, exit status 0, exiting
2023-08-01T18:00:45.818Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovsdb-server.log
2023-08-01T18:00:45.830Z|00002|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 2.14.4
2023-08-01T18:00:45.834Z|00003|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
2023-08-01T18:00:45.834Z|00004|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
2023-08-01T18:00:45.851Z|00005|jsonrpc|WARN|unix#4: receive error: Connection reset by peer
2023-08-01T18:00:45.851Z|00006|reconnect|WARN|unix#4: connection dropped (Connection reset by peer)
2023-08-01T18:00:45.854Z|00007|jsonrpc|WARN|unix#5: receive error: Connection reset by peer
2023-08-01T18:00:45.854Z|00008|reconnect|WARN|unix#5: connection dropped (Connection reset by peer)
2023-08-01T18:00:45.858Z|00009|jsonrpc|WARN|unix#6: receive error: Connection reset by peer
2023-08-01T18:00:45.858Z|00010|reconnect|WARN|unix#6: connection dropped (Connection reset by peer)
```

ovs-vswitchd log:

```
2023-08-01T18:00:01.736Z|03097|rconn|DBG|br-int<->unix#1: idle 60 seconds, sending inactivity probe
2023-08-01T18:00:01.737Z|03098|rconn|DBG|br-int<->unix#1: entering IDLE
2023-08-01T18:00:01.737Z|03099|rconn|DBG|br-int<->unix#1: entering ACTIVE
2023-08-01T18:00:05.501Z|03100|rconn|DBG|br-int<->unix#0: idle 60 seconds, sending inactivity probe
2023-08-01T18:00:05.501Z|03101|rconn|DBG|br-int<->unix#0: entering IDLE
2023-08-01T18:00:05.502Z|03102|rconn|DBG|br-int<->unix#0: entering ACTIVE
2023-08-01T18:00:28.503Z|00506|ofproto_dpif_xlate(handler2)|WARN|Dropped 1 log messages in last 30 seconds (most recently, 30 seconds ago) due to excessive rate
2023-08-01T18:00:28.503Z|00507|ofproto_dpif_xlate(handler2)|WARN|dropping packet received on port mirror0, which is reserved exclusively for mirroring on bridge br-int while processing in_port=6,vlan_tci=0x0000,dl_src=7e:65:0a:38:ec:23,dl_dst=01:80:c2:00:00:0e,dl_type=0x88cc
2023-08-01T18:00:45.706Z|03103|jsonrpc|WARN|unix:/var/run/openvswitch/db.sock: receive error: Connection reset by peer
2023-08-01T18:00:45.707Z|03104|reconnect|WARN|unix:/var/run/openvswitch/db.sock: connection dropped (Connection reset by peer)
2023-08-01T18:00:45.707Z|03105|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering BACKOFF
2023-08-01T18:00:45.896Z|03106|rconn|DBG|br-int<->unix#6: entering CONNECTING
2023-08-01T18:00:45.896Z|03107|rconn|DBG|br-int<->unix#6: connected
2023-08-01T18:00:45.896Z|03108|rconn|DBG|br-int<->unix#6: entering ACTIVE
2023-08-01T18:00:45.896Z|03109|rconn|DBG|br-int<->unix#6: connection closed by peer
2023-08-01T18:00:45.896Z|03110|rconn|DBG|br-int<->unix#6: entering DISCONNECTED
2023-08-01T18:00:45.898Z|03111|rconn|DBG|br-int<->unix#7: entering CONNECTING
2023-08-01T18:00:45.898Z|03112|rconn|DBG|br-int<->unix#7: connected
2023-08-01T18:00:45.898Z|03113|rconn|DBG|br-int<->unix#7: entering ACTIVE
2023-08-01T18:00:45.898Z|03114|rconn|DBG|br-int<->unix#7: connection closed by peer
2023-08-01T18:00:45.898Z|03115|rconn|DBG|br-int<->unix#7: entering DISCONNECTED
2023-08-01T18:00:45.900Z|03116|rconn|DBG|br-int<->unix#8: entering CONNECTING
2023-08-01T18:00:45.900Z|03117|rconn|DBG|br-int<->unix#8: connected
2023-08-01T18:00:45.900Z|03118|rconn|DBG|br-int<->unix#8: entering ACTIVE
2023-08-01T18:00:45.949Z|03119|rconn|DBG|br-int<->unix#8: connection closed by peer
2023-08-01T18:00:45.949Z|03120|rconn|DBG|br-int<->unix#8: entering DISCONNECTED
2023-08-01T18:00:45.955Z|03121|rconn|DBG|br-public-net<->unix#9: entering CONNECTING
2023-08-01T18:00:45.955Z|03122|rconn|DBG|br-public-net<->unix#9: connected
2023-08-01T18:00:45.955Z|03123|rconn|DBG|br-public-net<->unix#9: entering ACTIVE
2023-08-01T18:00:45.955Z|03124|rconn|DBG|br-public-net<->unix#9: connection closed by peer
2023-08-01T18:00:45.955Z|03125|rconn|DBG|br-public-net<->unix#9: entering DISCONNECTED
2023-08-01T18:00:45.957Z|03126|rconn|DBG|br-public-net<->unix#10: entering CONNECTING
2023-08-01T18:00:45.957Z|03127|rconn|DBG|br-public-net<->unix#10: connected
2023-08-01T18:00:45.957Z|03128|rconn|DBG|br-public-net<->unix#10: entering ACTIVE
2023-08-01T18:00:45.957Z|03129|rconn|DBG|br-public-net<->unix#10: connection closed by peer
2023-08-01T18:00:45.957Z|03130|rconn|DBG|br-public-net<->unix#10: entering DISCONNECTED
2023-08-01T18:00:45.959Z|03131|rconn|DBG|br-public-net<->unix#11: entering CONNECTING
2023-08-01T18:00:45.959Z|03132|rconn|DBG|br-public-net<->unix#11: connected
2023-08-01T18:00:45.959Z|03133|rconn|DBG|br-public-net<->unix#11: entering ACTIVE
2023-08-01T18:00:45.959Z|03134|rconn|DBG|br-public-net<->unix#11: connection closed by peer
2023-08-01T18:00:45.960Z|03135|rconn|DBG|br-public-net<->unix#11: entering DISCONNECTED
2023-08-01T18:00:45.965Z|03136|rconn|DBG|br-trunk-net<->unix#12: entering CONNECTING
2023-08-01T18:00:45.966Z|03137|rconn|DBG|br-trunk-net<->unix#12: connected
2023-08-01T18:00:45.966Z|03138|rconn|DBG|br-trunk-net<->unix#12: entering ACTIVE
2023-08-01T18:00:45.966Z|03139|rconn|DBG|br-trunk-net<->unix#12: connection closed by peer
2023-08-01T18:00:45.966Z|03140|rconn|DBG|br-trunk-net<->unix#12: entering DISCONNECTED
2023-08-01T18:00:45.968Z|03141|rconn|DBG|br-trunk-net<->unix#13: entering CONNECTING
2023-08-01T18:00:45.968Z|03142|rconn|DBG|br-trunk-net<->unix#13: connected
2023-08-01T18:00:45.968Z|03143|rconn|DBG|br-trunk-net<->unix#13: entering ACTIVE
2023-08-01T18:00:45.968Z|03144|rconn|DBG|br-trunk-net<->unix#13: connection closed by peer
2023-08-01T18:00:45.968Z|03145|rconn|DBG|br-trunk-net<->unix#13: entering DISCONNECTED
2023-08-01T18:00:45.970Z|03146|rconn|DBG|br-trunk-net<->unix#14: entering CONNECTING
2023-08-01T18:00:45.970Z|03147|rconn|DBG|br-trunk-net<->unix#14: connected
2023-08-01T18:00:45.970Z|03148|rconn|DBG|br-trunk-net<->unix#14: entering ACTIVE
2023-08-01T18:00:45.970Z|03149|rconn|DBG|br-trunk-net<->unix#14: connection closed by peer
2023-08-01T18:00:45.970Z|03150|rconn|DBG|br-trunk-net<->unix#14: entering DISCONNECTED
2023-08-01T18:00:45.977Z|03151|bridge|INFO|bridge br-public-net: deleted interface veth-br0-31 on port 1
2023-08-01T18:00:45.977Z|03152|bridge|INFO|bridge br-public-net: deleted interface br-public-net on port 65534
2023-08-01T18:00:45.978Z|03153|bridge|INFO|bridge br-trunk-net: deleted interface br-trunk-net on port 65534
2023-08-01T18:00:45.978Z|03154|bridge|INFO|bridge br-trunk-net: deleted interface veth-trunk-31 on port 1
2023-08-01T18:00:45.978Z|03155|bridge|INFO|bridge br-int: deleted interface ovn-7e526e-0 on port 89
```

ovn-controller.log:

```
2023-08-01T18:00:45.707Z|00080|jsonrpc|WARN|unix:/var/run/openvswitch/db.sock: receive error: Connection reset by peer
2023-08-01T18:00:45.707Z|00081|reconnect|WARN|unix:/var/run/openvswitch/db.sock: connection dropped (Connection reset by peer)
2023-08-01T18:00:46.003Z|00004|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connection closed by peer
2023-08-01T18:00:46.003Z|00082|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connection closed by peer
2023-08-01T18:00:46.510Z|00003|vlog(monitor)|INFO|opened log file /var/log/ovn/ovn-controller.log
2023-08-01T18:00:46.510Z|00004|daemon_unix(monitor)|INFO|pid 19305 died, exit status 0, exiting
2023-08-01T18:00:46.580Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
2023-08-01T18:00:46.582Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2023-08-01T18:00:46.582Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2023-08-01T18:00:46.591Z|00004|main|INFO|OVS IDL reconnected, force recompute.
2023-08-01T18:00:46.591Z|00005|reconnect|INFO|tcp:[192.168.211.142]:6642: connecting...
2023-08-01T18:00:46.591Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
2023-08-01T18:00:46.591Z|00007|reconnect|INFO|tcp:[192.168.211.142]:6642: connected
2023-08-01T18:00:46.639Z|00008|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-08-01T18:00:46.639Z|00009|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-08-01T18:00:46.640Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2023-08-01T18:00:46.644Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2023-08-01T18:00:46.644Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2023-08-01T18:00:46.650Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
```

zhaiyj commented 11 months ago

[screenshot] The log shows "ovsdb pid 19057 died, exit status 0, exiting", yet the ovs-ovn pod exits with code 255 for an unknown reason.
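One way to get more detail on that exit is to ask Kubernetes what it recorded for the previous container instance. A minimal sketch, assuming the pod lives in kube-system and has a single container; the pod name `ovs-ovn-xxxxx` is a placeholder:

```sh
# Exit code, reason, and signal recorded for the previously terminated
# container instance (pod name and namespace are hypothetical placeholders).
kubectl -n kube-system get pod ovs-ovn-xxxxx \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# Logs from the previous (restarted) container instance:
kubectl -n kube-system logs ovs-ovn-xxxxx --previous
```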

igsilya commented 11 months ago

"ovsdb pid 19057 died, exit status 0, exiting" means the process exited cleanly when it was asked to exit. I don't see any OVS issues in this report.

Also, OVS 2.14 has been EOL for a long time now. You should consider upgrading to at least OVS 2.17 LTS.
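For reference, the running versions can be confirmed inside the pod with the standard OVS CLIs; the pod name below is a placeholder:

```sh
# Print the daemon versions inside the ovs-ovn pod
# (pod name and namespace are hypothetical placeholders).
kubectl -n kube-system exec ovs-ovn-xxxxx -- ovs-vswitchd --version
kubectl -n kube-system exec ovs-ovn-xxxxx -- ovsdb-server --version
```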

zhaiyj commented 11 months ago

Why does the ovs-ovn pod not restart on x86 but restart frequently on aarch64? There are no errors in the exit and restart logs.
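One way to narrow this down is to ask Kubernetes, rather than OVS, why it restarted the container: a failing liveness probe or an OOM kill would show up there and not in the OVS logs. A sketch, with placeholder pod name and namespace:

```sh
# Last termination state and restart count for the aarch64 pod:
kubectl -n kube-system describe pod ovs-ovn-xxxxx

# Events for that pod around the restart window
# (probe failures, OOM kills, evictions, and so on):
kubectl -n kube-system get events --field-selector involvedObject.name=ovs-ovn-xxxxx
```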

zhaiyj commented 11 months ago

ovsdb pid 19057 died, exit status 0, exiting

This message appears because start-ovs.sh traps the EXIT signal and runs ovs-ctl stop when the script exits. The main start-ovs.sh script code:

```sh
/usr/share/openvswitch/scripts/ovs-ctl restart --no-ovs-vswitchd --system-id=random
/usr/share/openvswitch/scripts/ovs-ctl start --no-ovsdb-server --system-id=random
/usr/share/openvswitch/scripts/ovs-ctl --protocol=udp --dport=6081 enable-protocol
/usr/share/ovn/scripts/ovn-ctl restart_controller
chmod 600 /etc/openvswitch/*
tail -f /var/log/ovn/ovn-controller.log
```

```sh
function quit {
    /usr/share/ovn/scripts/grace_stop_ovn_controller
    /usr/share/openvswitch/scripts/ovs-ctl stop
    exit 0
}
trap quit EXIT
```
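In other words, whenever the script terminates, the trap stops ovn-controller and OVS cleanly, which matches the "pid ... died, exit status 0" lines in the logs above. A minimal standalone sketch of that mechanism (a hypothetical script, not the actual kube-ovn code):

```sh
#!/bin/bash
# Sketch of the start-ovs.sh shutdown path: the EXIT trap runs when the
# script exits, so the daemons would be stopped cleanly and the monitor
# would log "exit status 0".
function quit {
    echo "EXIT trap fired: this is where grace_stop_ovn_controller and ovs-ctl stop run"
    exit 0
}
trap quit EXIT

# Stands in for `tail -f /var/log/ovn/ovn-controller.log`, which keeps the
# container's main process alive; once it exits, the script ends and the
# EXIT trap fires.
tail -f /dev/null
```

Running this and then killing the tail from another shell (for example `pkill -f 'tail -f /dev/null'`) shows the trap firing followed by a clean exit 0, the same clean-looking shutdown the logs record when the pod is restarted.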

igsilya commented 11 months ago

Why does the ovs-ovn pod not restart on x86 but restart frequently on aarch64? There are no errors in the exit and restart logs.

From the provided logs, I see no issues indicating a problem with OVS. I have no experience with the kube-ovn project, so I can't help with that. Please report the issue to the kube-ovn project instead.