kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

OVS error on one node makes that node's network unusable #1549

Closed wangyd1988 closed 1 year ago

wangyd1988 commented 2 years ago

kube-ovn version used

1.7.3

Steps to reproduce

Found by chance; the reproduction steps were not recorded.

Logs

[root@ft-icks2 ovn]# kubectl -n kube-system logs -f ovs-ovn-vnl92
filename: /lib/modules/4.18.0-305.40.2.1.kux.aarch64/kernel/net/openvswitch/openvswitch.ko.xz
alias: net-pf-16-proto-16-family-ovs_ct_limit
alias: net-pf-16-proto-16-family-ovs_meter
alias: net-pf-16-proto-16-family-ovs_packet
alias: net-pf-16-proto-16-family-ovs_flow
alias: net-pf-16-proto-16-family-ovs_vport
alias: net-pf-16-proto-16-family-ovs_datapath
license: GPL
description: Open vSwitch switching datapath
rhelversion: 8.4
srcversion: 43D9892E64943386EB1EC13
depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6
intree: Y
name: openvswitch
vermagic: 4.18.0-305.40.2.1.kux.aarch64 SMP mod_unload modversions aarch64
sig_id: PKCS#7
signer: Inspur K-UX kernel signing key
sig_key: 65:53:22:22:16:6C:D6:8B:58:7A:3F:5B:92:E8:FF:59:05:EB:0E:76
sig_hashalgo: sha256
signature: 1D:92:31:2E:EC:DE:BE:E2:E7:7C:53:F6:49:2C:59:3E:26:0A:4F:41:
    9D:C1:FF:35:3F:D4:8A:10:E7:53:71:E0:FB:3F:BC:76:AA:73:D8:58:
    DA:15:7A:19:B5:B3:73:7F:BB:38:2B:01:1F:B3:90:A5:BA:81:A9:C2:
    E6:7C:44:09:72:2C:31:AE:B0:BD:6A:24:57:AB:17:47:B0:AD:4F:F5:
    8F:C3:6F:FF:E6:A6:B5:C6:D2:8E:98:A4:6E:01:84:C0:DB:7F:42:16:
    C5:01:87:56:35:9E:BB:74:DF:B8:7E:BB:33:A7:20:7A:7A:1E:A3:B8:
    C1:80:BA:51:91:87:6A:4A:19:15:39:51:AB:A8:32:FA:EC:89:79:88:
    FC:BD:6D:A4:D2:A6:66:F2:DD:23:53:67:D9:C9:38:6A:5A:E3:4E:0D:
    3E:79:03:E7:E7:EA:57:8A:D6:0C:AE:9C:5A:DE:B7:58:27:EE:6A:0E:
    92:B4:CA:56:ED:7E:AC:61:4C:6B:38:8E:72:35:37:C6:4B:A1:4C:99:
    93:6C:39:21:F0:66:38:E9:4D:9B:83:F8:38:17:D6:3E:05:85:B4:8B:
    57:07:15:7F:00:20:FC:B2:48:79:B9:9A:40:AD:4B:53:02:C4:2E:15:
    C8:58:E5:C7:77:EF:83:BD:09:91:9F:FD:9B:9E:63:3A:A8:BB:D1:06:
    F6:A3:B8:C8:CF:54:A6:98:88:09:D4:5E:17:84:FE:CB:B7:67:BF:F1:
    E2:E9:2B:8F:B2:DE:2E:60:33:48:B4:E1:BA:A7:E3:BE:0A:7E:2A:E7:
    5B:9D:68:BD:04:D3:9A:62:C5:87:1C:E6:D6:B9:91:B6:AE:73:DA:48:
    C0:DE:F9:5C:C0:25:08:1B:86:68:BF:30:4C:D0:ED:E6:2B:6D:A3:EE:
    D3:AA:E7:F9:E8:4A:4E:71:42:26:38:87:7B:0C:03:2D:89:CA:B7:C7:
    23:D8:D2:6A:F3:E5:BD:32:57:83:EB:C2:51:DF:3C:4D:98:47:72:43:
    16:63:38:64
filename: /lib/modules/4.18.0-305.40.2.1.kux.aarch64/kernel/drivers/net/geneve.ko.xz
alias: rtnl-link-geneve
description: Interface driver for GENEVE encapsulated traffic
author: John W. Linville linville@tuxdriver.com
version: 0.6
license: GPL
rhelversion: 8.4
srcversion: 8880BCEBEC28109BFE14EA4
depends: udp_tunnel,ip6_udp_tunnel
intree: Y
name: geneve
vermagic: 4.18.0-305.40.2.1.kux.aarch64 SMP mod_unload modversions aarch64
sig_id: PKCS#7
signer: Inspur K-UX kernel signing key
sig_key: 65:53:22:22:16:6C:D6:8B:58:7A:3F:5B:92:E8:FF:59:05:EB:0E:76
sig_hashalgo: sha256
signature: 0F:01:1E:43:E9:0E:77:F2:C1:A3:96:AB:E8:4F:64:9C:49:15:7B:40:
    B0:FD:43:E3:C2:31:37:ED:20:22:D1:1D:D2:16:F3:40:5D:5D:DA:24:
    BE:F6:56:D2:56:81:B7:D2:8D:83:F5:46:14:C2:DC:F4:A1:D1:46:0B:
    45:E9:2E:B5:EF:E1:DE:87:1E:73:66:3D:DE:02:67:D7:40:9D:26:0C:
    8B:C8:A2:25:BC:76:34:2A:FF:27:41:10:E6:F4:59:C9:15:E2:A1:F0:
    BF:4D:5C:46:33:75:3C:5F:8B:95:25:92:12:FF:83:F7:0F:BF:6B:B3:
    EE:56:63:9B:CD:CA:94:81:27:49:5B:4E:82:51:9F:40:D6:AF:52:B2:
    44:1A:5C:BA:2F:33:D5:05:C0:E6:00:A6:0F:F0:45:41:DD:2D:7D:C1:
    FB:BC:0C:C3:8F:6A:48:B1:B8:B3:94:82:47:EA:8B:2D:0B:92:FB:30:
    2E:81:6C:17:4E:80:59:7F:FA:1E:E9:1E:4E:DE:AE:F9:0D:07:BB:93:
    23:0E:8F:33:EC:43:D2:39:67:CA:72:A6:3D:BD:F5:29:6D:3C:7F:60:
    8F:83:B7:91:37:D6:5F:56:5F:D3:E4:71:65:7C:66:4F:C2:E9:44:CE:
    55:69:2C:4B:9C:08:BC:FE:C8:19:85:DD:C4:21:A5:00:FB:15:56:09:
    C0:26:B2:C4:8C:F5:FA:DF:C7:8B:0A:83:EE:E5:F5:2D:BB:FD:F1:09:
    EE:E2:E2:8C:7D:0A:01:0E:25:D2:7C:A1:F2:0F:0E:FA:41:10:D5:B1:
    A7:EC:29:D1:80:32:8B:EE:C0:32:6A:75:47:96:87:DB:6B:21:21:91:
    55:BA:94:EC:05:F1:3E:6A:81:FE:E3:80:3C:27:6E:C1:B8:96:07:34:
    23:3A:1C:21:0E:A4:8E:2F:FC:84:3B:9A:FF:FC:94:5A:30:61:B3:6F:
    06:80:23:37:13:B1:16:CB:5D:1B:CC:E9:AA:65:81:78:F0:9F:22:BD:
    73:A8:FB:85
parm: log_ecn_error:Log packets received with corrupted ECN (bool)
2022-05-18T09:04:09Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovsdb-server.83903.ctl
ovs-appctl: cannot connect to "/var/run/openvswitch/ovsdb-server.83903.ctl" (No such file or directory)
2022-05-18T09:04:09Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovsdb-server.83903.ctl
ovs-appctl: cannot connect to "/var/run/openvswitch/ovsdb-server.83903.ctl" (No such file or directory)

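Temporary manual workaround

After deleting /var/run/ovn/ovn-controller.xxx.ctl, /var/run/openvswitch/ovsdb-server.xxx.ctl, and the corresponding pid files, the error persisted. We suspected that pinger had not released the connection files; after we deleted the pinger pod, ovs started normally.

Please help identify the root cause. Thanks.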
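
For anyone who hits the same symptom, here is a minimal sketch for listing control sockets that look stale before removing anything by hand. It assumes the `<daemon>.<pid>.ctl` naming seen in the log above (for example `ovsdb-server.83903.ctl`) and treats a socket as stale when no process with that pid exists. The helper names `pid_alive` and `find_stale_ctl` are made up for illustration and are not part of kube-ovn or Open vSwitch.

```python
import os
import re
from pathlib import Path

# Assumed naming scheme, taken from the log above: <daemon>.<pid>.ctl
CTL_PATTERN = re.compile(r"^(?P<daemon>.+)\.(?P<pid>\d+)\.ctl$")


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid exists in the current pid namespace."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_ctl(run_dir: str) -> list:
    """List *.ctl files in run_dir whose embedded pid has no live process."""
    stale = []
    for entry in sorted(Path(run_dir).iterdir()):
        match = CTL_PATTERN.match(entry.name)
        if match and not pid_alive(int(match.group("pid"))):
            stale.append(entry)
    return stale
```

Calling `find_stale_ctl("/var/run/openvswitch")` only reports candidates and deletes nothing. The pid check is only meaningful when run in the same pid namespace as the daemons, so it would probably need to run inside the ovs-ovn container rather than on the host.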

oilbeater commented 2 years ago

Check dmesg to see whether the node ran out of memory and triggered an OOM kill.
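
A rough sketch of that check, in case it helps: it scans captured dmesg text for OOM-killer victim lines. The regular expression is an assumption based on the common "Out of memory: Killed process <pid> (<name>)" wording; the exact text varies between kernel versions, so a miss does not prove there was no OOM. The function name `find_oom_kills` is hypothetical.

```python
import re

# Assumed message shapes; older kernels print "Kill process", newer ones
# "Killed process", and cgroup-limited kills use a "Memory cgroup" prefix.
OOM_KILL = re.compile(
    r"(?:Out of memory|Memory cgroup out of memory): "
    r"Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_kills(dmesg_text: str) -> list:
    """Return (pid, process name) for each OOM-killer victim found in the text."""
    return [
        (int(match.group("pid")), match.group("name"))
        for match in OOM_KILL.finditer(dmesg_text)
    ]
```

Feed it the output of `dmesg` captured on the affected node; if `ovsdb-server` or `ovs-vswitchd` shows up in the result, memory pressure is a likely cause.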

wangyd1988 commented 2 years ago

The environment has already been reset. If the problem occurs again, we will check the memory.

oilbeater commented 1 year ago

Close stale issues.