Open twofish197 opened 5 days ago
Attached is the failure log: ovs-vswitchd_failed_port_allocating.log
After debugging some of the failed Windows VMs, this appears to be a known issue that already has a fix in the commit below.
So it is likely that ovs-windows blocks some port allocations to avoid an unrecoverable case.
netdev-windows: Add checking when creating netdev with system type on Windows https://github.com/openvswitch/ovs/commit/1cdc0529f742a03bc6ed615de897eb68cf140ac1
Quoting the bug description here: "Some system type port will be created netdev successfully and it will cause conflict as in the dpif side it will be internal type. So finally the port will be created failed and it could not be easily recovered."
With the patch, on Windows the netdev creating will be blocked for system type when the ovs_type got on dpif is internal. More detailed case description is in the reported issue No.262 with link below. https://github.com/openvswitch/ovs-issues/issues/262
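The check described above can be sketched roughly as follows. This is only an illustration of the decision, not the actual C code from the commit: the function name `netdev_create_allowed` and the string type names are hypothetical stand-ins for whatever the patch uses internally.

```python
def netdev_create_allowed(netdev_type: str, dpif_ovs_type: str) -> bool:
    """Illustrative check (hypothetical names, not the real OVS API).

    Refuse to create a netdev of type "system" when the datapath (dpif)
    reports the same port as "internal", since that mismatch is what
    leads to the hard-to-recover port creation failure.
    """
    conflict = netdev_type == "system" and dpif_ovs_type == "internal"
    return not conflict


# A "system" netdev on top of an "internal" datapath port is blocked;
# matching types are allowed through.
print(netdev_create_allowed("system", "internal"))
print(netdev_create_allowed("internal", "internal"))
```

Blocking the creation early keeps the userspace and datapath views of the port consistent, at the cost of the port add failing until the configuration is corrected.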
In the current OVS Windows logic, recovering from a failed port add needs an extra config change on the ovsdb server. We could check whether some logic can be added in OVS userspace to resync when OVS on Windows blocks a port add. This will be tracked by this upstream issue on OVS.
On the Windows platform, we deployed a large Windows cluster with 3 Ubuntu CP nodes and 100+ Windows worker nodes.
On a Windows node, creating a containerd pod on the host creates a vNIC, and the port type is set to internal to support connections between OVS and pods. A port creation error was found on some Windows nodes (2%-3%) during the test. Below is the output of the command "ovs-vsctl show".
    Bridge br-int
        datapath_type: system
        Port antrea-gw0
            Interface antrea-gw0
                type: internal
        Port antrea-tun0
            Interface antrea-tun0
                type: geneve
                options: {key=flow, local_ip="10.244.3.24", remote_ip=flow}
        Port eth0
            Interface eth0
        Port br-int
            Interface br-int
                type: internal
        Port vsphere--c546b0
            Interface vsphere--c546b0
                type: internal
                error: "could not add network device vsphere--c546b0 to ofproto (Invalid argument)"
could also fix this issue.
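For anyone checking a large cluster for affected nodes, a small script can pull the failed interfaces out of the "ovs-vsctl show" output shown above. This is a minimal sketch that assumes the output has already been captured as text; the function name `find_failed_interfaces` and the `SAMPLE` string are made up for illustration, and the parsing relies only on the "Interface" and "error:" lines seen in the output above.

```python
from typing import Dict, Optional

# Trimmed sample based on the output posted in this issue.
SAMPLE = '''
Bridge br-int
    datapath_type: system
    Port antrea-gw0
        Interface antrea-gw0
            type: internal
    Port vsphere--c546b0
        Interface vsphere--c546b0
            type: internal
            error: "could not add network device vsphere--c546b0 to ofproto (Invalid argument)"
'''


def find_failed_interfaces(show_output: str) -> Dict[str, str]:
    """Map interface name -> error text found in `ovs-vsctl show` output."""
    failed: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Port ") or line.startswith("Bridge "):
            # A new port or bridge begins; forget the previous interface.
            current = None
        elif line.startswith("Interface "):
            current = line[len("Interface "):].strip().strip('"')
        elif line.startswith("error:") and current is not None:
            failed[current] = line[len("error:"):].strip().strip('"')
    return failed


if __name__ == "__main__":
    for name, err in find_failed_interfaces(SAMPLE).items():
        print(f"{name}: {err}")
```

Running it against the sample reports only vsphere--c546b0, the port that failed with "Invalid argument".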