Open gugulee opened 2 months ago
Hi,
This sounds like something that might be addressed by https://patchwork.ozlabs.org/project/ovn/patch/20240826131509.202811-1-amusil@redhat.com/. If you have the chance, would you mind trying this commit to see whether it helps?
About 200 to 300 compute nodes are online, with approximately 20,000 to 30,000 ports (mainly distributed across two logical switches). Each compute node has about 200,000 OVS flow entries. After ovn-controller restarts, it gets stuck for a long time (about 8 minutes). While the flow table is being processed, the network on this node is unavailable.
ovn-controller logs are as follows:
The pstack output is:
It seems that the time is mainly consumed in the pflow_output_sb_port_binding_handler function.
I found that after ovn-controller restarts, all OVS flow tables are cleared, causing network issues for existing ports. I tried using the external_ids:ovn-ofctrl-wait-before-clear option. Although it prevents the flow tables from being cleared, during the wait period the ports assigned to this node cannot come up, and the mac_binding table cannot be updated properly.
How can this scenario be optimized? Are there any other options I might have missed?
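For reference, a minimal sketch of how the external_ids:ovn-ofctrl-wait-before-clear option mentioned above is typically set on a chassis. The option takes a time in milliseconds and lives in the local Open_vSwitch table. The value 480000 is an assumption chosen only to cover the roughly 8 minute stall described in this report; it is not a value taken from the original setup.

```shell
# Assumption: 480000 ms (8 minutes) is an illustrative value, not a recommendation.
# Delay clearing of the existing OVS flows after ovn-controller (re)connects.
ovs-vsctl set open_vswitch . external_ids:ovn-ofctrl-wait-before-clear=480000

# Read the value back to confirm it was stored.
ovs-vsctl get open_vswitch . external_ids:ovn-ofctrl-wait-before-clear
```

The setting has to be in place before ovn-controller is restarted, since it only affects how long the existing flows are kept after the OpenFlow connection is established.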