Issue closed by legitYosal 4 months ago.
CC'ing a few people who might know more about Neutron (@booxter @cubeek @danalsan); please feel free to add more if appropriate.
What you might want to set instead is: external_ids:ovn-ofctrl-wait-before-clear=<max time before reinstalling OVS flows>:
https://github.com/ovn-org/ovn/blob/47915c4c517c634dec919cfd60295db0d0bedfa7/controller/ovn-controller.8.xml#L289-L321
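For example, a sketch of setting the knob (the 8000 ms value here is only illustrative; tune it to roughly how long your ovn-controller takes to compute flows):

```shell
# Ask ovn-controller to wait up to 8 seconds (value in ms) before
# clearing the existing OpenFlow flows after (re)connecting to OVS.
ovs-vsctl set open . external_ids:ovn-ofctrl-wait-before-clear=8000

# Read the value back to confirm it took effect:
ovs-vsctl get open . external_ids:ovn-ofctrl-wait-before-clear
```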
As for the explanation, this is just a guess, but I'm assuming that in your setup ovn-controller takes quite a long time to process the SB database contents, so there's a window between the initial OVS flow clear and the installation of the new flows that causes the downtime you're experiencing.
We can't tell without more info.
Hope this helps, Dumitru
To add to what @dceara said, it could help if you clarified what exactly is meant by "network shortage". Some services are implemented by ovn-controller's controller() action handlers, and it's expected that those flows are disrupted while the ovn-controller process restarts. These services include IPv6 ND, LB health checks, and so on; you can find more examples in pinctrl.c in the ovn repo.
But if you experience a complete breakdown of connectivity, and not just of specific services, then it's probably what Dumitru suggested.
FYI, for Red Hat OpenStack we set ovn-ofctrl-wait-before-clear to 8000 (8s), but allow the knob to be tweaked for larger environments if needed.
Thank you @dceara. We are using OVN version 22.03 built with OVS 2.17. I am testing in a staging environment deployed on bare metal; I have overloaded the computes with VMs, so the SB DB is heavy and pulling flows plus a recompute takes a long time (about 3 seconds). Interestingly, if I stop ovn_controller nothing impacts traffic flow; when ovn-controller starts, I think it tries to delete all the flows and reinstall them. Due to the excessive logs in verbose mode (1 million lines in 2-3 minutes), I am sending the normal logs split into the segments in which I lose connectivity:
### ========> Restart initiated
2024-05-21T13:54:17.724Z|00001|vlog|INFO|opened log file /var/log/kolla/openvswitch/ovn-controller.log
2024-05-21T13:54:17.725Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2024-05-21T13:54:17.725Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2024-05-21T13:54:17.729Z|00004|main|INFO|OVN internal version is : [22.03.0-20.21.0-58.3]
2024-05-21T13:54:17.729Z|00005|main|INFO|OVS IDL reconnected, force recompute.
2024-05-21T13:54:17.729Z|00006|reconnect|INFO|tcp:172.25.0.1:6642: connecting...
2024-05-21T13:54:17.729Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
2024-05-21T13:54:17.729Z|00008|reconnect|INFO|tcp:172.25.0.1:6642: connected
2024-05-21T13:54:20.163Z|00009|features|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2024-05-21T13:54:20.164Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2024-05-21T13:54:20.166Z|00011|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2024-05-21T13:54:20.166Z|00012|features|INFO|OVS Feature: ct_zero_snat, state: supported
2024-05-21T13:54:20.166Z|00013|main|INFO|OVS feature set changed, force recompute.
2024-05-21T13:54:20.166Z|00014|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2024-05-21T13:54:20.166Z|00015|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2024-05-21T13:54:20.173Z|00016|main|INFO|OVS feature set changed, force recompute.
2024-05-21T13:54:20.173Z|00017|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2024-05-21T13:54:20.239Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2024-05-21T13:54:20.239Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2024-05-21T13:54:20.282Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
### ========> Connectivity cut completely
2024-05-21T13:54:27.728Z|00018|memory|INFO|142464 kB peak resident set size after 10.0 seconds
2024-05-21T13:54:27.728Z|00019|memory|INFO|idl-cells:75663 lflow-cache-entries-cache-expr:2 lflow-cache-entries-cache-matches:90 lflow-cache-size-KB:7 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:9170 ofctrl_installed_flow_usage-KB:6180 ofctrl_sb_flow_ref_usage-KB:4852
2024-05-21T13:54:30.433Z|00020|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.440Z|00021|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.446Z|00022|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.453Z|00023|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.460Z|00024|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.466Z|00025|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.473Z|00026|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.479Z|00027|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.486Z|00028|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:30.492Z|00029|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (100% CPU usage)
2024-05-21T13:54:40.546Z|00030|inc_proc_eng|INFO|node: logical_flow_output, handler for input SB_logical_flow took 4151ms
2024-05-21T13:54:40.904Z|00031|timeval|WARN|Unreasonably long 9607ms poll interval (9154ms user, 452ms system)
2024-05-21T13:54:40.905Z|00032|timeval|WARN|faults: 403581 minor, 0 major
2024-05-21T13:54:40.906Z|00033|timeval|WARN|disk: 0 reads, 16 writes
2024-05-21T13:54:40.907Z|00034|timeval|WARN|context switches: 0 voluntary, 34 involuntary
2024-05-21T13:54:40.916Z|00035|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=cf844ca2:
2024-05-21T13:54:40.917Z|00036|coverage|INFO|lflow_run 0.0/sec 0.033/sec 0.0006/sec total: 2
2024-05-21T13:54:40.918Z|00037|coverage|INFO|consider_logical_flow 0.0/sec 1.917/sec 0.0319/sec total: 268680
2024-05-21T13:54:40.919Z|00038|coverage|INFO|lflow_cache_add_expr 0.0/sec 0.033/sec 0.0006/sec total: 11927
2024-05-21T13:54:40.920Z|00039|coverage|INFO|lflow_cache_add_matches 0.0/sec 1.500/sec 0.0250/sec total: 11972
2024-05-21T13:54:40.921Z|00040|coverage|INFO|lflow_cache_add 0.0/sec 1.533/sec 0.0256/sec total: 23899
2024-05-21T13:54:40.922Z|00041|coverage|INFO|lflow_cache_hit 0.0/sec 6.167/sec 0.1028/sec total: 374
2024-05-21T13:54:40.923Z|00042|coverage|INFO|lflow_cache_miss 0.0/sec 3.233/sec 0.0539/sec total: 103372
2024-05-21T13:54:40.924Z|00043|coverage|INFO|lflow_conj_alloc 0.0/sec 0.133/sec 0.0022/sec total: 8
2024-05-21T13:54:40.925Z|00044|coverage|INFO|lflow_conj_free 0.0/sec 0.067/sec 0.0011/sec total: 4
2024-05-21T13:54:40.926Z|00045|coverage|INFO|physical_run 0.0/sec 0.050/sec 0.0008/sec total: 3
2024-05-21T13:54:40.927Z|00046|coverage|INFO|miniflow_malloc 0.0/sec 2392.133/sec 39.8689/sec total: 191752
2024-05-21T13:54:40.928Z|00047|coverage|INFO|hmap_pathological 2.8/sec 3.983/sec 0.0664/sec total: 713
2024-05-21T13:54:40.929Z|00048|coverage|INFO|hmap_expand 45756.2/sec 4204.700/sec 70.0783/sec total: 370540
2024-05-21T13:54:40.930Z|00049|coverage|INFO|txn_unchanged 170.6/sec 14.917/sec 0.2486/sec total: 1118
2024-05-21T13:54:40.931Z|00050|coverage|INFO|txn_incomplete 0.2/sec 0.067/sec 0.0011/sec total: 5
2024-05-21T13:54:40.932Z|00051|coverage|INFO|txn_success 0.2/sec 0.050/sec 0.0008/sec total: 3
2024-05-21T13:54:40.933Z|00052|coverage|INFO|poll_create_node 1630.4/sec 141.167/sec 2.3528/sec total: 9156
2024-05-21T13:54:40.933Z|00053|coverage|INFO|poll_zero_timeout 0.0/sec 0.083/sec 0.0014/sec total: 6
2024-05-21T13:54:40.933Z|00054|coverage|INFO|rconn_queued 0.0/sec 797.900/sec 13.2983/sec total: 71996
2024-05-21T13:54:40.933Z|00055|coverage|INFO|rconn_sent 0.0/sec 797.900/sec 13.2983/sec total: 71996
2024-05-21T13:54:40.933Z|00056|coverage|INFO|seq_change 644.4/sec 55.817/sec 0.9303/sec total: 3470
2024-05-21T13:54:40.933Z|00057|coverage|INFO|pstream_open 0.0/sec 0.017/sec 0.0003/sec total: 1
2024-05-21T13:54:40.933Z|00058|coverage|INFO|stream_open 0.0/sec 0.083/sec 0.0014/sec total: 5
2024-05-21T13:54:40.933Z|00059|coverage|INFO|util_xalloc 2113824.2/sec 215117.967/sec 3585.2994/sec total: 29013996
2024-05-21T13:54:40.933Z|00060|coverage|INFO|vconn_open 0.0/sec 0.050/sec 0.0008/sec total: 3
2024-05-21T13:54:40.933Z|00061|coverage|INFO|vconn_received 0.6/sec 0.183/sec 0.0031/sec total: 13
2024-05-21T13:54:40.933Z|00062|coverage|INFO|vconn_sent 0.0/sec 797.950/sec 13.2992/sec total: 71999
2024-05-21T13:54:40.933Z|00063|coverage|INFO|netlink_received 0.0/sec 0.383/sec 0.0064/sec total: 27
2024-05-21T13:54:40.933Z|00064|coverage|INFO|netlink_recv_jumbo 0.0/sec 0.100/sec 0.0017/sec total: 7
2024-05-21T13:54:40.933Z|00065|coverage|INFO|netlink_sent 0.0/sec 0.383/sec 0.0064/sec total: 27
2024-05-21T13:54:40.933Z|00066|coverage|INFO|cmap_expand 0.0/sec 0.050/sec 0.0008/sec total: 3
2024-05-21T13:54:40.933Z|00067|coverage|INFO|109 events never hit
2024-05-21T13:54:40.933Z|00068|poll_loop|INFO|Dropped 104 log messages in last 10 seconds (most recently, 9 seconds ago) due to excessive rate
2024-05-21T13:54:40.933Z|00069|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (<->/run/openvswitch/db.sock) at lib/stream-fd.c:157 (99% CPU usage)
2024-05-21T13:54:40.933Z|00070|memory|INFO|peak resident set size grew 609% in last 13.2 seconds, from 142464 kB to 1009804 kB
2024-05-21T13:54:40.934Z|00071|memory|INFO|idl-cells:3217711 idl-outstanding-txns:1 lflow-cache-entries-cache-expr:11927 lflow-cache-entries-cache-matches:11972 lflow-cache-size-KB:5338 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:17282 ofctrl_installed_flow_usage-KB:12785 ofctrl_sb_flow_ref_usage-KB:8049 oflow_update_usage-KB:1
### ========> Private network connectivity came back
2024-05-21T13:54:45.508Z|00072|inc_proc_eng|INFO|node: logical_flow_output, recompute ((null)) took 4383ms
2024-05-21T13:54:45.627Z|00073|timeval|WARN|Unreasonably long 4693ms poll interval (4587ms user, 105ms system)
2024-05-21T13:54:45.627Z|00074|timeval|WARN|faults: 74381 minor, 0 major
2024-05-21T13:54:45.627Z|00075|timeval|WARN|disk: 0 reads, 8 writes
2024-05-21T13:54:45.627Z|00076|timeval|WARN|context switches: 0 voluntary, 23 involuntary
2024-05-21T13:54:45.627Z|00077|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=07ea1ea1:
2024-05-21T13:54:45.627Z|00078|coverage|INFO|lflow_run 0.0/sec 0.033/sec 0.0006/sec total: 3
2024-05-21T13:54:45.627Z|00079|coverage|INFO|consider_logical_flow 53713.0/sec 4478.000/sec 74.6333/sec total: 537358
2024-05-21T13:54:45.627Z|00080|coverage|INFO|lflow_cache_add_expr 2385.0/sec 198.783/sec 3.3131/sec total: 11927
2024-05-21T13:54:45.627Z|00081|coverage|INFO|lflow_cache_add_matches 2376.4/sec 199.533/sec 3.3256/sec total: 11972
2024-05-21T13:54:45.627Z|00082|coverage|INFO|lflow_cache_add 4761.4/sec 398.317/sec 6.6386/sec total: 23899
2024-05-21T13:54:45.627Z|00083|coverage|INFO|lflow_cache_hit 0.8/sec 6.233/sec 0.1039/sec total: 24412
2024-05-21T13:54:45.627Z|00084|coverage|INFO|lflow_cache_miss 20635.6/sec 1722.867/sec 28.7144/sec total: 182794
2024-05-21T13:54:45.627Z|00085|coverage|INFO|lflow_conj_alloc 0.0/sec 0.133/sec 0.0022/sec total: 12
2024-05-21T13:54:45.627Z|00086|coverage|INFO|lflow_conj_free 0.0/sec 0.067/sec 0.0011/sec total: 4
2024-05-21T13:54:45.627Z|00087|coverage|INFO|physical_run 0.0/sec 0.050/sec 0.0008/sec total: 4
2024-05-21T13:54:45.627Z|00088|coverage|INFO|miniflow_malloc 9644.8/sec 3195.867/sec 53.2644/sec total: 263741
2024-05-21T13:54:45.627Z|00089|coverage|INFO|hmap_pathological 94.8/sec 11.883/sec 0.1981/sec total: 1171
2024-05-21T13:54:45.627Z|00090|coverage|INFO|hmap_expand 23651.6/sec 6175.667/sec 102.9278/sec total: 414318
2024-05-21T13:54:45.628Z|00091|coverage|INFO|txn_unchanged 44.6/sec 18.633/sec 0.3106/sec total: 1120
2024-05-21T13:54:45.628Z|00092|coverage|INFO|txn_incomplete 0.2/sec 0.083/sec 0.0014/sec total: 5
2024-05-21T13:54:45.628Z|00093|coverage|INFO|txn_success 0.0/sec 0.050/sec 0.0008/sec total: 4
2024-05-21T13:54:45.628Z|00094|coverage|INFO|poll_create_node 139.6/sec 152.800/sec 2.5467/sec total: 9192
2024-05-21T13:54:45.628Z|00095|coverage|INFO|poll_zero_timeout 0.2/sec 0.100/sec 0.0017/sec total: 6
2024-05-21T13:54:45.628Z|00096|coverage|INFO|rconn_queued 4824.6/sec 1199.950/sec 19.9992/sec total: 72020
2024-05-21T13:54:45.628Z|00097|coverage|INFO|rconn_sent 4824.6/sec 1199.950/sec 19.9992/sec total: 72020
2024-05-21T13:54:45.628Z|00098|coverage|INFO|seq_change 24.8/sec 57.883/sec 0.9647/sec total: 3485
2024-05-21T13:54:45.628Z|00099|coverage|INFO|pstream_open 0.0/sec 0.017/sec 0.0003/sec total: 1
2024-05-21T13:54:45.628Z|00100|coverage|INFO|stream_open 0.0/sec 0.083/sec 0.0014/sec total: 5
2024-05-21T13:54:45.628Z|00101|coverage|INFO|util_xalloc 3221389.6/sec 483567.100/sec 8059.4517/sec total: 33771677
2024-05-21T13:54:45.628Z|00102|coverage|INFO|vconn_open 0.0/sec 0.050/sec 0.0008/sec total: 3
2024-05-21T13:54:45.628Z|00103|coverage|INFO|vconn_received 0.8/sec 0.250/sec 0.0042/sec total: 18
2024-05-21T13:54:45.628Z|00104|coverage|INFO|vconn_sent 4824.6/sec 1200.000/sec 20.0000/sec total: 72023
2024-05-21T13:54:45.628Z|00105|coverage|INFO|netlink_received 0.8/sec 0.450/sec 0.0075/sec total: 31
2024-05-21T13:54:45.628Z|00106|coverage|INFO|netlink_recv_jumbo 0.2/sec 0.117/sec 0.0019/sec total: 8
2024-05-21T13:54:45.628Z|00107|coverage|INFO|netlink_sent 0.8/sec 0.450/sec 0.0075/sec total: 31
2024-05-21T13:54:45.628Z|00108|coverage|INFO|cmap_expand 0.0/sec 0.050/sec 0.0008/sec total: 3
2024-05-21T13:54:45.628Z|00109|coverage|INFO|109 events never hit
2024-05-21T13:54:45.628Z|00110|poll_loop|INFO|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
2024-05-21T13:54:45.628Z|00111|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (172.25.2.3:33726<->172.25.0.1:6642) at lib/stream-fd.c:157 (102% CPU usage)
### ========> Connected every where
Testing with ovs-vsctl set open . external_ids:ovn-ofctrl-wait-before-clear=20000
did not result in any change, as it seems ovn_controller goes straight into purge mode when started!
@booxter Also, my connectivity test is: when ovn-controller purges flows, ping to the private interface freezes, and ping to the public one times out:
64 bytes from x.x.x.x: icmp_seq=47061 ttl=54 time=9.041 ms
64 bytes from x.x.x.x: icmp_seq=47062 ttl=54 time=8.427 ms
Request timeout for icmp_seq 47063
Request timeout for icmp_seq 47064
Request timeout for icmp_seq 47065
.
.
.
Request timeout for icmp_seq 47260
Request timeout for icmp_seq 47261
Request timeout for icmp_seq 47262
64 bytes from x.x.x.x: icmp_seq=47263 ttl=54 time=11.204 ms
64 bytes from x.x.x.x: icmp_seq=47264 ttl=54 time=8.241 ms
From reading the ovn-ofctrl-wait-before-clear documentation multiple times, I think it is saying that setting this option will prevent the flows from being purged before the recompute finishes; but because there are so many flows, even purging and reinstalling them takes too long:
(ovn-controller)[root@stg1-compute2003 /]# ovs-appctl -t /var/run/openvswitch/ovs-vswitchd.16.ctl bridge/dump-flows br-int | wc -l
71977
OK, this is staging, but in production we have hosts with almost 80K flows each, around 200K flows in total on the southbound.
So will the problem not be solvable here? Could we tell ovn-controller not to purge the flows?
To solve your issue, ovn-controller would first have to get the dump of installed flows from ovs-vswitchd on startup and then sync the flows, i.e. delete or add only the required flows. This is possible, but complicated.
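As a toy illustration of that "sync instead of clear" idea (hypothetical flow names and file names; this is not what ovn-controller actually does today), one can compute the set differences between installed and desired flows and only delete/add the deltas:

```shell
# Toy diff-based flow sync: given sorted lists of installed and desired
# flows, derive the minimal delete/add sets instead of clearing everything.
printf 'flow-a\nflow-b\nflow-c\n' | sort > installed.sorted  # what OVS has
printf 'flow-b\nflow-c\nflow-d\n' | sort > desired.sorted    # what we want
comm -23 installed.sorted desired.sorted > to_delete.txt  # installed but no longer desired
comm -13 installed.sorted desired.sorted > to_add.txt     # desired but not yet installed
cat to_delete.txt   # flow-a
cat to_add.txt      # flow-d
```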
git tag --contains 896adfd2d8b3369110e9618bd190d190105372a9 suggests that support for ovn-ofctrl-wait-before-clear arrived in v22.06.0, and you run 22.03.
Here's the commit for your reference: https://github.com/ovn-org/ovn/commit/896adfd2d8b3369110e9618bd190d190105372a9
Actually, the support for that knob has been backported to 22.03 too (for scalability reasons): https://github.com/ovn-org/ovn/commit/4a34b878d02464266c2b7ff2779de121b130e065
It's in there since v22.03.2.
@legitYosal your ovn-controller log says you're running 22.03.0-20.21.0-58.3; could you please upgrade to the latest v22.03.7 and retest? The knob doesn't do anything in the version you're currently running.
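To confirm which version is actually running before and after the upgrade, something like this should work on the compute host:

```shell
# Version of the installed ovn-controller binary:
ovn-controller --version

# Internal version reported by the running daemon (matches the
# "OVN internal version is ..." line in the log); this assumes
# ovn-appctl is available and the daemon's pidfile is in the
# default run directory:
ovn-appctl -t ovn-controller version
```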
@legitYosal ovn-ofctrl-wait-before-clear should help in your case.
For "purging and reinstalling them takes too long": that is also addressed by replacing the flows in an OVS bundle, as a single transaction. That is patch d53c599ed0, which came after the ovn-ofctrl-wait-before-clear patch 896adfd2d8b. However, d53c599ed0 is not in branch-22.03, only in 22.06 and later. You may try 22.06, or backport it to 22.03 yourself (for the backport you will also need e50111213).
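For reference, the same bundle idea is exposed by plain OVS tooling; a sketch, assuming flows.txt is a hypothetical file holding the full desired flow table:

```shell
# Replace br-int's entire flow table in one OpenFlow bundle: with
# --bundle all changes are applied atomically in a single transaction,
# so there is no window in which the table is empty.
ovs-ofctl --bundle replace-flows br-int flows.txt
```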
Actually, both of these are available in branch-22.03: https://github.com/ovn-org/ovn/commit/ebfbedd0ceda723d5f78773c965529ee136a5720 https://github.com/ovn-org/ovn/commit/9a0e90be73af6f9d16765286d1c1734e91bc7d8d
Using the latest v22.03.7 tag should be fine.
Thanks @dceara for correcting me. I made a mistake when checking the branches.
Thank you for sharing your knowledge. I have tested with an ovn-24.03.1 build with ovs-3.3.0 and it worked flawlessly, as described. In production I will go with v22.03.7 as @dceara mentioned.
We are using OVN with Neutron on our OpenStack cluster. After restarting the ovn_controller container, we encounter a network shortage on the private and public networks of the VMs. Also, setting
ovs-vsctl set open . other_config:flow-restore-wait=true
does not affect this, although ovs-vswitchd restarts do not affect network connectivity. Can someone give a technical explanation of why this happens, and of possible solutions to upgrade and restart the ovn-controller container without downtime?
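For what it's worth, flow-restore-wait only covers ovs-vswitchd's own restart path, which is why vswitchd restarts don't cut traffic while ovn-controller restarts do. A rough sketch of the documented vswitchd hot-restart sequence (ovs-save is the helper script shipped with OVS):

```shell
# Keep the datapath flows in place and make the restarted vswitchd wait
# for a flow restore instead of starting from an empty flow table:
ovs-vsctl set open . other_config:flow-restore-wait=true
# ... stop ovs-vswitchd, upgrade, start ovs-vswitchd, then restore the
#     previously saved flows (e.g. with the ovs-save helper) ...
ovs-vsctl remove open . other_config flow-restore-wait
```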