legitYosal closed this issue 1 month ago
br-int<->unix is an OpenFlow connection, not the database connection. It looks like your ovn-controller is not responding to OpenFlow probes for a long time. You need to check what it is doing for so long.
Yes, the delay is in ovn-controller:
2024-10-15T09:18:45.469Z|05297|inc_proc_eng|INFO|node: physical_flow_output, handler for input if_status_mgr took 94629ms
2024-10-15T09:18:45.711Z|05298|memory_trim|INFO|Detected inactivity (last active 96260 ms ago): trimming memory
2024-10-15T09:18:45.713Z|05299|timeval|WARN|Unreasonably long 94879ms poll interval (93459ms user, 1292ms system)
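For anyone checking their own ovn-controller log for the same pattern, here is a minimal Python sketch that pulls out slow incremental-processing handlers and long poll intervals. It assumes the log lines follow exactly the format quoted above (other OVS/OVN versions may differ); the function name `find_stalls` and the 1000 ms threshold are illustrative choices, not part of OVS or OVN.

```python
import re

# Patterns modeled on the log lines quoted in this thread (an assumption:
# other versions may word these messages differently).
HANDLER_RE = re.compile(
    r"inc_proc_eng\|INFO\|node: (?P<node>\S+), "
    r"handler for input (?P<input>\S+) took (?P<ms>\d+)ms"
)
POLL_RE = re.compile(
    r"timeval\|WARN\|Unreasonably long (?P<ms>\d+)ms poll interval "
    r"\((?P<user>\d+)ms user, (?P<system>\d+)ms system\)"
)


def find_stalls(lines, threshold_ms=1000):
    """Return (timestamp, kind, detail, duration_ms) for each slow event."""
    stalls = []
    for line in lines:
        # The timestamp is the first '|'-separated field of each log line.
        timestamp = line.split("|", 1)[0]
        m = HANDLER_RE.search(line)
        if m and int(m["ms"]) >= threshold_ms:
            detail = f"{m['node']} <- {m['input']}"
            stalls.append((timestamp, "handler", detail, int(m["ms"])))
            continue
        m = POLL_RE.search(line)
        if m and int(m["ms"]) >= threshold_ms:
            detail = f"user={m['user']}ms system={m['system']}ms"
            stalls.append((timestamp, "poll", detail, int(m["ms"])))
    return stalls


# The three lines from the comment above, used as sample input.
SAMPLE = [
    "2024-10-15T09:18:45.469Z|05297|inc_proc_eng|INFO|node: physical_flow_output, handler for input if_status_mgr took 94629ms",
    "2024-10-15T09:18:45.711Z|05298|memory_trim|INFO|Detected inactivity (last active 96260 ms ago): trimming memory",
    "2024-10-15T09:18:45.713Z|05299|timeval|WARN|Unreasonably long 94879ms poll interval (93459ms user, 1292ms system)",
]

if __name__ == "__main__":
    for stall in find_stalls(SAMPLE):
        print(stall)
```

In the sample, nearly all of the long poll interval is user time, and the slow handler is the `if_status_mgr` input of the `physical_flow_output` node, which points at ovn-controller's own computation rather than at waiting on I/O.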
Thanks, Ilya. I will close this issue.
Hi,
I recently upgraded from 2.17 to 3.3.0. It seems the vswitchd connection to the local ovsdb drops, and vswitchd is blocked from adding flows. For example, when you live migrate a VM onto a chassis, the vswitchd logs look like this (the VM has two ports):
After 2 to 3 minutes with nothing changing, it overloads and does a 57K change!
Normal vswitchd logs look like this (from another chassis that does not show the symptoms):
There is still a wide gap in seconds, but I guess that is OK: my live-migration ping packet loss spans from 1s to 4s, which is acceptable.
Also, ovsdb shows nothing at all at the info vlog level; enabling dbg on ovsdb shows some logs:
The logs are very long, so I am attaching a more detailed file; there is no visible bug or error.
I thought a bug fix might have been added in 3.3.2, but reading the commit messages I did not see anything related: https://github.com/openvswitch/ovs/compare/v3.3.0...v3.3.2
Here is also the output of the Open_vswitch table:
This does not happen everywhere, but in one production environment 50% of my live migrations hit this issue, and I could not reproduce it on staging!
Do you have any guess as to what is wrong and what the cause might be?