Closed: zejar closed this issue 6 months ago.
I am also seeing this issue. Can anyone point me to a solution?
It looks like there are lots of changes in the routing table, which isn't too surprising given that the system is running BGP. OVS tries to keep up with these. I'm surprised it's this expensive.
If you stop the BGP daemon temporarily, does OVS CPU usage drop to near-zero? If so, then that would confirm the root of the issue and we can look into how to avoid the high CPU for this case.
I have the same issue: bird running BGP, and the server having a full table in the kernel is indeed the culprit. How can we make openvswitch behave normally in this scenario?
@zejar have you found a solution to this problem?
@ddominet I have not. Instead I switched to "regular" Linux bridges.
If it is not necessary for OVS to be aware of these routes, the solution is to move BGP daemon into a separate network namespace.
@igsilya
I'm using it with Kolla Ansible OpenStack; BGP is there to provide connectivity to a prefix that needs to be available within OpenStack. Will that work with separate namespaces? It might be too separated. If it were a separate routing table I could just policy-route it, but in this case I don't think so.
Unfortunately, I'm not enough of a BGP or OpenStack expert to tell whether it will work.
FWIW, here is a link to the discussion of why OVS consumes a lot of CPU in this scenario: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052092.html
After reading the discussion that @igsilya linked in his last comment, I was able to recreate the issue on pure Ubuntu 20.04.6 LTS, kernel 5.4.0-162-generic, openvswitch-switch 2.13.8-0ubuntu1.2, 1 vCPU, without running a BGP daemon at all. This is only related to the massive stream of routing table updates. One of the triggers may be a BGP full-table sync; the other example is as follows (without BGP/Bird running):
^this loop will continue for some time; you don't need to wait for it to complete to see the symptoms in the logs or in htop/top.
sudo grep blocked /var/log/openvswitch/ovs-vswitchd.log | sort -n -k2 | tail
2023-10-18T15:06:45.577Z|00019|ovs_rcu(urcu2)|WARN|blocked 1006 ms waiting for main to quiesce
2023-10-18T15:06:54.166Z|00027|ovs_rcu(urcu2)|WARN|blocked 1006 ms waiting for main to quiesce
2023-10-18T15:07:00.512Z|00033|ovs_rcu(urcu2)|WARN|blocked 1006 ms waiting for main to quiesce
2023-10-18T15:07:04.773Z|00037|ovs_rcu(urcu2)|WARN|blocked 1006 ms waiting for main to quiesce
2023-10-18T15:07:14.049Z|00045|ovs_rcu(urcu2)|WARN|blocked 1006 ms waiting for main to quiesce
2023-10-18T15:07:45.074Z|00067|ovs_rcu(urcu2)|WARN|blocked 1006 ms waiting for main to quiesce
2023-10-18T15:08:00.661Z|00078|ovs_rcu(urcu2)|WARN|blocked 1006 ms waiting for main to quiesce
2023-10-18T15:07:05.854Z|00038|ovs_rcu(urcu2)|WARN|blocked 1007 ms waiting for main to quiesce
2023-10-18T15:07:56.270Z|00075|ovs_rcu(urcu2)|WARN|blocked 1007 ms waiting for main to quiesce
2023-10-18T15:08:12.214Z|00086|ovs_rcu(urcu2)|WARN|blocked 2004 ms waiting for main to quiesce
In this example all routing table entries sum up to 14k+:
sudo ip r l table 0 | wc -l
14650
bump?
Any news?
Unfortunately, the OVS team doesn't seem to take these issues seriously. And this one is big to me, and to anyone working in an ISP environment.
Kind Regards, Dominik
As I said, the issue doesn't have a good solution, only compromises. The reason is that Linux kernel routing change notifications are not a reliable source of information. Any application that tries to incrementally maintain the state of the routing table based on these notifications implements a huge amount of heuristics to guess which notification is a good one and which is not. And they are never fully correct, meaning old, already-removed entries may linger in the system or some updates may not be delivered. See:
https://bugzilla.redhat.com/show_bug.cgi?id=1337855
https://bugzilla.redhat.com/show_bug.cgi?id=1722728
https://github.com/thom311/libnl/issues/226
https://github.com/thom311/libnl/issues/224
The only approach that gives a correct view of the current state of the routing table is to dump it as a whole after each change, and that causes high CPU usage for obvious reasons if you have huge routing tables with a high rate of updates.
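To make the cost concrete, the dump-and-diff strategy described above can be sketched like this (hypothetical illustration, not OVS code; `dump_all_routes` is a made-up stand-in for a netlink full-table dump):

```python
def resync_routes(cached, dump_all_routes):
    """Rebuild a cached routing table from a full kernel dump.

    Instead of trusting incremental notifications, re-dump the whole
    table after every change and diff it against the cache.
    `dump_all_routes` is a hypothetical callable returning an
    iterable of hashable route entries.
    """
    fresh = set(dump_all_routes())
    added = fresh - cached       # entries the notifications may have missed
    removed = cached - fresh     # stale entries that would otherwise linger
    return fresh, added, removed

# One tiny update still costs a full dump and diff; with a full BGP
# table of hundreds of thousands of entries this is what gets expensive.
cache = {("10.0.0.0/24", "eth0"), ("10.0.1.0/24", "eth0")}
cache, added, removed = resync_routes(
    cache, lambda: [("10.0.1.0/24", "eth0"), ("10.0.2.0/24", "eth1")])
```

The upside is correctness: a full dump can never leave a deleted route behind, which the incremental heuristics cannot guarantee.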
Most of the OVS users are not running BGP in the same network namespace with ovs-vswitchd, so it is not a problem for them. If you can change the architecture of your deployment to not run BGP in the same network namespace with OVS, that is still a preferred approach to fix the problem.
One approach we can take is to delay the dumping of the routing table by a certain amount of time. This should alleviate some of the load, at the cost of not having the most up-to-date routing table for that amount of time (a few seconds?). Is that a reasonable compromise for your setup?
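The delayed-dump idea is essentially a debounce: coalesce a burst of notifications into one expensive dump. A minimal sketch (hypothetical, not OVS code):

```python
import threading
import time

class DelayedRouteDump:
    """Coalesce bursts of route-change notifications into one dump.

    Each notify() (re)arms a timer; the expensive full-table dump
    `dump_fn` runs only once the burst has been quiet for `delay`
    seconds.
    """
    def __init__(self, dump_fn, delay=2.0):
        self._dump_fn = dump_fn
        self._delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def notify(self):
        """Called for every kernel route-change notification."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()        # restart the quiet period
            self._timer = threading.Timer(self._delay, self._dump_fn)
            self._timer.daemon = True
            self._timer.start()

dumps = []
dumper = DelayedRouteDump(lambda: dumps.append(time.monotonic()), delay=0.5)
for _ in range(50):          # simulated burst of netlink updates
    dumper.notify()
time.sleep(1.2)
print(len(dumps))            # prints 1: the whole burst collapsed into one dump
```

Note that a pure restart-on-notify debounce never fires under continuous churn, so a real implementation would also cap the maximum postponement.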
BGP is there to provide connectivity to a prefix that needs to be available within OpenStack
What exactly are these routes used for? Do you have an OVS tunnel that relies on them?
An alternative, and not recommended, way to resolve this issue is to disable synchronization with the kernel routing table entirely, using a command-line argument we have for testing purposes: --disable-system-route. Some OVS functionality will stop working. Most notably, OVS will not be able to determine link state changes for OVS tunnel ports, because it will not be able to look up the route and determine which interface the traffic will use after encapsulation. The tunnels themselves will keep working just fine, since they operate inside the kernel. Some other functionality, like checking the route table with ovs-appctl commands, will not work either, for obvious reasons. And this will completely break tunnels in the userspace datapath, but I'm assuming you're using the Linux kernel datapath. I may be missing a few things that will stop working, but these are the most notable ones I remember.
This option is not officially supported in any form, but you can try it if it is the only option for your setup.
Have a solution which worked for my environment. While running the ovsdb process, try adding this daemon option:
--detach
Runs ovsdb-server as a background process. The process forks, and in the child it starts a new session, closes the standard file descriptors (which has the side effect of disabling logging to the console), and changes its current directory to the root (unless --no-chdir is specified). After the child completes its initialization, the parent exits. ovsdb-server detaches only after it starts listening on all configured remotes. At this point, all standalone and active-backup databases are ready for use. Clustered databases only become ready for use after they finish joining their clusters (which could have already happened in previous runs of ovsdb-server).
More information at this link: https://www.openvswitch.org//support/dist-docs/ovsdb-server.1.html
@igsilya @ddominet @wilkmar @zejar
@AswiniViswanathan I don't think the --detach option on the ovsdb process is related to this issue.
@igsilya Yes, it's related to utilization. I faced the issue below, and adding the detach option fixed the problem:
2020-05-20T08:53:06.637Z|00834|timeval|WARN|Unreasonably long 1455ms poll interval (735ms user, 697ms system)
Give it a try.
@AswiniViswanathan it's great that it worked for you somehow, but that makes no sense to me. This option has nothing to do with routing or CPU usage, and you're suggesting adding it to a process that is not the one implicated in this issue. And I'm pretty sure other people on this thread already use this option, as it is the default way to run OVS services.
@igsilya Okay. Anyway, let's see if it helps anyone else.
@igsilya I just finished moving the BGP routing table into its own VRF, but the issue persists.
The default routing table is now effectively empty, and the interfaces inside the OVS bridges are all part of their own dedicated VRF that only contains the on-link subnets and a default route.
BGP with full tables is in its own routing table with ~2.4m entries.
It seems that OVS doesn't care at all which routing table is used, or am I missing something obvious?
@f0o yes, OVS monitors all tables. You need a separate network namespace. There was a suggestion in the past to only dump the default one; we may explore this option.
@igsilya Gotcha, I foolishly assumed a VRF would follow the same logic as a netns.
Let's see if I can replicate this whole thing with netns instead!
@igsilya Couldn't get it all to work with netns, so I took the "easy" way and tried --disable-system-route; however, that parameter does not change the situation.
I verified ovs-vswitchd was actually started with it:
ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --disable-system-route --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
@f0o, yeah, my bad. Looking through the code, OVS will not update its internal routing table if this option is enabled, but it will still dump the whole routing table from the kernel. So, yeah, it removes some of the work, but clearly not enough. This is clearly not an intended use case for this option.
I'm working on a fix to only track changes from the main routing table, will post for review hopefully this week. It should help if you configure BGP with its own VRF. But for now, it seems, only a network namespace will work.
@igsilya fair enough, I'm really excited for your VRF fix 😅 - Maybe to make it more modular, add a parameter to select a table instead of just defaulting to the default/local one. Could be useful sometimes.
I'll continue to work towards a netns solution, just to get something into this issue that might work for others without waiting for new packages to be released.
@igsilya Although I can confirm that using netns solves the issue, there is no good way to "leak" routes across like there is with VRFs, meaning that moving traffic from OVS into the BGP netns is tedious at best and requires linknets, adding a lot more overhead than what should be a simple left-pocket -> right-pocket routing action.
So I can't accept netns as a viable solution, unfortunately.
@f0o there will be a cost to running in the same network namespace. The Linux kernel doesn't provide a mechanism to receive routes for specific tables only. That means that for every BGP update, ovs-vswitchd will still receive a notification. It will be parsed and discarded. This should not be very expensive to do, but it does take a bit of time.
Furthermore, whenever there is a normal route update in the main/local/default table, OVS will have to dump all the routes from all the tables (no filtering is supported by the kernel), parse them, and choose the relevant ones. That will still be expensive. The impact will depend on how frequently the VRF updates the main routing table by leaking routes.
Edit: There might be a way to filter out the full dump.
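For illustration, discarding per-notification updates for other tables is cheap in userspace, because the table id sits at a fixed offset in the route message. A hedged Python sketch (not OVS code; it ignores the RTA_TABLE attribute used for table ids above 255):

```python
import struct

RTM_NEWROUTE = 24    # from linux/rtnetlink.h
RTM_DELROUTE = 25
RT_TABLE_MAIN = 254

def is_main_table_route(msg: bytes) -> bool:
    """Return True if a netlink route message concerns the main table.

    Peeks at nlmsg_type in the 16-byte nlmsghdr and at rtm_table,
    the 5th byte of struct rtmsg, which immediately follows the header.
    """
    if len(msg) < 21:
        return False
    # struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
    nlmsg_type = struct.unpack_from("=IHHII", msg)[1]
    if nlmsg_type not in (RTM_NEWROUTE, RTM_DELROUTE):
        return False
    return msg[20] == RT_TABLE_MAIN

# Crafted example: RTM_NEWROUTE for an AF_INET /24 route in table 254.
hdr = struct.pack("=IHHII", 28, RTM_NEWROUTE, 0, 1, 0)
rtm = struct.pack("=8BI", 2, 24, 0, 0, RT_TABLE_MAIN, 0, 0, 1, 0)
print(is_main_table_route(hdr + rtm))   # True
```

The full dump is the harder part: a GETROUTE dump returns every table, so the filtering there happens after the kernel has already serialized everything.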
@igsilya I'm running into an odd issue and I wonder if it's connected to this one.
I can't seem to manage any bridges with ovs-vsctl anymore.
I keep getting the error:
...00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Worth noting: all ovs-vsctl commands are run through Ansible, with a default timeout of 5s.
//EDIT:
Increasing the timeout worked, but I also noticed a very large number of these errors in OVN:
2024-03-18T18:08:50.183Z|00159|rconn(ovn_statctrl2)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Protocol error)
And I believe that's also caused by the full-tables here...
The moment I dropped this router out of the IBGP mesh the errors stopped and ports were being added etc...
So this 100% CPU utilization is not just an annoyance because it's pegging a core; it actually immobilizes OVS entirely!
@f0o I assume ovs-vswitchd reports long poll intervals; what are the largest numbers you see there? If they are 2.5+ seconds (2.5 before the connection is accepted + potentially 2.5 on the next iteration before the reply is sent), then yes, processes that do not wait longer than 5 seconds may disconnect. Some database operations may potentially wait for more than 2 poll intervals.
For OVN, I assume you're using an OVN 23.09 older than v23.09.1. The statctrl thread had an inactivity probe set to 5 seconds before commit https://github.com/ovn-org/ovn/commit/bbd07439b . Updating to the latest v23.09.3 should fix the problem.
@igsilya Thanks for the insight!
I went ahead and built openvswitch-3.2.1 with the patch from 01f7582584354cf087924170723dd0838d8b34f3 and I can confidently say that it is no longer burning CPU nor being unresponsive. All traffic is flowing fine and ports/flows are being added/removed swiftly!
I guess I'm your lab-bunny now! 🙈
@f0o Thanks a lot for testing! Now we just need to wait for some code review. Hopefully, it won't take too long.
If you want to reply to the original patch email with a Tested-by: Name <email> tag, we can include it in the commit message before applying. Assuming you're not subscribed to the ovs-dev list, you may reply by importing an mbox from patchwork into your email client or by clicking the mailto link at the top of the page in the archive. Leaving the tag here in a GitHub comment is also fine.
But, of course, all that is completely optional. :)
I just ran a few tests by adding/removing flows, ports and bridges like a maniac and verifying that traffic flows while both routers remain in IBGP and EBGP with multiple full tables in different VRFs.
Everything works like a breeze! ovs-vswitchd chills at 2% CPU, and from the logs of OVS/OVN everything is operational and normal.
I tried adding the mbox or using the mailto but all my mail clients strip away the Reply-To header "because of safety". So I'm unable to stamp it there.
So please take my informal stamp of testing and approval here :)
Great Job Team!
Sorry for my harsh words previously, but I'm happy that it's fixed.
Kind Regards, Dominik
@f0o , @ddominet , the patch is applied now to all branches down to 2.17. Will be part of the next set of stable releases. Not sure when distributions will pick those up.
With the change you should be able to run BGP in a separate VRF without significant impact on OVS CPU usage. But running with the main routing table will still be problematic.
Closing this issue for now.
Amazing job @igsilya! Thank you so much!
As the title says, I'm running Open vSwitch on a Debian box that is configured to be my router. The box receives a full routing table (IPv6) via Bird2, and as soon as the full table starts coming in, OVS starts utilizing 100% of my CPU. I let it run overnight, but 12 hours later it was still at 100%. The program utilizing the CPU is "ovs-vswitchd".
This issue can be reproduced by setting up a BGP session with a full routing table and simply installing the openvswitch-switch package. No config needs to be done for OVS; it will consume the entire CPU when the full routing table is present on the machine and OVS is running.
Debian version: Linux rtr1 5.6.0-1-cloud-amd64 #1 SMP Debian 5.6.7-1 (2020-04-29) x86_64 GNU/Linux
Bird version: BIRD 2.0.7
OVS version: ovs-vswitchd (Open vSwitch) 2.13.0
The /var/log/openvswitch/ovs-vswitchd.log file shows this: