kytos-ng / kytos

Kytos SDN Platform. Kytos is designed to be easy to install, use, develop and share Network Apps (NApps).
https://kytos-ng.github.io/
MIT License

perf: investigate moderate 50% CPU usage in stationary connected topology #505

Open viniarck opened 5 days ago

viniarck commented 5 days ago

@italovalcy, in his 2023.2 exploratory tests, identified moderate CPU spikes of around 50% with a stationary connected topology (no network convergence events happening). I'm capturing this here to be investigated in the future. In the meantime, since I was at it, I also managed to reproduce it.

I'm using a 3-switch ring topology with OvS on the master branch (which will be the future 2024.2, as of Oct 18, 2024). I collected process CPU and memory usage for the cases below (I'm running Linux on a 12th Gen Intel(R) Core(TM) i7-12700H CPU):

1) Case 1 - Switches connected stationary with psrecord sampling every 1s

kytosd_stationary_2

On a 1-second interval CPU usage is fairly low as expected, no issues here.

2) Case 2 - Switches connected stationary with psrecord sampling every 0.1s

kytosd_stationary_11

This case reflects something similar to what @italovalcy has seen and presented. So it might indeed be something at this 100 ms scale, in the basic periodic functionalities and tasks of the platform and its NApps, that is causing the spikes. It needs further CPU profiling, instrumenting kytosd to see what's consuming the most CPU and causing the spikes. It'd be cool to see this at the method level too.
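As a starting point for the method-level profiling mentioned above, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. The functions `poll_switch_stats` and `periodic_loop` are hypothetical stand-ins for periodic per-switch work, not actual kytosd or NApp code. Note that `cProfile` only instruments the thread it is enabled in, so for a process that spreads work across threads and an event loop, this would likely need to be adapted or replaced with a sampling profiler attached to the running process.

```python
import cProfile
import io
import pstats
import time


def poll_switch_stats(n_switches):
    """Hypothetical stand-in for periodic per-switch work (not kytosd code)."""
    total = 0
    for _ in range(n_switches):
        total += sum(i * i for i in range(2000))
    return total


def periodic_loop(iterations, n_switches, interval):
    """Run the stand-in task a few times, sleeping between runs."""
    for _ in range(iterations):
        poll_switch_stats(n_switches)
        time.sleep(interval)


def profile_report(top=10):
    """Profile the loop and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    periodic_loop(iterations=5, n_switches=3, interval=0.01)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

The report lists each function with its call count and cumulative time, which is the kind of per-method view that would help narrow down which periodic task is behind the spikes.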

3) Case 3 - Switches NOT connected with psrecord sampling every 0.1s

kytosd_stationary_10

In this case, the only difference was that the switches were not connected, with psrecord still sampling every 0.1s, and no spikes were observed. So this also confirms that it's indeed something related to the periodic functionalities involving the connected switches and adjacent core parts.

Related issues

https://github.com/kytos-ng/kytos/issues/478 (but with network scalability convergence)

viniarck commented 5 days ago

4) Case 4 - 20 Switches connected stationary with psrecord sampling every 0.1s

If you also increase the number of switches, it'll consume more CPU and generate more spikes. It's correlated (I don't know yet by how much), and this is another piece of evidence to investigate and keep in mind in the future.

kytosd_stationary_13

viniarck commented 5 days ago

5) Case 5 - 20 Switches connected stationary with psrecord sampling every 1s

kytosd_stationary_14

With 20 switches but psrecord sampling every 1s, no issues were observed apart from a single spike (this needs to be measured more times).
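One possible reading of why the 1s sampling looks calm while the 0.1s sampling shows spikes is plain averaging: a short burst of work is diluted over a longer window. The sketch below illustrates this with a synthetic timeline. The 50 ms burst every 1 s is an illustrative assumption, not a measured value from kytosd, and the helper names are hypothetical.

```python
def busy_timeline(duration_ms, period_ms, burst_ms):
    """Per-millisecond list: 1 while the (synthetic) process is busy, else 0."""
    return [1 if (t % period_ms) < burst_ms else 0 for t in range(duration_ms)]


def sample_cpu_percent(timeline, interval_ms):
    """Average busy percentage within each consecutive sampling window."""
    samples = []
    for start in range(0, len(timeline), interval_ms):
        window = timeline[start:start + interval_ms]
        samples.append(100.0 * sum(window) / len(window))
    return samples


if __name__ == "__main__":
    # Assumed workload: busy for 50 ms at the start of every second, for 5 s.
    timeline = busy_timeline(duration_ms=5000, period_ms=1000, burst_ms=50)

    fine = sample_cpu_percent(timeline, interval_ms=100)     # 0.1s sampling
    coarse = sample_cpu_percent(timeline, interval_ms=1000)  # 1s sampling

    print("max at 0.1s sampling:", max(fine))
    print("max at 1s sampling:", max(coarse))
```

Under these assumptions the same workload peaks at 50% with 100 ms windows but only at 5% with 1 s windows, so both sets of plots could be consistent with the same underlying periodic bursts.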