Open vimalk78 opened 3 months ago
On the latest main, when the machine is loaded with stress-ng, Kepler's CPU usage spikes. In comparison, Kepler before the ring buffer change shows no increase in CPU usage when the machine is loaded.
Compared with the old code, some increase in Kepler's CPU usage is understandable: some processing (3 map lookups, 2 updates, 1 delete) used to happen in kernel context, where its CPU cycles were accounted to the kernel. That processing now happens in user space and is counted as Kepler CPU.
We need to check whether we can reduce the CPU spike in Kepler when the machine is loaded.
exactly! I'm now able to reproduce with stress-ng and I'm working to keep that CPU spike as low as possible.
@dave-tucker can you create a feature branch, move the code there, and revert the related commits?
I ran some perf stat tests to check the impact of Kepler on context switch time. The idea is that since Kepler traps sched_switch and does some processing there, it should have some impact on context switch time. stress-ng is run in parallel to simulate load.
without running kepler
root@bkr18:~# sudo perf stat -a -e sched:sched_switch --timeout 600000 # with no kepler with load
Performance counter stats for 'system wide':
90,480,301 sched:sched_switch
600.105927296 seconds time elapsed
with running kepler release-0.7.11
root@bkr18:~# sudo perf stat -a -e sched:sched_switch --timeout 600000 # with kepler 0.7.11 with load
Performance counter stats for 'system wide':
87,500,721 sched:sched_switch
600.100293869 seconds time elapsed
with running kepler latest (with ring buffer)
root@bkr18:~# sudo perf stat -a -e sched:sched_switch --timeout 600000 # with kepler latest with load
Performance counter stats for 'system wide':
79,620,228 sched:sched_switch
600.099929726 seconds time elapsed
Observation: with Kepler running, the number of context switches goes down, as expected. But with the ring buffer changes, the drop is larger than with release 0.7.11: about 12% versus about 3.3%, relative to the run without Kepler.
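To make the comparison concrete, here is a minimal sketch that turns the three perf stat results above into switches per second and a percentage drop relative to the run without Kepler. The counts and elapsed times are copied from the output above; the function and variable names are only illustrative.

```python
# Counts and elapsed times copied from the three perf stat runs above.
runs = {
    "no kepler": (90_480_301, 600.105927296),
    "kepler 0.7.11": (87_500_721, 600.100293869),
    "kepler latest": (79_620_228, 600.099929726),
}


def switches_per_second(count, elapsed_s):
    """Normalize the raw sched_switch count by the measured wall time."""
    return count / elapsed_s


def drop_percent(baseline_rate, rate):
    """Relative reduction of a rate compared with the baseline, in percent."""
    return 100.0 * (baseline_rate - rate) / baseline_rate


rates = {name: switches_per_second(*values) for name, values in runs.items()}
baseline = rates["no kepler"]
drops = {name: drop_percent(baseline, rate) for name, rate in rates.items()}

for name in runs:
    print(f"{name:14s} {rates[name]:12.0f} switches/s  drop {drops[name]:5.2f}%")
```

Since all three runs lasted almost exactly 600 seconds, normalizing by elapsed time barely changes the picture: roughly a 3.3% drop with 0.7.11 and roughly a 12% drop with the ring buffer build. A single run per configuration does not show run-to-run variance, so these numbers are indicative only.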
Test is run on a bare-metal machine with almost no other load.
stress-ng command:
stress-ng --cpu 8 --iomix 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 11m
Without any load on the system, Kepler CPU usage goes up to 20%.
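For anyone who wants to reproduce a CPU usage figure like this, here is a rough sketch of one way to sample a process's CPU usage from /proc/<pid>/stat on Linux. It is an assumption that this matches how the number above was obtained (it may well have come from top or similar); the helper names are hypothetical. Note that in this convention 100% means one fully used core.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The comm field (field 2) may contain spaces or parentheses,
    # so split after the last ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11]
    # and stime (field 15) is rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over the interval, where 100.0 means one full core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample_cpu_percent(pid, interval_s=1.0):
    """Read /proc/<pid>/stat twice, interval_s apart, and compute usage."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling `sample_cpu_percent(pid)` with Kepler's PID, once with and once without stress-ng running, would give numbers that can be compared across the 0.7.11 and ring buffer builds.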