sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption from these stats, and exports the estimates as Prometheus metrics.
https://sustainable-computing.io
Apache License 2.0

High Kepler CPU usage under normal workloads #1670

Open vimalk78 opened 1 month ago

vimalk78 commented 1 month ago

Without any load on the system, Kepler's CPU usage goes up to 20%.

vimalk78 commented 1 month ago

https://github.com/sustainable-computing-io/kepler/issues/1660#issuecomment-2265665980

vimalk78 commented 1 month ago

On latest main, if the machine is loaded with stress-ng, Kepler's CPU usage spikes. In comparison, the Kepler build from before the ring-buffer change shows no CPU increase when the machine is loaded.

[asciicast recording]

vimalk78 commented 1 month ago

Comparing with the old code, some of the Kepler CPU usage spike is understandable: some processing (3 map lookups, 2 updates, 1 delete) used to happen in kernel context, and the CPU cycles for it were accounted to the kernel; it now happens in user space and gets counted as Kepler CPU.

Need to check if we can reduce the CPU spike in Kepler when the machine is loaded.
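One general way to keep that user-space cost down is to drain many ring-buffer events per wakeup, amortizing the fixed per-wakeup overhead across a batch. A minimal Go sketch of the idea (illustrative only, not Kepler's actual code; a buffered channel stands in for the eBPF ring buffer, and `drainBatches`/`batchSize` are hypothetical names):

```go
package main

import "fmt"

// drainBatches consumes events, opportunistically draining everything already
// queued (up to batchSize) each time it wakes, so per-event overhead is
// amortized over a batch instead of paid once per event.
func drainBatches(events <-chan int, batchSize int) (batches, processed int) {
	for {
		ev, ok := <-events
		if !ok {
			return // channel closed and fully drained
		}
		batch := []int{ev}
	drain:
		for len(batch) < batchSize {
			select {
			case ev, ok := <-events:
				if !ok {
					break drain
				}
				batch = append(batch, ev)
			default:
				break drain // nothing queued right now; process what we have
			}
		}
		batches++
		processed += len(batch)
	}
}

func main() {
	events := make(chan int, 1024)
	for i := 0; i < 1000; i++ {
		events <- i
	}
	close(events)
	batches, processed := drainBatches(events, 128)
	fmt.Printf("processed %d events in %d batches\n", processed, batches)
	// prints "processed 1000 events in 8 batches"
}
```

With 1000 queued events and a batch size of 128, the loop wakes only 8 times instead of 1000, which is the trade the ring-buffer design wants to make.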

dave-tucker commented 1 month ago

> Need to check if we can reduce the CPU spike in Kepler when the machine is loaded.

Exactly! I'm now able to reproduce this with stress-ng, and I'm working to keep that CPU spike as low as possible.

rootfs commented 1 month ago

@dave-tucker can you create a feature branch, move the code there, and revert the related commits?

vimalk78 commented 1 month ago

I ran some perf stat tests to check Kepler's impact on context-switch time. The idea: since Kepler traps sched_switch and does some processing, it should have a measurable impact on context switching. stress-ng is run in parallel to simulate load.

root@bkr18:~# sudo perf stat -a -e sched:sched_switch --timeout 600000 # with kepler latest with load

 Performance counter stats for 'system wide':

        79,620,228      sched:sched_switch                                                    

     600.099929726 seconds time elapsed

Observation: with Kepler running, the number of context switches goes down, as expected. But with the ring-buffer changes, the drop is larger than with the 7-11 release.

The test was run on a bare-metal machine with almost no other load.

stress-ng command: stress-ng --cpu 8 --iomix 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 11m
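For scale, the perf stat numbers above work out to roughly 133k sched_switch events per second. A quick Go calculation of the per-event CPU budget (the 20% CPU figure is taken from the first comment and is only an assumption here, since that run was unloaded):

```go
package main

import "fmt"

func main() {
	// Numbers from the perf stat run above.
	const switches = 79_620_228
	const elapsed = 600.099929726 // seconds

	rate := switches / elapsed // sched_switch events per second
	fmt.Printf("events/sec: %.0f\n", rate)

	// Assumption for illustration: Kepler consuming 20% of one core while
	// handling events at this rate would leave this much CPU time per event.
	cpuFrac := 0.20
	perEvent := cpuFrac / rate * 1e9 // nanoseconds of CPU per event
	fmt.Printf("CPU per event: %.0f ns\n", perEvent)
}
```

At ~132,678 events/sec, a 20% CPU share gives only about 1.5 µs of processing per event, which is why moving the map lookups/updates to user space shows up so clearly under load.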