Open dave-tucker opened 1 month ago
The Kepler CPU usage under normal and stress workloads needs to be investigated in parallel. The latest stress test results point to a divergence that needs to be fixed.
Test results posted on the original PR https://github.com/sustainable-computing-io/kepler/pull/1628
@dave-tucker, load the latest Kepler image and keep it running for a day.
Test results posted on the original PR #1628
Responded: https://github.com/sustainable-computing-io/kepler/pull/1628#issuecomment-2269058775
What would you like to be added?
This constant: https://github.com/sustainable-computing-io/kepler/blob/main/bpf/kepler.bpf.c#L70
declares how often we wake up to read the ringbuf.
The current math was as follows:
So 1000 should have me read every 1.7ish seconds 😄
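As a rough sanity check on that figure (this is not the original math, just a sketch): if the reader wakes once every N events, the wake interval is N divided by the event rate. The function name and the event rate below are assumptions for illustration. The rate of roughly 588 events/s is back-computed from the 1.7 s figure, not measured.

```python
def wake_interval_seconds(wakeup_threshold: int, events_per_second: float) -> float:
    """Seconds between ring buffer reads when waking once per `wakeup_threshold` events."""
    if wakeup_threshold <= 0:
        raise ValueError("wakeup_threshold must be positive")
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return wakeup_threshold / events_per_second


# Assumption: ~588 events/s, chosen only because 1000 / 588 is about 1.7 s.
ASSUMED_EVENTS_PER_SECOND = 588.0

if __name__ == "__main__":
    interval = wake_interval_seconds(1000, ASSUMED_EVENTS_PER_SECOND)
    print(f"wake interval: {interval:.2f} s")
```

On a busier node the same threshold of 1000 would be reached sooner, so the reader would wake more often and spend more CPU.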
Why is this needed?
When Kepler wakes up to read events it consumes CPU. Right now that's showing up as somewhere between 1% and 3% mean CPU usage over time. We should consider whether there is a better formula we could use to compute this magic number of 1000.
It could relate to the sample rate.
e.g. 500 * SampleRate, and perhaps even the 500 could come from something better than an educated guess.
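A minimal sketch of what that could look like, assuming SampleRate is a non-negative integer and that the base factor could be derived from a target wake interval and an observed event rate. Both helper names, the floor of 1, and the example numbers are hypothetical and do not come from the Kepler code.

```python
def wakeup_threshold(sample_rate: int, base: int = 500) -> int:
    """Events to accumulate before waking the reader: base * sample_rate, floored at 1."""
    if sample_rate < 0:
        raise ValueError("sample_rate must be non-negative")
    if base <= 0:
        raise ValueError("base must be positive")
    # Floor at 1 so a sample rate of zero never yields a zero threshold.
    return max(1, base * sample_rate)


def base_from_target(target_interval_s: float, events_per_second: float) -> int:
    """Choose the base so one wakeup happens roughly every `target_interval_s` seconds."""
    if target_interval_s <= 0 or events_per_second <= 0:
        raise ValueError("target_interval_s and events_per_second must be positive")
    return max(1, round(target_interval_s * events_per_second))


if __name__ == "__main__":
    # Hypothetical numbers: aim for a 2 s wake interval at 250 events/s.
    base = base_from_target(2.0, 250.0)
    print(base, wakeup_threshold(sample_rate=2, base=base))
```

Deriving the base from a target interval makes the trade-off explicit: a longer interval means fewer wakeups and less CPU, at the cost of staler data and a fuller ring buffer.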