Open bpetit opened 3 years ago
Is there any reason to want an MSR-based sensor instead of a sensor based on perf_event_open (https://www.man7.org/linux/man-pages/man2/perf_event_open.2.html)?
It seems to me that perf_event_open has lower privilege requirements: a /proc/sys/kernel/perf_event_paranoid value of less than 1, or the CAP_PERFMON (since Linux 5.8) or CAP_SYS_ADMIN capability.
Reading MSRs, on the other hand, requires root, which is generally quite hard to obtain on production systems.
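To make the permission comparison concrete, here is a minimal Python sketch of the check described above. It only looks at the perf_event_paranoid sysctl and does not inspect capabilities such as CAP_PERFMON or CAP_SYS_ADMIN, so a negative answer does not rule out access through a capability. The function names are hypothetical and not part of scaphandre.

```python
from pathlib import Path

# Sysctl file named in the comment above.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def parse_paranoid(text: str) -> int:
    """Parse the content of perf_event_paranoid (a single integer, may be negative)."""
    return int(text.strip())


def unprivileged_perf_allowed(paranoid: int) -> bool:
    """Return True when the level is below 1, the threshold quoted in this thread.

    Assumption: capabilities (CAP_PERFMON, CAP_SYS_ADMIN) are NOT checked here.
    """
    return paranoid < 1


if __name__ == "__main__":
    try:
        level = parse_paranoid(PARANOID_PATH.read_text())
    except (OSError, ValueError) as exc:
        print(f"could not read {PARANOID_PATH}: {exc}")
    else:
        verdict = "allowed" if unprivileged_perf_allowed(level) else "needs a capability or root"
        print(f"perf_event_paranoid={level}: unprivileged use {verdict}")
```

A sensor could run a check like this at startup to decide whether a perf-based backend is worth trying before falling back to something that needs root.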
Hi,
what do you think about this (maybe old) code?
https://github.com/kentcz/rapl-tools
I am not working with AMD, but I am facing a similar problem on an 11th Gen Intel(R) Core(TM) i9-11900, where the model id is:
printf "0x%X\n" $(cat /proc/cpuinfo | grep model | grep -v name | uniq | cut -d: -f2)
0xA7
and the OS is:
Linux arriel 5.8.0-55-generic #62~20.04.1-Ubuntu
In this setup, the RAPL Linux modules do not seem to be supported; however, the code I mentioned is giving me some power measurements.
I am currently checking whether these are correct by comparing its values on another machine where the RAPL Linux modules are usable.
I am also having a look at variorum, but it does not seem to solve the support issue.
Were you looking for something like this? Or do you think that a portable solution dealing with the different cpu architectures would be much more complicated to implement?
Thanks for your input if you have any clue.
@PierreRust you are right about the requirements for accessing the MSRs. I guess we should provide both approaches, if possible. I'm currently working on #74, and using the MSRs (indirectly, through a custom driver) is a solution that seems to work properly in that context. On Linux, however, we should probably give perf_event_open a shot!
@paulgay thanks for sharing those tools. I didn't know variorum, and it seems worth following the project to see if it can help at some point.
Looking at your CPU, I think this is related to #131. You should follow that thread instead of the MSR one, which is more of an exploratory topic. In your case it seems to be caused by powercap changes on the most recent Intel CPUs, as discussed in that thread.
Regarding MSR-based sensors, I'm working on one for Windows, as mentioned above. It will probably not be Linux-compatible soon, but we could start from that draft to bootstrap something for Linux. Or we could start something using perf_event_open, as @PierreRust mentioned. The latter seems more accurate for Linux, but I'm not ready to work on that personally, so it's a bit early to say which direction we will take.
Problem
We noticed that with kernels older than 5.11 on AMD Zen CPUs, perf works fine and scaphandre does not. The reason is that the kernel drivers for AMD CPUs that powercap needs to get RAPL data are not present before 5.11. Perf relies on MSRs.
Solution
We need an MSR-based sensor to be able to get metrics when powercap RAPL is not an option.
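As a rough illustration of what such a sensor has to do, here is a hedged Python sketch, not scaphandre's implementation. It assumes the Linux `msr` kernel module is loaded (exposing `/dev/cpu/<n>/msr`) and that the process runs as root. The register addresses are the ones I understand the vendor manuals to give for the RAPL power-unit and package-energy registers; please verify them against your CPU's documentation. All function names are hypothetical.

```python
import os
import struct

# Assumed register addresses (check the Intel SDM / AMD PPR for your CPU).
INTEL_MSR_RAPL_POWER_UNIT = 0x606
INTEL_MSR_PKG_ENERGY_STATUS = 0x611
AMD_MSR_RAPL_POWER_UNIT = 0xC0010299
AMD_MSR_PKG_ENERGY_STATUS = 0xC001029B


def read_msr(cpu: int, address: int) -> int:
    """Read one 64-bit MSR through the msr driver (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)  # the file offset selects the register
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def energy_unit_joules(power_unit_raw: int) -> float:
    """Decode the energy status unit (bits 12:8): one tick is 1 / 2**ESU joules."""
    esu = (power_unit_raw >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(prev_raw: int, curr_raw: int, unit: float) -> float:
    """Energy consumed between two samples of the 32-bit energy counter.

    The modulo handles a single wraparound between the two samples.
    """
    ticks = ((curr_raw & 0xFFFFFFFF) - (prev_raw & 0xFFFFFFFF)) % (1 << 32)
    return ticks * unit
```

A sensor would call `read_msr` once for the power-unit register, then sample the energy register periodically and divide each `energy_delta_joules` result by the sampling interval to get average power in watts. The samples must be taken often enough that the counter cannot wrap more than once between them.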
Alternatives
None that I can think of, but discussions are welcome.
Additional context
None.