Open PierreRustOrange opened 2 years ago
It seems this can also happen if the appropriate kernel module is not loaded.
For Ubuntu, the module is in the linux-modules-extra package:
apt install linux-modules-extra-$(uname -r)
update-initramfs -c -k $(uname -r)
The module is in /usr/lib/modules/$(uname -r)/kernel/arch/x86/events/, as rapl.ko for recent kernels or intel/intel-rapl-perf.ko for older kernels.
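To tell whether the module is actually loaded, one option is to look it up in /proc/modules. The snippet below is only a sketch: the helper name is made up for illustration, and the module names are the two file names mentioned above with dashes written as underscores, which is how /proc/modules lists them. A driver compiled into the kernel would not show up there at all, so an empty result is a hint rather than proof.

```python
# Names as they appear in /proc/modules (dashes become underscores).
RAPL_PERF_MODULES = ("rapl", "intel_rapl_perf")


def loaded_rapl_perf_modules(proc_modules_text):
    """Return the RAPL perf modules found in the given /proc/modules content.

    Hypothetical helper: it only parses text, so the caller reads the file.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in RAPL_PERF_MODULES if name in loaded]
```

On a live system you would call it as loaded_rapl_perf_modules(open("/proc/modules").read()); if the list comes back empty, modprobe rapl is worth a try before going further.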
Thanks @gfieni for this info !
However, I think there are still cases where the perf implementation is not available in a kernel (for a recent CPU) while powercap works. For example, with a 5.4 kernel on an i7-10875H (that's a laptop CPU, but I've seen similar issues with server-class CPUs).
Hi, Is there any update on the issue ? @PierreRustOrange @rouvoy
It seems I can't use the hwpc-sensor. My issue seems to be similar to https://github.com/powerapi-ng/powerapi/issues/125. Even after installing the appropriate kernel module with the commands given in the previous comment, I still have issues with perf:
sudo perf stat -a -e "power/energy-cores/" /bin/ls
[sudo] password for dimitri:
event syntax error: 'power/energy-cores/'
\___ Cannot find PMU `power'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
I have a laptop with a 5.13.0-28-generic kernel and an 11th Gen Intel(R) Core(TM) i7-1165G7 CPU (if more information would help, don't hesitate to ask).
Would the advised solution be to implement a sensor accessing RAPL through the powercap sysfs?
Hello @dsaingre,
Which Linux distribution are you using ?
Are the energy readings of powercap available on your system ?
Could you give the result of the modinfo rapl command?
Hi @gfieni,
I'm using Ubuntu 20.04.3 LTS.
I believe I do have the energy readings available :
tree /sys/devices/virtual/powercap
/sys/devices/virtual/powercap
├── dtpm
│ ├── enabled
│ ├── power
│ │ ├── async
│ │ ├── autosuspend_delay_ms
│ │ ├── control
│ │ ├── runtime_active_kids
│ │ ├── runtime_active_time
│ │ ├── runtime_enabled
│ │ ├── runtime_status
│ │ ├── runtime_suspended_time
│ │ └── runtime_usage
│ ├── subsystem -> ../../../../class/powercap
│ └── uevent
├── intel-rapl
│ ├── enabled
│ ├── intel-rapl:0
│ │ ├── constraint_0_max_power_uw
│ │ ├── constraint_0_name
│ │ ├── constraint_0_power_limit_uw
│ │ ├── constraint_0_time_window_us
│ │ ├── constraint_1_max_power_uw
│ │ ├── constraint_1_name
│ │ ├── constraint_1_power_limit_uw
│ │ ├── constraint_1_time_window_us
│ │ ├── constraint_2_max_power_uw
│ │ ├── constraint_2_name
│ │ ├── constraint_2_power_limit_uw
│ │ ├── constraint_2_time_window_us
│ │ ├── device -> ../../intel-rapl
│ │ ├── enabled
│ │ ├── energy_uj
│ │ ├── intel-rapl:0:0
│ │ │ ├── constraint_0_max_power_uw
│ │ │ ├── constraint_0_name
│ │ │ ├── constraint_0_power_limit_uw
│ │ │ ├── constraint_0_time_window_us
│ │ │ ├── device -> ../../intel-rapl:0
│ │ │ ├── enabled
│ │ │ ├── energy_uj
│ │ │ ├── max_energy_range_uj
│ │ │ ├── name
│ │ │ ├── power
│ │ │ │ ├── async
│ │ │ │ ├── autosuspend_delay_ms
│ │ │ │ ├── control
│ │ │ │ ├── runtime_active_kids
│ │ │ │ ├── runtime_active_time
│ │ │ │ ├── runtime_enabled
│ │ │ │ ├── runtime_status
│ │ │ │ ├── runtime_suspended_time
│ │ │ │ └── runtime_usage
│ │ │ ├── subsystem -> ../../../../../../class/powercap
│ │ │ └── uevent
│ │ ├── intel-rapl:0:1
│ │ │ ├── constraint_0_max_power_uw
│ │ │ ├── constraint_0_name
│ │ │ ├── constraint_0_power_limit_uw
│ │ │ ├── constraint_0_time_window_us
│ │ │ ├── device -> ../../intel-rapl:0
│ │ │ ├── enabled
│ │ │ ├── energy_uj
│ │ │ ├── max_energy_range_uj
│ │ │ ├── name
│ │ │ ├── power
│ │ │ │ ├── async
│ │ │ │ ├── autosuspend_delay_ms
│ │ │ │ ├── control
│ │ │ │ ├── runtime_active_kids
│ │ │ │ ├── runtime_active_time
│ │ │ │ ├── runtime_enabled
│ │ │ │ ├── runtime_status
│ │ │ │ ├── runtime_suspended_time
│ │ │ │ └── runtime_usage
│ │ │ ├── subsystem -> ../../../../../../class/powercap
│ │ │ └── uevent
│ │ ├── max_energy_range_uj
│ │ ├── name
│ │ ├── power
│ │ │ ├── async
│ │ │ ├── autosuspend_delay_ms
│ │ │ ├── control
│ │ │ ├── runtime_active_kids
│ │ │ ├── runtime_active_time
│ │ │ ├── runtime_enabled
│ │ │ ├── runtime_status
│ │ │ ├── runtime_suspended_time
│ │ │ └── runtime_usage
│ │ ├── subsystem -> ../../../../../class/powercap
│ │ └── uevent
│ ├── intel-rapl:1
│ │ ├── constraint_0_max_power_uw
│ │ ├── constraint_0_name
│ │ ├── constraint_0_power_limit_uw
│ │ ├── constraint_0_time_window_us
│ │ ├── constraint_1_max_power_uw
│ │ ├── constraint_1_name
│ │ ├── constraint_1_power_limit_uw
│ │ ├── constraint_1_time_window_us
│ │ ├── device -> ../../intel-rapl
│ │ ├── enabled
│ │ ├── energy_uj
│ │ ├── max_energy_range_uj
│ │ ├── name
│ │ ├── power
│ │ │ ├── async
│ │ │ ├── autosuspend_delay_ms
│ │ │ ├── control
│ │ │ ├── runtime_active_kids
│ │ │ ├── runtime_active_time
│ │ │ ├── runtime_enabled
│ │ │ ├── runtime_status
│ │ │ ├── runtime_suspended_time
│ │ │ └── runtime_usage
│ │ ├── subsystem -> ../../../../../class/powercap
│ │ └── uevent
│ ├── power
│ │ ├── async
│ │ ├── autosuspend_delay_ms
│ │ ├── control
│ │ ├── runtime_active_kids
│ │ ├── runtime_active_time
│ │ ├── runtime_enabled
│ │ ├── runtime_status
│ │ ├── runtime_suspended_time
│ │ └── runtime_usage
│ ├── subsystem -> ../../../../class/powercap
│ └── uevent
└── intel-rapl-mmio
├── enabled
├── intel-rapl-mmio:0
│ ├── constraint_0_max_power_uw
│ ├── constraint_0_name
│ ├── constraint_0_power_limit_uw
│ ├── constraint_0_time_window_us
│ ├── constraint_1_max_power_uw
│ ├── constraint_1_name
│ ├── constraint_1_power_limit_uw
│ ├── constraint_1_time_window_us
│ ├── device -> ../../intel-rapl-mmio
│ ├── enabled
│ ├── energy_uj
│ ├── max_energy_range_uj
│ ├── name
│ ├── power
│ │ ├── async
│ │ ├── autosuspend_delay_ms
│ │ ├── control
│ │ ├── runtime_active_kids
│ │ ├── runtime_active_time
│ │ ├── runtime_enabled
│ │ ├── runtime_status
│ │ ├── runtime_suspended_time
│ │ └── runtime_usage
│ ├── subsystem -> ../../../../../class/powercap
│ └── uevent
├── power
│ ├── async
│ ├── autosuspend_delay_ms
│ ├── control
│ ├── runtime_active_kids
│ ├── runtime_active_time
│ ├── runtime_enabled
│ ├── runtime_status
│ ├── runtime_suspended_time
│ └── runtime_usage
├── subsystem -> ../../../../class/powercap
└── uevent
(is this relevant and what you're asking? Not very knowledgeable yet on powercap and co)
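For anyone wanting to confirm that these counters are readable, here is a minimal sketch that reads each zone's name and energy_uj files. It assumes the zones are also reachable as flat entries under /sys/class/powercap (the subsystem links in the tree point there), and the function name is made up. On recent kernels energy_uj is typically readable by root only, so unreadable zones are simply skipped.

```python
from pathlib import Path


def read_powercap_zones(root="/sys/class/powercap"):
    """Map each RAPL zone (e.g. 'intel-rapl:0') to (name, energy in microjoules).

    Hypothetical helper, not part of hwpc-sensor.
    """
    zones = {}
    for zone in sorted(Path(root).glob("intel-rapl*:*")):
        try:
            name = (zone / "name").read_text().strip()
            energy_uj = int((zone / "energy_uj").read_text())
        except (OSError, ValueError):
            # Missing or unreadable (often root-only) counters are skipped.
            continue
        zones[zone.name] = (name, energy_uj)
    return zones
```

Running print(read_powercap_zones()) as root should list one entry per zone if powercap exposes RAPL on the machine.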
Regarding modinfo rapl:
filename: /lib/modules/5.13.0-28-generic/kernel/arch/x86/events/rapl.ko
license: GPL
srcversion: E0C3F70A00E2957694E4176
alias: cpu:type:x86,ven0002fam0019mod*:feature:*
alias: cpu:type:x86,ven0009fam0018mod*:feature:*
alias: cpu:type:x86,ven0002fam0017mod*:feature:*
alias: cpu:type:x86,ven0000fam0006mod008F:feature:*
alias: cpu:type:x86,ven0000fam0006mod009A:feature:*
alias: cpu:type:x86,ven0000fam0006mod0097:feature:*
alias: cpu:type:x86,ven0000fam0006mod00A5:feature:*
alias: cpu:type:x86,ven0000fam0006mod00A6:feature:*
alias: cpu:type:x86,ven0000fam0006mod006A:feature:*
alias: cpu:type:x86,ven0000fam0006mod006C:feature:*
alias: cpu:type:x86,ven0000fam0006mod007D:feature:*
alias: cpu:type:x86,ven0000fam0006mod007E:feature:*
alias: cpu:type:x86,ven0000fam0006mod007A:feature:*
alias: cpu:type:x86,ven0000fam0006mod005F:feature:*
alias: cpu:type:x86,ven0000fam0006mod005C:feature:*
alias: cpu:type:x86,ven0000fam0006mod0066:feature:*
alias: cpu:type:x86,ven0000fam0006mod009E:feature:*
alias: cpu:type:x86,ven0000fam0006mod008E:feature:*
alias: cpu:type:x86,ven0000fam0006mod0055:feature:*
alias: cpu:type:x86,ven0000fam0006mod005E:feature:*
alias: cpu:type:x86,ven0000fam0006mod004E:feature:*
alias: cpu:type:x86,ven0000fam0006mod0085:feature:*
alias: cpu:type:x86,ven0000fam0006mod0057:feature:*
alias: cpu:type:x86,ven0000fam0006mod0056:feature:*
alias: cpu:type:x86,ven0000fam0006mod004F:feature:*
alias: cpu:type:x86,ven0000fam0006mod0047:feature:*
alias: cpu:type:x86,ven0000fam0006mod003D:feature:*
alias: cpu:type:x86,ven0000fam0006mod0046:feature:*
alias: cpu:type:x86,ven0000fam0006mod0045:feature:*
alias: cpu:type:x86,ven0000fam0006mod003F:feature:*
alias: cpu:type:x86,ven0000fam0006mod003C:feature:*
alias: cpu:type:x86,ven0000fam0006mod003E:feature:*
alias: cpu:type:x86,ven0000fam0006mod003A:feature:*
alias: cpu:type:x86,ven0000fam0006mod002D:feature:*
alias: cpu:type:x86,ven0000fam0006mod002A:feature:*
depends:
retpoline: Y
intree: Y
name: rapl
vermagic: 5.13.0-28-generic SMP mod_unload modversions
sig_id: PKCS#7
signer: Build time autogenerated kernel key
sig_key: 65:04:EF:DB:22:8E:60:98:46:12:AA:25:C3:1D:F0:FA:DE:9C:5F:68
sig_hashalgo: sha512
signature: 43:C4:06:AF:9D:08:1D:3F:0F:6F:56:DD:20:BE:72:23:5D:D2:2E:98:
06:D6:7F:59:A4:33:5A:07:2F:A3:73:6A:BB:D7:F9:67:60:87:82:75:
92:A1:B0:41:DC:37:D5:BA:B7:A9:44:50:E1:26:47:B8:CA:65:3D:49:
97:62:2A:32:13:4B:22:F2:28:A5:16:19:3D:E6:CD:6D:E1:06:DE:96:
07:A1:FD:37:F9:9F:B3:48:D9:CA:30:40:14:4D:28:D0:E9:56:1C:4A:
1E:02:58:74:76:07:A0:D4:3F:6D:A5:2C:71:19:D4:C1:0A:8B:60:AD:
EB:E5:66:14:43:28:7A:B0:F0:62:E9:93:5B:D9:7D:F7:DE:F0:A5:DA:
7E:F4:07:4C:55:33:1C:E2:C8:62:3E:4C:05:62:CF:E7:CD:43:81:15:
87:27:4B:89:BA:C2:AD:07:AB:43:BA:65:F7:1C:61:9E:C6:B6:56:3D:
3C:CC:CC:ED:61:FE:71:2E:B1:45:4D:FD:98:3E:C3:4A:75:9E:7F:D9:
D8:1F:80:23:FD:C2:20:00:3B:C6:20:41:8D:89:A5:45:C5:AF:EC:63:
EB:C9:06:D4:E2:EE:6D:70:2B:50:CA:CF:03:C5:58:07:A8:AD:F9:5F:
6B:80:CD:90:E8:EF:BD:10:C0:1F:9D:8F:48:A6:F8:52:7B:F5:0B:CB:
D9:8D:0D:B8:1D:17:40:52:AE:DA:90:85:92:F5:2A:65:5E:89:29:F7:
FC:E1:55:E6:88:18:02:89:6A:AA:A2:E1:34:7E:DA:96:50:F4:B1:04:
FE:8E:A1:B2:99:54:20:80:5A:AB:89:AD:A0:77:C6:2F:6F:6B:16:3F:
5D:01:1A:2B:C1:A9:36:3C:13:CA:60:50:48:0E:D7:ED:1D:4A:F3:2F:
65:BD:7C:2D:47:B8:65:EE:3A:54:08:8A:49:5D:EA:78:59:DA:05:F5:
49:C6:A1:F3:ED:B6:F3:65:A0:0B:31:E3:9E:BF:F1:E6:9B:F0:9F:75:
D6:9E:37:DC:61:A8:E9:84:DD:23:FC:BC:E2:42:00:D6:65:A7:6A:18:
BF:8C:67:02:D5:9C:04:15:03:AE:13:47:47:8B:AC:AF:F4:4C:BA:EB:
A9:AC:2E:99:32:A6:A7:29:E7:10:0A:E0:E6:F3:A1:6B:9B:C8:D7:4B:
43:B6:A5:C7:DF:7E:FA:3D:11:26:F8:F7:E4:F4:E9:AA:14:D3:64:43:
4C:CB:9A:DE:09:8B:2B:0D:E7:8A:78:7D:8D:59:F9:42:19:49:2C:14:
CF:30:91:B1:BA:07:36:3D:26:57:7A:6C:2E:F4:C3:61:80:14:02:BD:
DE:16:EB:05:A8:C8:5A:75:06:FC:FF:84
Does this help to tell whether the issue is coming from my side?
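One way to read the alias list is to test whether any entry matches the CPU's vendor, family and model. The sketch below is an illustration under assumptions: the helper name is made up, the vendor codes are taken to be 0 for Intel and 2 for AMD, and the family and model must be read from /proc/cpuinfo (where they are printed in decimal). A missing alias suggests, but does not prove, that the module has no support for that CPU.

```python
from fnmatch import fnmatchcase


def rapl_module_matches_cpu(modinfo_text, vendor, family, model):
    """Return True if an 'alias:' line of `modinfo rapl` matches the given CPU.

    Hypothetical helper: aliases are treated as glob patterns.
    """
    # Same layout as the aliases above: four uppercase hex digits per field.
    modalias = (
        f"cpu:type:x86,ven{vendor:04X}fam{family:04X}mod{model:04X}:feature:"
    )
    for line in modinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "alias" and fnmatchcase(modalias, value.strip()):
            return True
    return False
```

In the listing above, for example, family 6 model 0x8E has an alias while model 0x8C does not. To my knowledge 0x8C (140) is what Tiger Lake mobile parts report, but please confirm against /proc/cpuinfo rather than trusting that number.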
I think that's another case where RAPL support is implemented in powercap (and thus accessible through the filesystem) but not in perf.
If I'm understanding that code correctly (clearly no warranty here!! :), support for this CPU's RAPL is not even implemented in perf in the current source tree: https://github.com/torvalds/linux/blob/555f3d7be91a873114c9656069f1a9fa476ec41a/arch/x86/events/rapl.c#L776
Meanwhile, it was implemented in powercap two years ago: https://github.com/torvalds/linux/blob/0917b95079af82c69d8f5bab301faeebcd2cb3cd/arch/x86/events/msr.c#L89
I think we still need an option for the sensor to read the RAPL information through the powercap sysfs.
Hi, is there any update regarding this issue? @PierreRustOrange @rouvoy I've tried everything that was already suggested, but I still can't use the hwpc sensor. I am using Ubuntu 22.04 LTS and Linux kernel 5.15.0-30-generic. When I try to start the sensor, it seems it can't access the RAPL_ENERGY_PKG event:
$ docker run --rm --net=host --privileged --pid=host -v /sys:/sys -v /var/lib/docker/containers:/var/lib/docker/containers:ro -v /tmp/powerapi-sensor-reporting:/reporting -v $(pwd):/srv -v $(pwd)/config_file.json:/config_file.json powerapi/hwpc-sensor --config-file srv/config_sensor.json
I: 22-05-24 14:00:50 build: version v1.1.2 (rev: eba2fe195878bae1afadb29fb6da7c4151c890ad) (Jan 21 2022 - 14:54:06)
I: 22-05-24 14:00:50 uname: Linux 5.15.0-30-generic #31-Ubuntu SMP Thu May 5 10:00:34 UTC 2022 x86_64
E: 22-05-24 14:00:50 config: event 'RAPL_ENERGY_PKG' is invalid or unsupported by this machine
E: 22-05-24 14:00:50 config: failed to parse the provided config file
I also get an issue with perf:
$ sudo perf stat -a -e "power/energy-cores/" /bin/ls
[sudo] password for mbennani:
event syntax error: 'power/energy-cores/'
\___ Cannot find PMU `power'. Missing kernel support?
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Hi, could you please tell us the model of the CPU you're using?
Hi, sorry, I forgot to mention that I am using an 11th Gen Intel(R) Core(TM) i7-11390H @ 3.40GHz.
Hi @PierreRustOrange, I have the same problem as @Mbenni. I've tried everything that was already suggested, but I still can't use the hwpc sensor. I am using Ubuntu 20.04.4 LTS with Linux kernel 5.13.0-41-generic x86_64, and the CPU is an 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz × 8. Thank you in advance for your answer.
Hello everyone,
We investigated this issue, and it is clear that the Linux kernel (as packaged with Ubuntu) does not support access to energy events through the perf interface, at least for the "Tiger Lake" and "Rocket Lake" Intel families. To deploy hwpc_sensor on these families, the current solution is to modify the kernel (cf. arch/x86/events/rapl.c) and recompile it. If you cannot do that, the best we can do for now is to build a list of supported families with your help. To check whether you can access energy events on your host machine, run the command perf list | grep power/ and check that the output is not empty.
Hi, thank you for your response. For me, the output is empty. Thanks again.
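If perf itself is not installed, the same check can be approximated by looking at sysfs, where the kernel lists each perf PMU under /sys/bus/event_source/devices. The sketch below assumes the RAPL PMU is the one named power (as in the power/energy-cores/ event used earlier in this thread); the function name is made up for illustration.

```python
from pathlib import Path


def list_power_events(pmu_dir="/sys/bus/event_source/devices/power"):
    """Return the event names exposed by the 'power' PMU, or [] if it is absent.

    Hypothetical helper; an empty list corresponds to an empty
    `perf list | grep power/` output.
    """
    events_dir = Path(pmu_dir) / "events"
    if not events_dir.is_dir():
        return []
    # Each event has companion '<name>.scale' and '<name>.unit' files; skip them.
    return sorted(p.name for p in events_dir.iterdir() if "." not in p.name)
```

An empty list from list_power_events() means the kernel did not register the power PMU for this CPU, which matches the "Cannot find PMU `power'" message reported above.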
Output is empty on my side too
Hello, In that case you have to modify your kernel if you want to use hwpc_sensor.
Hello, thank you once again for your answer! I modified the kernel, and now it works. I have a small question :). Are the SmartWatts measurements in watts or milliwatts? I am seeing odd values in Grafana, on the order of 1000000. Thanks,
Hi @BZConserto, could you please tell us what you modified in the kernel? I am only a student, but this would help me a lot in my research. Thank you.
Hi @Mbenni, I only changed the Linux kernel version. Before I had 5.13.0-41; now I have installed 5.10.0-14. I hope this helps.
Hello, measurements are in watts.
Hello everyone, I am on an i5-13600K with Ubuntu 22.04.3 LTS and kernel 5.10.0-051000-generic. Running:
sudo docker run --rm \
--net=host \
--privileged \
--pid=host \
-v /sys:/sys \
-v /var/lib/docker/containers:/var/lib/docker/containers:ro \
-v /tmp/powerapi-sensor-reporting:/reporting \
-v $(pwd):/srv \
powerapi/hwpc-sensor \
-n "$(hostname -f)" \
-r "mongodb" -U "mongodb://127.0.0.1" -D "test" -C "prep" \
-s "rapl" -o -e "RAPL_ENERGY_PKG" \
-s "msr" -e "TSC" -e "APERF" -e "MPERF" \
-c "core" -e "CPU_CLK_UNHALTED:REF_P" -e "CPU_CLK_UNHALTED:THREAD_P" -e "LLC_MISSES" -e "INSTRUCTIONS_RETIRED"
I'm getting this output.
I: 23-11-18 22:41:43 build: version unknown (rev: unknown)
I: 23-11-18 22:41:43 uname: Linux 5.10.0-051000-generic #202012132330 SMP Sun Dec 13 23:33:36 UTC 2020 x86_64
I: 23-11-18 22:41:43 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 9 counters (6 general, 3 fixed)
I: 23-11-18 22:41:43 pmu: found perf 'perf_events generic PMU' having 184 events, 0 counters (0 general, 0 fixed)
I: 23-11-18 22:41:43 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)
I: 23-11-18 22:41:43 pmu: found intel_msr 'Intel MSR' having 6 events, 6 counters (0 general, 6 fixed)
E: 23-11-18 22:41:43 config: event 'RAPL_ENERGY_PKG' is invalid or unsupported by this machine
E: 23-11-18 22:41:43 config: failed to parse the provided command-line arguments
What do you suggest I do? I have already downgraded the kernel to 5.10 as suggested above, but it is still not working. I need RAPL energy readings for my studies.
Hi,
Unfortunately, the Linux kernel does not currently support access to energy events for your "Raptor Lake" Intel processor. We are working on a new Formula based on procfs that will allow the use of PowerAPI with this kind of processor.
On some systems, the sensor fails to access RAPL counters and we get this error at startup:
However, on the same systems, we can see rapl data in the powercap sysfs.
powerapi-ng/powerapi#125 is probably an example of such error.
Currently, the sensor uses the perf subsystem to access RAPL, which is implemented in a different part of the kernel source tree than powercap. Thus I suspect that this can happen when the kernel contains, for the machine's CPU, the powercap implementation but not RAPL access in perf.
I suggest we implement a fallback access to RAPL using powercap sysfs, when we cannot use perf.
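To make the proposal concrete, here is a rough sketch of what such a fallback could look like. It is not the sensor's code: both function names and the default paths are assumptions chosen for illustration. It prefers perf when the power PMU is present and otherwise falls back to a powercap zone. Since powercap exposes a cumulative energy counter in microjoules, power has to be derived from two samples, with a correction when the counter wraps at max_energy_range_uj.

```python
from pathlib import Path


def pick_rapl_backend(perf_pmu="/sys/bus/event_source/devices/power",
                      powercap_zone="/sys/class/powercap/intel-rapl:0"):
    """Return 'perf', 'powercap' or None depending on what the kernel exposes."""
    if Path(perf_pmu, "events").is_dir():
        return "perf"
    if Path(powercap_zone, "energy_uj").is_file():
        return "powercap"
    return None


def average_watts(before_uj, after_uj, seconds, max_range_uj):
    """Average power between two energy_uj samples, allowing one counter wrap."""
    delta = after_uj - before_uj
    if delta < 0:
        # The counter wrapped around; add back its range (approximation).
        delta += max_range_uj
    # Microjoules to joules, then joules per second.
    return delta / 1_000_000 / seconds
```

For instance, two samples of 1000000 and 6000000 microjoules taken one second apart give average_watts(1_000_000, 6_000_000, 1.0, 10_000_000) == 5.0, i.e. 5 W. The sampling interval needs to stay short enough that the counter cannot wrap more than once between reads.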