sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption based on these stats, and exports the estimates as Prometheus metrics
https://sustainable-computing.io
Apache License 2.0

kepler_node_info reports UNKNOWN cpu_architecture on RHEL9/arm64 #1347

Open jharriga opened 7 months ago

jharriga commented 7 months ago

What happened?

On a RHEL9/arm64 system, 'kepler_node_info' reports "cpu_architecture" as "unknown":

kepler_node_info{components_power_source="ampere-xgene-hwmon",cpu_architecture="unknown",platform_power_source="none",source="os"} 1

What did you expect to happen?

cpu_architecture indicates the system's CPU architecture

lscpu | grep Model

Model name: Neoverse-N1
BIOS Model name: Ampere(R) Altra(R) Max Processor

How can we reproduce it (as minimally and precisely as possible)?

Download & install rpm
start service
root# systemctl start container-kepler --now
root# curl localhost:8888/metrics | grep kepler_node_info
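To check the result without reading the output by eye, the metric line can be parsed for its labels. The following is a minimal sketch, not part of Kepler: `parse_labels` is a hypothetical helper name, and the sample line is the one reported in this issue. It only handles a single sample line in the Prometheus text format.

```python
import re

# Matches label pairs of the form name="value" inside the braces of a sample line.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labels(sample: str) -> dict:
    """Return the label set of one Prometheus text-format sample line."""
    start = sample.find("{")
    end = sample.rfind("}")
    if start == -1 or end == -1:
        return {}  # the line carries no labels
    return dict(LABEL_RE.findall(sample[start + 1:end]))


# Sample line copied from this report.
line = (
    'kepler_node_info{components_power_source="ampere-xgene-hwmon",'
    'cpu_architecture="unknown",platform_power_source="none",source="os"} 1'
)
labels = parse_labels(line)
print(labels["cpu_architecture"])
```

On an affected system this prints `unknown`; a fixed build would be expected to print a real architecture name instead.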

Anything else we need to know?

No response

Kepler image tag

v0.7.9

Kubernetes version

NONE

Cloud provider or bare metal

bare metal

OS version

# On Linux:
$ cat /etc/os-release
Red Hat Enterprise Linux 9.3 (Plow)
$ uname -a
Linux perf-arm-11.perf.eng.bos2.dc.redhat.com 5.14.0-362.21.1.el9_3.aarch64 #1 SMP PREEMPT_DYNAMIC Thu Jan 25 08:27:11 EST 2024 aarch64 aarch64 aarch64 GNU/Linux

Install tools

# rpm --version
RPM version 4.16.1.3

Kepler deployment config

For standalone:
root# systemctl start container-kepler --now
root# curl localhost:8888/metrics | grep kepler_node_info

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

rootfs commented 5 months ago

The current CPU architecture detection is based on cpu_id, which only works on x86. We need an ID from sysfs to do the same on ARM.
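One possible sysfs source on arm64 Linux is the MIDR_EL1 register, which the kernel exposes per CPU under `regs/identification/midr_el1`. The sketch below is only an illustration of that idea, not Kepler's implementation: the function names, the lookup table, and the "Neoverse-N1" label are assumptions, and the table holds a single entry (Arm implementer 0x41, part 0xD0C) for the core named in the lscpu output above.

```python
# Assumed sysfs location of MIDR_EL1 for CPU 0 on arm64 Linux.
MIDR_PATH = "/sys/devices/system/cpu/cpu0/regs/identification/midr_el1"

# Hypothetical lookup table: (implementer, part number) -> architecture label.
KNOWN_CORES = {
    (0x41, 0xD0C): "Neoverse-N1",  # Arm Ltd. implementer code, Neoverse N1 part number
}


def decode_midr(midr: int) -> dict:
    """Split a MIDR_EL1 value into its architectural bit fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits [31:24]
        "variant": (midr >> 20) & 0xF,        # bits [23:20]
        "architecture": (midr >> 16) & 0xF,   # bits [19:16]
        "part": (midr >> 4) & 0xFFF,          # bits [15:4]
        "revision": midr & 0xF,               # bits [3:0]
    }


def cpu_architecture(path: str = MIDR_PATH) -> str:
    """Return a label for the CPU, or "unknown" if it cannot be determined."""
    try:
        with open(path) as f:
            midr = int(f.read().strip(), 16)  # file holds a hex string such as 0x...
    except (OSError, ValueError):
        return "unknown"  # not arm64, file missing, or unparsable content
    fields = decode_midr(midr)
    return KNOWN_CORES.get((fields["implementer"], fields["part"]), "unknown")


if __name__ == "__main__":
    print(cpu_architecture())
```

Falling back to "unknown" keeps the current behavior on systems where the file is absent, so a lookup like this could be tried before giving up rather than replacing the x86 path.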

jharriga commented 3 months ago

Seeing the same behavior with Kepler v0.7.11 on a RHEL9/ARM64 Ampere system

[root]# lscpu | grep Model
Model name: Neoverse-N1
BIOS Model name: Ampere(R) Altra(R) Max Processor

[root]# uname -a
Linux perf-arm-11.perf.eng.bos2.dc.redhat.com 5.14.0-362.21.1.el9_3.aarch64 #1 SMP PREEMPT_DYNAMIC Thu Jan 25 08:27:11 EST 2024 aarch64 aarch64 aarch64 GNU/Linux

[root]# curl localhost:8888/metrics|grep kepler_node_info

kepler_node_info{components_power_source="ampere-xgene-hwmon",cpu_architecture="unknown",platform_power_source="none",source="os"} 1

rootfs commented 1 month ago

This issue turns out to still be present in recent code.