vprashar2929 opened this issue 5 months ago
I know why this happens 🎉 See: https://github.com/sustainable-computing-io/kepler/blob/main/bpfassets/libbpf/src/kepler.bpf.c#L247C3-L247C23
As @vimalk78 found out, from eBPF we record the:
From the perspective of userland, the PID is actually what the kernel calls the TGID - you'll notice that we accidentally on-purpose switch the order of these fields in the definition of the struct: https://github.com/sustainable-computing-io/kepler/blob/main/pkg/bpf/types.go#L49-L50
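For reference, here is a minimal Go sketch of how the two halves come apart. It assumes the values are taken from the eBPF helper bpf_get_current_pid_tgid(), which packs the kernel's tgid into the upper 32 bits and the kernel's pid (a thread ID, from userland's point of view) into the lower 32 bits. splitPidTgid is a hypothetical name for illustration, not a function in Kepler:

```go
package main

import "fmt"

// splitPidTgid is a hypothetical helper (not Kepler code). It assumes the
// input came from bpf_get_current_pid_tgid(), which returns
// (tgid << 32) | pid in kernel terms. The kernel's "pid" is the thread ID,
// and the kernel's "tgid" is what userland calls the process ID.
func splitPidTgid(v uint64) (tid, tgid uint32) {
	tid = uint32(v)        // lower 32 bits: kernel pid == userland thread ID
	tgid = uint32(v >> 32) // upper 32 bits: kernel tgid == userland PID
	return tid, tgid
}

func main() {
	// Hypothetical sample: thread 110367 inside process 110356.
	v := uint64(110356)<<32 | uint64(110367)
	tid, pid := splitPidTgid(v)
	fmt.Printf("userland PID=%d, thread ID=%d\n", pid, tid)
}
```

For a process's main thread both halves are equal, which is what makes a pid == tgid check usable at all.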
TL;DR: the comm that we record belongs to the pid (as the kernel sees it, not as userland sees it), so you will indeed get values like "CPU 0/KVM".
I think the fix required here is going to be either to stop taking comm from eBPF and look it up from procfs instead, or to only record comm if pid == tgid.
I'm going to try and verify this theory on my development machine at some point later this week.
@vprashar2929 is this still an issue?
Ref: https://github.com/sustainable-computing-io/kepler/issues/1640
Closing, as the issue has been addressed and fixed.
Reopening the issue, as the latest Kepler still reports an incorrect process name:
What is the expected process name in the above test?
❯ pstree -p | grep qemu
|-qemu-system-x86(110356)-+-{qemu-system-x86}(110367)
|                         |-{qemu-system-x86}(110370)
|                         |-{qemu-system-x86}(110371)
|                         |-{qemu-system-x86}(110372)
|                         |-{qemu-system-x86}(110373)
|                         |-{qemu-system-x86}(110374)
|                         |-{qemu-system-x86}(110375)
|                         |-{qemu-system-x86}(110377)
|                         `-{qemu-system-x86}(2178213)
What happened?
When Kepler is deployed on a machine using the latest image tag, it reports the wrong process name in the exported metrics. Attaching some screenshots for reference:
Output from the pstree command:
kepler_process_platform_joules_total for the particular pid 75577 reports command="CPU 0/KVM", which is wrong.
What did you expect to happen?
Kepler should report the correct command name in the metrics that it exports.
How can we reproduce it (as minimally and precisely as possible)?
Run Kepler either on Kubernetes or locally using the docker-compose setup available here: https://github.com/sustainable-computing-io/kepler/tree/main/hackdocker-compose
Anything else we need to know?
No response
Kepler image tag
Kubernetes version
Cloud provider or bare metal
OS version
Install tools
Kepler deployment config
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)