sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption based on these stats, and exports the estimates as Prometheus metrics.
https://sustainable-computing.io
Apache License 2.0

kepler_container_joules_total stays constant over time #1567

Open AntonioDiTuri opened 3 months ago

AntonioDiTuri commented 3 months ago

What happened?

I installed Kepler on a local kind cluster following the documentation, specifically using the make cluster-up command. I installed the latest version of Kepler: release-0.7.10. My hardware is a Mac with an Intel processor.

I then ran a simple Python program to check what the consumption was, for demo purposes.

This is the simple Python code:

# inefficient_fibonacci.py
import os

def inefficient_fibonacci(n):
  if n <= 1:
    return n
  return inefficient_fibonacci(n-1) + inefficient_fibonacci(n-2)

def calculate_fibonacci(n):
  return inefficient_fibonacci(n)

if __name__ == "__main__":
  n = os.getenv("INPUT")
  result = calculate_fibonacci(int(n))
  print(f"Fibonacci of {n} is: {result}")

This is the Dockerfile:

FROM python:latest
WORKDIR /app
COPY inefficient_fibonacci.py ./app.py
CMD ["python", "app.py"]

And this is the simple pod.yaml I used to deploy:

apiVersion: v1
kind: Pod
metadata:
  name: fibonacci-inefficient
spec:
  containers:
  - name: fibonacci-inefficient
    image: fibonacci-inefficient:latest
    imagePullPolicy: Never
    env:
    - name: INPUT
      value: "995" # a high value to observe the behavior

What did you expect to happen?

When querying the metric kepler_container_joules_total in Grafana, its value does not increase over time. It stays constant; the queries with mode=idle and mode=dynamic return different numbers, but both are still constant.

Since the container is always running and constantly consumes the following resources:

k top pods
NAME                    CPU(cores)   MEMORY(bytes)
fibonacci-inefficient   1000m        4Mi

I would have expected the energy value to rise, but that is not the case. Why? I am attaching a screenshot of the Grafana dashboard:

Screenshot 2024-06-25 at 22 07 31

[EDIT] From the screenshot it might seem that I observed this only for a few minutes and did not give the power models enough time to "trigger a change" in the output. However, observing the process for 1 hour also made no difference; the energy stays constant.
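
To rule out Grafana itself, one quick sanity check is to query the Prometheus HTTP API for the counter's rate directly. Below is a minimal sketch, assuming Prometheus is reachable at localhost:9090 (e.g. via kubectl port-forward) and that the label names match your Kepler version:

```python
# check_kepler_rate.py -- minimal sketch, not part of Kepler.
# Queries the per-container rate of kepler_container_joules_total; an all-zero
# result matches the flat graph described above, while a non-zero result would
# mean the counter is actually increasing and only the dashboard is misleading.
import requests

PROM_URL = "http://localhost:9090/api/v1/query"  # adjust to your setup
QUERY = 'rate(kepler_container_joules_total{container_name="fibonacci-inefficient"}[5m])'

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    # each value is a [timestamp, "value"] pair
    print(series["metric"].get("mode"), series["value"][1])
```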

Am I missing something?

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

No response

Kepler image tag

release-0.7.10.

Kubernetes version

```console
$ kubectl version
1.30.0
```

Cloud provider or bare metal

Bare metal - Apple with Intel chip

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Kind cluster local

Kepler deployment config

For on kubernetes:

```console
$ KEPLER_NAMESPACE=kepler

# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE}
# paste output here

# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE}
```

For standalone:

```console
# put your Kepler command argument here
```

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

dave-tucker commented 2 months ago

Hi @AntonioDiTuri, thanks for the bug report. I've tried to reproduce this on my kind cluster, but I've not been able to do so. The only difference is that I'm on Linux:

Screenshot 2024-07-08 at 17 13 16

Will try again on macOS and see if I can get a reproducer

dave-tucker commented 2 months ago

@AntonioDiTuri I've now reproduced this 🎉

When deploying kepler-exporter into a VM environment where hardware perf events are not available, we fall back to using bpf_cpu_time_ms as the input to the model for estimation.

However, this appears to be incorrectly accounted for in the situation described above:

macOS via Podman Desktop (basically a VM) - no hardware perf event support:

Screenshot 2024-07-09 at 12 16 25

Linux via kind - hardware perf event support:

Screenshot 2024-07-09 at 12 17 18

I'll start looking into this and I hope to get a fix ready soon.
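
For readers following the thread, here is a rough sketch of the fallback described above (Python for illustration only; Kepler itself is written in Go, and the function and field names here are invented):

```python
# Illustrative only -- hypothetical names, not Kepler's actual code.
# In a VM (e.g. Podman Desktop on macOS) hardware perf counters are usually
# not exposed to the guest, so the per-container CPU time measured by the
# eBPF probes becomes the only feature fed into the power model.
def select_model_features(perf_events_available: bool, stats: dict) -> dict:
    if perf_events_available:
        return {
            "cpu_cycles": stats["cpu_cycles"],
            "cpu_instructions": stats["cpu_instructions"],
            "cache_misses": stats["cache_misses"],
        }
    # Fallback path relevant to this issue: if bpf_cpu_time_ms is (wrongly)
    # reported as flat, the estimated energy will be flat as well.
    return {"bpf_cpu_time_ms": stats["bpf_cpu_time_ms"]}
```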

sthaha commented 2 months ago

I am guessing that this happens because of the default model packaged in the Kepler image. cc: @sunya-ch

@dave-tucker Do you get the same result if you use the VM docker-compose? The compose file deploys both the estimator sidecar and the model-server, which downloads a better model.

dave-tucker commented 2 months ago

@sthaha I don't think it's the model... I think it's the input to the model. Per the screenshots above, bpf_cpu_time_ms remains flat over time in one environment, vs. the other (correct) example where it increases over time.

I can try compose to prove this
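
To make that concrete, here is a toy sketch (illustrative only; not Kepler's actual power model) of why a flat input produces a flat energy counter:

```python
# Toy linear model: estimated dynamic energy is proportional to the CPU time
# consumed since the last sample. The coefficient below is invented.
def estimate_dynamic_joules(delta_cpu_time_ms, joules_per_cpu_ms=0.002):
    return delta_cpu_time_ms * joules_per_cpu_ms

total_joules = 0.0
for delta_ms in [0, 0, 0, 0]:  # flat bpf_cpu_time_ms deltas, as in the VM case
    total_joules += estimate_dynamic_joules(delta_ms)
print(total_joules)  # stays 0.0, so the exported counter never increases
```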

sthaha commented 2 months ago

@dave-tucker you are right, I totally missed the comment - "no hardware perf event support" - on the flat bpf time graph 🙄.

sthaha commented 2 months ago

To find out whether this is a regression (which would block the release of 0.7.11), I have been testing this on macOS, first using 0.7.10 and then using the latest.

On 0.7.10

I found that there are a few containers whose rate of change of bpf time is 0:

[screenshot]

There are a few that are reported as not having consumed any (0) CPU time (so this isn't a regression per se):

[screenshot]

While those that have consumed some CPU time show only minimal values:

[screenshot]

... to be continued

dave-tucker commented 2 months ago

@sthaha I've had mixed results reproducing this. What I did see, though, is that when no hardware perf events are enabled we just use bpf_cpu_time_ms in the model. Looking at the eBPF probe code in isolation, I can see that it reliably calculates this value and updates the maps.

Given that Prometheus scraping still occurs AND the value remains constant, it would seem that the main loop of Kepler is deadlocked somewhere - as IIRC that's what updates the Prometheus metrics every 3 seconds. I've not been able to reproduce this with debug logging on to confirm, but ☝️ appears to be the most likely cause.
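
For anyone unfamiliar with the exporter's architecture, here is a simplified sketch of the hypothesised failure mode (Python for illustration; Kepler's exporter is actually written in Go): scrapes keep succeeding because they only read a cached value, while the loop that should refresh that cache is stuck.

```python
# Illustrative sketch of "scrape succeeds but the value never changes".
import threading
import time

cached_joules = 0.0

def read_new_sample():
    # Stand-in for reading the eBPF maps and running the model since the last tick.
    return 1.5

def update_loop(blocked=False):
    global cached_joules
    while True:
        if blocked:
            time.sleep(3600)  # simulates the main loop being deadlocked
            continue
        cached_joules += read_new_sample()
        time.sleep(3)  # the ~3-second update interval mentioned above

def scrape():
    # A scrape handler that only reads the cache never notices the stuck updater.
    return f"kepler_container_joules_total {cached_joules}"

threading.Thread(target=update_loop, kwargs={"blocked": True}, daemon=True).start()
for _ in range(3):
    time.sleep(2)
    print(scrape())  # prints the same stale value on every "scrape"
```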