sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption based on these stats, and exports the estimates as Prometheus metrics.
https://sustainable-computing.io
Apache License 2.0

Idle power analysis #646

Open rootfs opened 1 year ago

rootfs commented 1 year ago

1) Capture idle power model characteristics.
2) Attribute idle power to processes in a fair and accurate manner (evenly divided or weighted by resource request).

0.5 release

jichenjc commented 1 year ago

@marceloamaral is this issue aiming to cover the idle + active + dynamic energy breakdown from the original article, so that the active energy will be added? Or is it something new? Thanks

marceloamaral commented 1 year ago

@jichenjc this is a little bit different but also related.

According to the GHG protocol recommendation, the allocation of constant power should be based on the application size, while dynamic power should be based on resource utilization.

There are two types of constant power: idle and activation power. Although activation power is technically dynamic, it remains constant: it is triggered when the first processes access the resource and does not change with resource utilization. For example, the power consumption of the first process accessing an idle processor socket is much higher than the incremental power of adding new processes.

Currently, the idle power is evenly distributed among all processes, regardless of the application size. However, we can enhance the model by splitting idle power based on container resource requests. Furthermore, we have not yet implemented the calculation of the activation power.

jichenjc commented 1 year ago

> Currently, the idle power is evenly distributed among all processes, regardless of the application size. However, we can enhance the model by splitting idle power based on container resource requests. Furthermore, we have not yet implemented the calculation of the activation power.

OK, so this might be the key: "However, we can enhance the model by splitting idle power based on container resource requests". I think we can use the request/limit from the container/pod spec here, but can it reflect the idle power? For example, request=100 and limit=400 is not an uncommon setup, so using 100 seems a little unfair.

marceloamaral commented 1 year ago

Right, just to simplify, we should use the ratio of limits for each resource (CPU, DRAM, GPU, OTHER, and PLATFORM):

container_idle_power = (container_limit / sum_of_all_containers_limit) * idle_power
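For illustration, the limit-ratio formula can be sketched in a few lines of Python. This is only a sketch under assumptions: the function name `allocate_idle_power`, the container names, and the numbers are hypothetical and are not taken from Kepler's code. It would be applied once per resource (CPU, DRAM, and so on).

```python
from typing import Dict


def allocate_idle_power(limits: Dict[str, float], idle_power: float) -> Dict[str, float]:
    """Split idle_power across containers in proportion to their limits.

    limits maps a (hypothetical) container name to its limit for one
    resource; idle_power is the idle power measured for that resource.
    """
    total = sum(limits.values())
    if total <= 0:
        raise ValueError("sum of container limits must be positive")
    # container_idle_power = (container_limit / sum_of_all_containers_limit) * idle_power
    return {name: (limit / total) * idle_power for name, limit in limits.items()}


# Hypothetical example: two containers sharing 40 W of idle CPU power.
shares = allocate_idle_power({"web": 100.0, "db": 300.0}, idle_power=40.0)
```

With these made-up inputs the container holding three quarters of the total limit receives three quarters of the idle power, and the shares always sum back to `idle_power`.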

rootfs commented 1 year ago

Based on the GHG protocol, a resource-utilization-based method is used. If a resource limit is not found, the min (reserved), max, or average usage across all applications is used instead, or the ratio approach is applied.
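One possible reading of this fallback, sketched in Python: when a container has no limit, substitute the min, max, or average of the usage observed across all applications. The function name `effective_sizes`, the `strategy` parameter, and the sample data are assumptions made for illustration; the thread does not specify how the fallback is implemented.

```python
from statistics import mean
from typing import Dict, Optional


def effective_sizes(
    limits: Dict[str, Optional[float]],
    usage: Dict[str, float],
    strategy: str = "average",
) -> Dict[str, float]:
    """Return a size per container, filling missing limits from observed usage.

    limits: container name -> configured limit, or None if no limit is set.
    usage:  container name -> observed usage of the same resource.
    strategy: "min", "max", or "average" over the usage of all applications.
    """
    pickers = {"min": min, "max": max, "average": mean}
    if strategy not in pickers:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not usage:
        raise ValueError("usage must not be empty")
    fallback = pickers[strategy](usage.values())
    return {name: (fallback if limit is None else limit) for name, limit in limits.items()}


# Hypothetical example: container "b" has no limit configured.
sizes = effective_sizes({"a": 400.0, "b": None}, {"a": 100.0, "b": 300.0})
```

The resulting sizes could then be fed into whatever ratio-based split is chosen for the idle power.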

@marceloamaral will start with the reservation method.

rootfs commented 1 year ago

Another approach is to use kernel task state to identify idle and running tasks during context switch.

The idle task has its own state:

#define TASK_IDLE       (TASK_UNINTERRUPTIBLE  | TASK_NOLOAD)

So is the running task:

#define task_is_running(task)       (READ_ONCE((task)->__state) == TASK_RUNNING)

Our eBPF collector should check the process state and count resource usage separately, so that Kepler can break energy usage down into idle and running states.
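The per-state accounting idea can be modelled in user space as below. This is a Python simulation, not eBPF code, and it makes several assumptions: the numeric flag values are copied from recent kernel headers and can differ between kernel versions, and the sample format (task state, CPU time in nanoseconds) is hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Flag values as found in recent include/linux/sched.h; treat them as
# assumptions, since they are not guaranteed to be stable across kernels.
TASK_RUNNING = 0x0000
TASK_UNINTERRUPTIBLE = 0x0002
TASK_NOLOAD = 0x0400
TASK_IDLE = TASK_UNINTERRUPTIBLE | TASK_NOLOAD


def classify(state: int) -> str:
    """Map a task state value to 'running', 'idle', or 'other'."""
    if state == TASK_RUNNING:
        return "running"
    if state == TASK_IDLE:
        return "idle"
    return "other"


def account(samples: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Sum CPU time (ns) per state class from (state, cpu_time_ns) samples."""
    totals = {"running": 0, "idle": 0, "other": 0}
    for state, cpu_time_ns in samples:
        totals[classify(state)] += cpu_time_ns
    return totals


# Hypothetical samples, as if recorded at context switches.
totals = account([(TASK_RUNNING, 100), (TASK_IDLE, 50), (0x0001, 25), (TASK_RUNNING, 10)])
```

Keeping the counters separate per state is what would later allow energy to be reported for idle and running time independently.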

marceloamaral commented 1 year ago

I am currently exploring an alternative approach that involves utilizing regression analysis to estimate energy consumption at minimal resource utilization, thereby obtaining the idle power.

To accomplish this, we will gather relevant data and construct a regression model. By setting the resource utilization to a significantly low value, we can accurately determine the corresponding idle power consumption.
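A minimal sketch of the regression idea, assuming a simple linear relationship between utilization and power (the thread does not say which model will be used). The function names and the sample measurements are hypothetical; the fit is ordinary least squares written with the standard library only.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("utilization samples must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_idle_power(
    utilization: Sequence[float],
    power_watts: Sequence[float],
    idle_utilization: float = 0.0,
) -> float:
    """Evaluate the fitted line at a very low utilization to estimate idle power."""
    slope, intercept = fit_line(utilization, power_watts)
    return intercept + slope * idle_utilization


# Hypothetical measurements: utilization fraction vs. node power in watts.
idle_estimate = estimate_idle_power([0.1, 0.5, 0.9], [45.0, 65.0, 85.0])
```

For these made-up points the fitted line is 40 + 50 * utilization, so the estimate at zero utilization is about 40 W. Real power curves are often non-linear, so a different model may fit better.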

stale[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

marceloamaral commented 10 months ago

Keeping it active, I will work on that.

rootfs commented 8 months ago

keep alive

marceloamaral commented 5 months ago

Idle Power discussion #1208