sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption based on these stats, and exports the estimates as Prometheus metrics
https://sustainable-computing.io
Apache License 2.0
1.17k stars, 182 forks

Update DynPower Model to 0.7.11 #1748

Open sthaha opened 2 months ago

sthaha commented 2 months ago

https://github.com/sustainable-computing-io/kepler/pull/1728 could not update the intel_rapl_DynPower model because the model is missing from model-db (see: https://github.com/sustainable-computing-io/kepler-model-db/issues/27).

The task is to update the models when the linked bug is fixed.

sunya-ch commented 2 months ago

We set the upper bound for MAE to 10 and for MAPE to 20%. The SGDTrainer model is much worse than these bounds because the data collected from the latest Kepler is sparse, as shown below.

MAE =  65.88490729951363
         MAE          MSE       MAPE    n energy_component energy_source                  Model Feature Group
0  65.884907  5580.690108  55.722063  172          package    rapl-sysfs  SGDRegressorTrainer_0       BPFOnly
1   0.000000     0.000000  -1.000000  172             core    rapl-sysfs  SGDRegressorTrainer_0       BPFOnly
2   0.000000     0.000000  -1.000000  172           uncore    rapl-sysfs  SGDRegressorTrainer_0       BPFOnly
3   2.876924    11.362931  54.496488  172             dram    rapl-sysfs  SGDRegressorTrainer_0       BPFOnly

estimate_dyn_default_min_rapl-sysfs_BPFOnly_SGDRegressorTrainer_0 using BPFOnly_corr
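
For reference, the comparison against the stated bounds can be sketched as follows. This is a minimal illustration, not the model server's actual validation code: the helper names are hypothetical, the rows are copied from the table above, each bound is reported separately, and a negative MAPE is assumed to be a placeholder for "not computed" and is skipped.

```python
# Upper bounds quoted in this comment: MAE <= 10 and MAPE <= 20%.
MAE_UPPER_BOUND = 10.0
MAPE_UPPER_BOUND = 20.0

# (energy_component, MAE, MAPE) rows copied from the table above.
RESULTS = [
    ("package", 65.884907, 55.722063),
    ("core", 0.0, -1.0),
    ("uncore", 0.0, -1.0),
    ("dram", 2.876924, 54.496488),
]


def violated_bounds(mae, mape):
    """Return the names of the bounds that one row exceeds (hypothetical helper)."""
    violations = []
    if mae > MAE_UPPER_BOUND:
        violations.append("MAE")
    # Assumption: a negative MAPE means the value was not computed, so skip it.
    if mape >= 0 and mape > MAPE_UPPER_BOUND:
        violations.append("MAPE")
    return violations


def failing_components(results):
    """Map each energy component to the bounds it exceeds, omitting passing rows."""
    failing = {}
    for name, mae, mape in results:
        violations = violated_bounds(mae, mape)
        if violations:
            failing[name] = violations
    return failing


if __name__ == "__main__":
    for name, violations in failing_components(RESULTS).items():
        print(f"{name}: exceeds {', '.join(violations)}")
```

Under these assumptions, package exceeds both bounds and dram exceeds only the MAPE bound.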

I can upload the weights for DynPower with a remark about this potential model error.

vprashar2929 commented 1 month ago

@sunya-ch Are we good to close this?