sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption based on these stats, and exports the estimates as Prometheus metrics.
https://sustainable-computing.io
Apache License 2.0

Kepler latest fails to create Power Model to estimate Node Platform/Component Power on VM #1663

Open vprashar2929 opened 1 month ago

vprashar2929 commented 1 month ago

What happened?

When the latest Kepler image is deployed on a VM together with the estimator and model-server (release-0.7.11), Kepler fails to unmarshal the estimator's response (a JSON array where a map is expected) while creating the power models for platform and component power.

Below are the logs from Kepler for reference:

kepler-1  | I0802 07:49:04.992245   36250 model.go:95] Using Power Model Ratio
kepler-1  | I0802 07:49:04.992250   36250 process_energy.go:124] Using the Ratio/DynPower Power Model to estimate Process Component Power
kepler-1  | I0802 07:49:04.992256   36250 process_energy.go:125] Process feature names: [bpf_cpu_time_ms bpf_cpu_time_ms bpf_cpu_time_ms   gpu_compute_util]
kepler-1  | I0802 07:49:04.992264   36250 model.go:178] Model Config NODE_TOTAL: {ModelType:EstimatorSidecar ModelOutputType:AbsPower TrainerName:SGDRegressorTrainer EnergySource:acpi SelectFilter: InitModelURL:https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/specpower/acpi/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip InitModelFilepath: IsNodePowerModel:true ProcessFeatureNames:[] NodeFeatureNames:[] SystemMetaDataFeatureNames:[] SystemMetaDataFeatureValues:[]}
kepler-1  | I0802 07:49:05.993525   36250 estimate.go:139] estimator unmarshal error: json: cannot unmarshal array into Go struct field ComponentPowerResponse.powers of type map[string][]float64 ({"powers": [], "msg": "'NoneType' object has no attribute 'predict'\n"})
kepler-1  | I0802 07:49:05.993905   36250 node_platform_energy.go:54] Failed to create EstimatorSidecar/AbsPower Power Model to estimate Node Platform Power: json: cannot unmarshal array into Go struct field ComponentPowerResponse.powers of type map[string][]float64
kepler-1  | I0802 07:49:05.993944   36250 model.go:178] Model Config NODE_COMPONENTS: {ModelType:EstimatorSidecar ModelOutputType:AbsPower TrainerName:SGDRegressorTrainer EnergySource:intel_rapl SelectFilter: InitModelURL:https://raw.githubusercontent.com/sustainable-computing-io/kepler-model-db/main/models/v0.7/ec2-0.7.11/rapl-sysfs/AbsPower/BPFOnly/GradientBoostingRegressorTrainer_0.zip InitModelFilepath: IsNodePowerModel:true ProcessFeatureNames:[] NodeFeatureNames:[] SystemMetaDataFeatureNames:[] SystemMetaDataFeatureValues:[]}
kepler-1  | I0802 07:49:06.384985   36250 estimate.go:139] estimator unmarshal error: json: cannot unmarshal array into Go struct field ComponentPowerResponse.powers of type map[string][]float64 ({"powers": [], "msg": "'NoneType' object has no attribute 'predict'\n"})
kepler-1  | I0802 07:49:06.385064   36250 node_component_energy.go:58] Failed to create EstimatorSidecar/AbsPower Power Model to estimate Node Component Power: json: cannot unmarshal array into Go struct field ComponentPowerResponse.powers of type map[string][]float64

What did you expect to happen?

Kepler should be able to use the latest models to estimate platform and component power on the VM.

How can we reproduce it (as minimally and precisely as possible)?

Deploy Kepler on a VM using the VM compose manifests with the following changes:

Anything else we need to know?

No response

Kepler image tag

latest

Kubernetes version

```console
$ kubectl version
# paste output here
```

Cloud provider or bare metal

VM

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Kepler deployment config

For Kubernetes:
```console
$ KEPLER_NAMESPACE=kepler
# provide kepler configmap
$ kubectl get configmap kepler-cfm -n ${KEPLER_NAMESPACE}
# paste output here
# provide kepler deployment description
$ kubectl describe deployment kepler-exporter -n ${KEPLER_NAMESPACE}
```
For standalone: put your Kepler command arguments here.

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

vprashar2929 commented 1 month ago

cc: @sunya-ch