Open lianhao opened 5 months ago
It seems the power model was trained on the CPU time metric exported by Kepler before v0.7 (bpf_cpu_time_us), whereas the estimation is being invoked by a newer Kepler that exports bpf_cpu_time_ms instead. You may have to retrain the power model with the new Kepler version.
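A quick way to confirm this kind of mismatch is to compare the features the model expects against the counters the exporter reports. The sketch below is only illustrative: `missingFeatures` is a hypothetical helper, not part of Kepler, and the assumption that the model needs exactly `bpf_cpu_time_us` is taken from the estimator error message in this issue.

```go
package main

import "fmt"

// missingFeatures returns the entries of required that do not appear in
// available, preserving the order of required.
// Hypothetical helper for illustration; not a Kepler API.
func missingFeatures(required, available []string) []string {
	have := make(map[string]struct{}, len(available))
	for _, name := range available {
		have[name] = struct{}{}
	}
	var missing []string
	for _, name := range required {
		if _, ok := have[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Counter names copied from the "Available ebpf counters" log line in this issue.
	available := []string{
		"bpf_page_cache_hit", "task_clock_ms", "bpf_cpu_time_ms",
		"bpf_net_tx_irq", "bpf_net_rx_irq", "bpf_block_irq",
		"cpu_cycles", "cpu_instructions", "cache_miss",
	}
	// Assumption: the trained model expects this feature, as the
	// estimator's error message suggests.
	required := []string{"bpf_cpu_time_us"}

	if missing := missingFeatures(required, available); len(missing) > 0 {
		fmt.Printf("model expects features the exporter does not provide: %v\n", missing)
	}
}
```

If the check reports a missing feature, the model and the exporter were built against different metric names, which is consistent with the retraining suggestion above.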
I0529 02:05:34.744887 690634 utils.go:86] Available ebpf counters: [bpf_page_cache_hit task_clock_ms bpf_cpu_time_ms bpf_net_tx_irq bpf_net_rx_irq bpf_block_irq cpu_cycles cpu_instructions cache_miss]
...
I0529 02:05:39.914526 690634 estimate.go:139] estimator unmarshal error: json: cannot unmarshal array into Go struct field ComponentPowerResponse.powers of type map[string][]float64 ({"powers": [], "msg": "\"None of [Index(['bpf_cpu_time_us'], dtype='object')] are in the [columns]\"\n"})
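The unmarshal error in the last log line can be reproduced in isolation. The estimator replied with an empty JSON array for `powers`, while the Go side expects a map. The struct below is a minimal stand-in whose shape is inferred from the error message (`powers` of type `map[string][]float64`); the real Kepler type may carry more fields, and `decodeResponse` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// componentPowerResponse is a stand-in inferred from the error message in
// the log; it is an assumption, not a copy of Kepler's definition.
type componentPowerResponse struct {
	Powers map[string][]float64 `json:"powers"`
	Msg    string               `json:"msg"`
}

// decodeResponse decodes an estimator reply and returns whatever could be
// parsed together with the decoding error, if any.
func decodeResponse(body []byte) (componentPowerResponse, error) {
	var resp componentPowerResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	// Payload copied from the log line in this issue.
	body := []byte(`{"powers": [], "msg": "\"None of [Index(['bpf_cpu_time_us'], dtype='object')] are in the [columns]\"\n"}`)

	resp, err := decodeResponse(body)
	if err != nil {
		// "powers" is a JSON array, so it cannot fill a Go map.
		fmt.Println("unmarshal error:", err)
	}
	// encoding/json skips the mismatched field and keeps decoding, so the
	// estimator's own message is still available for diagnosis.
	fmt.Println("estimator message:", resp.Msg)
}
```

The type error is only a symptom: the `msg` field carries the actual cause, a pandas lookup for a `bpf_cpu_time_us` column that the newer Kepler no longer sends.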
What happened?
When running Kepler in Kubernetes with the pretrained model to estimate process power, the Kepler pod panics right after launch.
The models were trained by following the Kepler model server Tekton training process, using the complete run.
The Kepler container goes into an error state just after it starts:
There are some errors in kepler-estimator container too:
The complete Kepler log can be found here: kepler.log. The complete kepler-estimator log can be found here: kepler-estimator.log.
What did you expect to happen?
Kepler should run without any panics.
How can we reproduce it (as minimally and precisely as possible)?
Run Kepler with the deployment configuration below.
Anything else we need to know?
No response
Kepler image tag
Kubernetes version
Cloud provider or bare metal
OS version
Install tools
Kepler deployment config
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)