sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption based on these stats, and exports them as Prometheus metrics.
https://sustainable-computing.io
Apache License 2.0

Proposal: Go-based Model Server #1657

Open Ā· dave-tucker opened this issue 3 months ago

dave-tucker commented 3 months ago

What would you like to be added?

Currently we have this project, https://github.com/sustainable-computing-io/kepler-model-server, written in Python, which does many things...

Some of that belongs in Python - i.e. training or anything that uses numpy - but there are elements of this codebase that would be useful to have in Go form.

What I would propose is:

  1. Create an OpenAPI spec for the API used by the model server
  2. Create a pkg/model-server-api package that implements this API
  3. Generate pkg/model-server-client from the OpenAPI spec - this would be used by the pkg/model inferencing code.
  4. Create pkg/model-db, which handles interactions with the Kepler model DB
  5. Create cmd/model-server, the actual binary that serves the API (a rough sketch of how these pieces could fit together follows this list)
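
A rough sketch of how these packages could relate, assuming the OpenAPI spec boils down to a single "select the best model for these features" style endpoint; every type, field, and path below is hypothetical and would really be driven by the spec in point 1:

```go
// Hypothetical shapes for pkg/model-server-api (point 2) and the generated
// pkg/model-server-client (point 3). The real contract would come from the
// OpenAPI spec in point 1; all names and fields here are illustrative only.
package modelserverapi

import (
	"context"
	"net/http"
)

// ModelRequest describes what the caller needs: the desired output type and
// the features it can actually provide.
type ModelRequest struct {
	OutputType string   `json:"output_type"` // e.g. "AbsPower"
	Source     string   `json:"source"`      // e.g. "rapl-sysfs"
	Features   []string `json:"features"`
}

// ModelResponse points the caller at the archive of the selected model.
type ModelResponse struct {
	Name string `json:"name"`
	URL  string `json:"url"` // location in the model DB
}

// Server is what cmd/model-server (point 5) would implement.
type Server interface {
	SelectModel(ctx context.Context, req ModelRequest) (ModelResponse, error)
}

// Client is what the generated pkg/model-server-client would expose and what
// the pkg/model inferencing code would consume.
type Client interface {
	SelectModel(ctx context.Context, req ModelRequest) (ModelResponse, error)
}

// Handler is stand-in plumbing; generated server stubs from the OpenAPI spec
// would replace this hand-written version.
func Handler(s Server) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/model", func(w http.ResponseWriter, r *http.Request) {
		// decode a ModelRequest, call s.SelectModel, encode the ModelResponse
	})
	return mux
}
```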

This would then leave the functionality of the estimator and online-trainer in Python, since the model pipelines should not need to change. These can either remain in Python, with the REST API from the model server adjusted appropriately,

OR

We can call them using Cython from the Go code šŸ¤Æ See: https://poweruser.blog/embedding-python-in-go-338c0399f3d5
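
For the second option, a minimal sketch of what calling into an embedded Python interpreter from Go could look like, assuming the cgo + CPython C API route rather than a pure-Go binding; the pkg-config module name and the hard-coded snippet are assumptions, and a real integration would load the estimator / online-trainer modules instead:

```go
// Minimal sketch of embedding CPython from Go via cgo. Requires a Python
// development install; the pkg-config name (python3-embed) varies between
// distributions and Python versions.
package main

/*
#cgo pkg-config: python3-embed
#include <stdlib.h>
#include <Python.h>
*/
import "C"

import (
	"log"
	"unsafe"
)

func main() {
	C.Py_Initialize()
	defer C.Py_FinalizeEx()

	// C.CString allocates C memory, so it must be freed explicitly.
	code := C.CString("print('hello from the embedded interpreter')")
	defer C.free(unsafe.Pointer(code))

	if C.PyRun_SimpleString(code) != 0 {
		log.Fatal("embedded Python snippet failed")
	}
}
```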

Why is this needed?

sthaha commented 3 months ago

The in-tree models that are used in Kepler can be removed šŸŽ‰. cmd/exporter can use pkg/kepler-model-db to download the latest models if none are found in the expected path.
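
A rough sketch of that fallback, with a hypothetical stand-in for whatever pkg/kepler-model-db would actually export; the interface and function names are illustrative only:

```go
// cmd/exporter prefers a model that is already on disk and only contacts the
// model DB when none is found. ModelDB is a placeholder for pkg/kepler-model-db.
package exporter

import (
	"context"
	"log"
	"os"
)

// ModelDB is a hypothetical stand-in for the proposed pkg/kepler-model-db API.
type ModelDB interface {
	// DownloadLatest fetches the newest matching model archive into destPath.
	DownloadLatest(ctx context.Context, destPath string) error
}

// ensureModel returns the path of a usable model, downloading one only when
// nothing exists at the expected location.
func ensureModel(ctx context.Context, db ModelDB, path string) (string, error) {
	if _, err := os.Stat(path); err == nil {
		return path, nil // pre-provisioned model, no network access needed
	}
	log.Printf("no model found at %s, downloading the latest from the model DB", path)
	if err := db.DownloadLatest(ctx, path); err != nil {
		return "", err
	}
	return path, nil
}
```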

We have had a discussion about this in the community call and I think we can actually achieve this already.

IIUC, the idea behind adding this model is to support the use case of running Kepler (without the estimator or model-server) on a VM without needing access to any external network.

(NOTE: all the points below are based on my limited understanding of models, training and selection. @sunya-ch please correct me if I am wrong :)

To take advantage of model-db, you will need the estimator sidecar (numpy / scikit).

For the rest of the points, I definitely see a small advantage (in terms of performance) in having a model server written in Go to pick the best model and to serve them. However, models are served only once per Kepler instance, so I am not really sure a rewrite benefits us at this point in time.

Also, the model selection logic should go hand in hand with the training part, i.e. changes in metadata or features should be incorporated into the best-model selection (https://github.com/sustainable-computing-io/kepler-model-server/blob/f6990f3c0afe7320af90e47e9e91819f397b7b32/src/server/model_server.py#L73). And for that reason I think it is best to keep that in Python itself.