sustainable-computing-io / kepler

Kepler (Kubernetes-based Efficient Power Level Exporter) uses eBPF to probe performance counters and other system stats, uses ML models to estimate workload energy consumption based on these stats, and exports them as Prometheus metrics.
https://sustainable-computing.io
Apache License 2.0

Proposal: Go-based Model Server #1657

Open Ā· dave-tucker opened this issue 3 months ago

dave-tucker commented 3 months ago

What would you like to be added?

Currently we have this project, https://github.com/sustainable-computing-io/kepler-model-server, written in Python, which does many things...

Some of that belongs in Python - i.e. training or anything that uses numpy - but there are elements of this codebase that would be useful to have in Go form.

What I would propose is:

  1. Create an OpenAPI spec for the API used by the model server
  2. Create a pkg/model-server-api package that implements this API
  3. Generate pkg/model-server-client from the OpenAPI spec - this would be used by the pkg/model inferencing code.
  4. Create pkg/model-db, which handles interactions with the Kepler model DB
  5. Create cmd/model-server, the actual binary that serves the API (a rough sketch of how these pieces could fit together follows this list)
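
A rough sketch of how these packages could relate, assuming the OpenAPI spec boils down to a single "select the best model for these features" style endpoint; every type, field, and path below is hypothetical and would really be driven by the spec in point 1:

```go
// Hypothetical shapes for pkg/model-server-api (point 2) and the generated
// pkg/model-server-client (point 3). The real contract would come from the
// OpenAPI spec in point 1; all names and fields here are illustrative only.
package modelserverapi

import (
	"context"
	"net/http"
)

// ModelRequest describes what the caller needs: the desired output type and
// the features it can actually provide.
type ModelRequest struct {
	OutputType string   `json:"output_type"` // e.g. "AbsPower"
	Source     string   `json:"source"`      // e.g. "rapl-sysfs"
	Features   []string `json:"features"`
}

// ModelResponse points the caller at the archive of the selected model.
type ModelResponse struct {
	Name string `json:"name"`
	URL  string `json:"url"` // location in the model DB
}

// Server is what cmd/model-server (point 5) would implement.
type Server interface {
	SelectModel(ctx context.Context, req ModelRequest) (ModelResponse, error)
}

// Client is what the generated pkg/model-server-client would expose and what
// the pkg/model inferencing code would consume.
type Client interface {
	SelectModel(ctx context.Context, req ModelRequest) (ModelResponse, error)
}

// Handler is stand-in plumbing; generated server stubs from the OpenAPI spec
// would replace this hand-written version.
func Handler(s Server) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/model", func(w http.ResponseWriter, r *http.Request) {
		// decode a ModelRequest, call s.SelectModel, encode the ModelResponse
	})
	return mux
}
```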

This would then leave the functionality of the estimator and online-trainer in Python, since the model pipelines should not need to change. These can either remain in Python, with the REST API from the model server adjusted appropriately,

OR

We can call them using Cython from the Go code šŸ¤Æ See: https://poweruser.blog/embedding-python-in-go-338c0399f3d5
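
For the second option, a minimal sketch of what calling into an embedded Python interpreter from Go could look like, assuming the cgo + CPython C API route rather than a pure-Go binding; the pkg-config module name and the hard-coded snippet are assumptions, and a real integration would load the estimator / online-trainer modules instead:

```go
// Minimal sketch of embedding CPython from Go via cgo. Requires a Python
// development install; the pkg-config name (python3-embed) varies between
// distributions and Python versions.
package main

/*
#cgo pkg-config: python3-embed
#include <stdlib.h>
#include <Python.h>
*/
import "C"

import (
	"log"
	"unsafe"
)

func main() {
	C.Py_Initialize()
	defer C.Py_FinalizeEx()

	// C.CString allocates C memory, so it must be freed explicitly.
	code := C.CString("print('hello from the embedded interpreter')")
	defer C.free(unsafe.Pointer(code))

	if C.PyRun_SimpleString(code) != 0 {
		log.Fatal("embedded Python snippet failed")
	}
}
```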

Why is this needed?

sthaha commented 3 months ago

The in-tree models that are used in Kepler can be removed šŸŽ‰. cmd/exporter can use pkg/kepler-model-db to download the latest models if none are found in the expected path.
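
A rough sketch of that fallback, with a hypothetical stand-in for whatever pkg/kepler-model-db would actually export; the interface and function names are illustrative only:

```go
// cmd/exporter prefers a model that is already on disk and only contacts the
// model DB when none is found. ModelDB is a placeholder for pkg/kepler-model-db.
package exporter

import (
	"context"
	"log"
	"os"
)

// ModelDB is a hypothetical stand-in for the proposed pkg/kepler-model-db API.
type ModelDB interface {
	// DownloadLatest fetches the newest matching model archive into destPath.
	DownloadLatest(ctx context.Context, destPath string) error
}

// ensureModel returns the path of a usable model, downloading one only when
// nothing exists at the expected location.
func ensureModel(ctx context.Context, db ModelDB, path string) (string, error) {
	if _, err := os.Stat(path); err == nil {
		return path, nil // pre-provisioned model, no network access needed
	}
	log.Printf("no model found at %s, downloading the latest from the model DB", path)
	if err := db.DownloadLatest(ctx, path); err != nil {
		return "", err
	}
	return path, nil
}
```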

We have had a discussion about this in the community call and I think we can actually achieve this already.

IIUC, the idea behind adding this model is to support the use case of running Kepler (without the estimator or model-server) on a VM without needing access to any external network.

(NOTE: all the points below are based on my limited understanding of models, training and selection. @sunya-ch please correct me if I am wrong :)

To take advantage of model-db, you will need the estimator sidecar (numpy / scikit).

For the rest of the points, I definitely see a small advantage (in terms of performance) in having a model server written in Go to pick the best model and to serve them. However, models are served only once per Kepler instance, so I am not really sure a rewrite benefits us at this point in time.

Also, the model selection logic should go hand in hand with the training part, i.e. changes in metadata or features should be incorporated into the best-model selection (https://github.com/sustainable-computing-io/kepler-model-server/blob/f6990f3c0afe7320af90e47e9e91819f397b7b32/src/server/model_server.py#L73). And for that reason I think it is best to keep that in Python itself.