mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Feature request: adding a dedicated setup for inference time #354

Closed SaboniAmine closed 1 year ago

SaboniAmine commented 1 year ago

Description

To make the tracker more relevant for ML carbon emissions estimation, it should also be usable at inference time. Currently, codecarbon can track the execution of a whole inference program, or be set in monitoring mode, but the measured electrical consumption is then mixed with operations running in parallel with the inference, such as the web server that exposes the model to the network.

In order to estimate the impact of the model alone, the tracker should isolate the electrical consumption of each inference, which is currently not achievable because the tracker does not know when an inference occurs. We therefore propose to add new ways for the tracker to measure electrical consumption, not only along the time axis, to be able to estimate:

Additionally, we may want to compare those values across different models or different hardware setups, so this configuration should be stored along with the measurement data. Currently it is stored in the API, using the Run abstraction, but it would be interesting to have it locally as well.
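As an illustration of what a local record could look like, each measurement could be saved together with the model and hardware configuration, e.g. as one JSON line per measurement (the field names below are hypothetical, not an existing codecarbon schema):

```python
import json
import platform

def make_record(model_name, phase, energy_kwh, emissions_kg):
    # Hypothetical local record: a measurement plus the configuration
    # needed to compare runs across models and hardware setups.
    return {
        "model": model_name,                  # which model was measured
        "phase": phase,                       # e.g. "init" or "predict"
        "energy_consumed_kwh": energy_kwh,
        "emissions_kg": emissions_kg,
        "hardware": {                         # minimal hardware description
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
    }

record = make_record("my_model", "predict", 1.2e-6, 5.4e-7)
print(json.dumps(record))
```

Appending such records to a local file would keep the comparison data available without the API.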

What we propose

If we try to model the impact it could have on a generic inference class, it could be abstracted as follows:


class InferenceClass:
    def __init__(self, model_to_load):
        self.model = load_model(model_to_load)

    def predict(self, sample):
        return self.model.predict(sample)

The needed evolution implies that we can segregate the consumption of the init phase, segregate each predict execution, and store them in a way that can be interpreted correctly when analyzing the data. The current emissions.csv file structure does not seem adequate for such post-hoc analysis.
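A minimal sketch of what such segregation could look like. The `measure_energy_kwh` helper below is a placeholder that only times the call and assumes a constant power draw; none of these names are existing codecarbon APIs:

```python
import time

def measure_energy_kwh(fn, *args, **kwargs):
    # Placeholder probe: times the call and converts to energy with an
    # assumed constant power draw, standing in for a real hardware reading.
    ASSUMED_POWER_W = 50.0
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_s = time.perf_counter() - start
    return result, ASSUMED_POWER_W * elapsed_s / 3.6e6  # joules -> kWh

class TrackedInference:
    def __init__(self, load_model_fn):
        self.records = []  # phase-labelled measurements for post-hoc analysis
        self.model, init_kwh = measure_energy_kwh(load_model_fn)
        self.records.append({"phase": "init", "energy_kwh": init_kwh})

    def predict(self, sample):
        out, kwh = measure_energy_kwh(self.model, sample)
        self.records.append({"phase": "predict", "energy_kwh": kwh})
        return out

# Usage with a trivial callable standing in for a real model:
inf = TrackedInference(lambda: (lambda x: x * 2))
inf.predict(3)
print([r["phase"] for r in inf.records])
```

The key point is the phase label attached to every record, which the flat emissions file does not carry today.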

To break down the implementation work, we might divide it into:

Further issues may appear depending on the inference setup (especially in distributed environments or multi-process inference web servers), which can affect the validity of the measured data. We are looking for descriptions of production setups to help us identify those potential issues.

benoit-cty commented 1 year ago

It seems possible to get the energy of the GPU instead of the power: this would allow a finer estimation. See the NVIDIA API documentation.
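For illustration, integrating periodic power samples can badly mis-estimate short inference bursts that fall between (or on) sampling points, whereas reading a cumulative energy counter before and after the workload (NVML exposes one via `nvmlDeviceGetTotalEnergyConsumption`, in millijoules, on supported GPUs) captures the exact total. A small sketch with a synthetic power trace (all numbers are made up):

```python
# Synthetic "true" power at 1 ms resolution: 10 W idle, with a
# 300 W inference burst lasting 30 ms in the middle.
true_power_w = [10.0] * 100 + [300.0] * 30 + [10.0] * 100
true_energy_j = sum(p * 0.001 for p in true_power_w)  # exact energy

def integrate_power(samples_w, period_s):
    # Left Riemann sum over periodic power samples: what a poller can do.
    return sum(p * period_s for p in samples_w)

# A poller sampling every 100 ms sees only three readings; one of them
# happens to land inside the burst, so the burst is heavily over-weighted.
sampled = [true_power_w[0], true_power_w[100], true_power_w[200]]
polled_estimate_j = integrate_power(sampled, 0.1)

# An energy counter read before/after the workload yields the exact delta.
counter_estimate_j = true_energy_j

print(true_energy_j, polled_estimate_j)
```

Here the true energy is 11 J but the polled estimate is 32 J; depending on where samples land, the error can go either way, which is why an energy counter gives the finer estimation.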