mlco2 / codecarbon

Track emissions from Compute and recommend ways to reduce their impact on the environment.
https://mlco2.github.io/codecarbon
MIT License

Feature request: adding a dedicated setup for inference time #354

Closed SaboniAmine closed 1 year ago

SaboniAmine commented 1 year ago

Description

To make the tracker more relevant for ML carbon emissions estimation, it should also be usable at inference time. Currently, codecarbon can track the execution of a whole inference program, or be set in monitoring mode, but the measured electrical consumption is then mixed with operations running in parallel with the inference, such as the web server that exposes the model to the network.

In order to estimate the impact of the model alone, the tracker should isolate the electrical consumption of each inference, which is currently not achievable because the tracker does not know when an inference occurs. We therefore propose to add new ways for the tracker to measure electrical consumption, not only along the time axis, to be able to estimate:

Additionally, we may want to compare those values across different models or different hardware setups, so this configuration should be stored along with the measurement data. Currently it is stored in the API, using the Run abstraction, but it would be interesting to have it locally as well.
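As an illustration of what a local record could look like, each measurement could be saved together with the model and hardware configuration, e.g. as one JSON line per measurement (the field names below are hypothetical, not an existing codecarbon schema):

```python
import json
import platform

def make_record(model_name, phase, energy_kwh, emissions_kg):
    # Hypothetical local record: a measurement plus the configuration
    # needed to compare runs across models and hardware setups.
    return {
        "model": model_name,                  # which model was measured
        "phase": phase,                       # e.g. "init" or "predict"
        "energy_consumed_kwh": energy_kwh,
        "emissions_kg": emissions_kg,
        "hardware": {                         # minimal hardware description
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
    }

record = make_record("my_model", "predict", 1.2e-6, 5.4e-7)
print(json.dumps(record))
```

Appending such records to a local file would keep the comparison data available without the API.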

What we propose

If we try to model the impact it could have on a generic inference class, it could be abstracted as follows:


class InferenceClass:
    def __init__(self, model_to_load):
        self.model = load_model(model_to_load)

    def predict(self, sample):
        return self.model.predict(sample)

The needed evolution implies that we can segregate the consumption of the init phase, segregate each predict execution, and store them in a way that can be interpreted correctly when analyzing the data. The current emissions.csv file structure does not seem adequate for such post-hoc analysis.
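A minimal sketch of what such segregation could look like. The `measure_energy_kwh` helper below is a placeholder that only times the call and assumes a constant power draw; none of these names are existing codecarbon APIs:

```python
import time

def measure_energy_kwh(fn, *args, **kwargs):
    # Placeholder probe: times the call and converts to energy with an
    # assumed constant power draw, standing in for a real hardware reading.
    ASSUMED_POWER_W = 50.0
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_s = time.perf_counter() - start
    return result, ASSUMED_POWER_W * elapsed_s / 3.6e6  # joules -> kWh

class TrackedInference:
    def __init__(self, load_model_fn):
        self.records = []  # phase-labelled measurements for post-hoc analysis
        self.model, init_kwh = measure_energy_kwh(load_model_fn)
        self.records.append({"phase": "init", "energy_kwh": init_kwh})

    def predict(self, sample):
        out, kwh = measure_energy_kwh(self.model, sample)
        self.records.append({"phase": "predict", "energy_kwh": kwh})
        return out

# Usage with a trivial callable standing in for a real model:
inf = TrackedInference(lambda: (lambda x: x * 2))
inf.predict(3)
print([r["phase"] for r in inf.records])
```

The key point is the phase label attached to every record, which the flat emissions file does not carry today.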

To break down the implementation work, we might divide it into:

Further issues may appear depending on the inference setup (especially in distributed environments or multi-process inference web servers), which can affect the validity of the measured data. We are looking for descriptions of production setups to help us identify those potential issues.

benoit-cty commented 1 year ago

It seems possible to get the energy of the GPU instead of the power: this would allow a finer estimation. See the NVIDIA API documentation.
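For illustration, integrating periodic power samples can badly mis-estimate short inference bursts that fall between (or on) sampling points, whereas reading a cumulative energy counter before and after the workload (NVML exposes one via `nvmlDeviceGetTotalEnergyConsumption`, in millijoules, on supported GPUs) captures the exact total. A small sketch with a synthetic power trace (all numbers are made up):

```python
# Synthetic "true" power at 1 ms resolution: 10 W idle, with a
# 300 W inference burst lasting 30 ms in the middle.
true_power_w = [10.0] * 100 + [300.0] * 30 + [10.0] * 100
true_energy_j = sum(p * 0.001 for p in true_power_w)  # exact energy

def integrate_power(samples_w, period_s):
    # Left Riemann sum over periodic power samples: what a poller can do.
    return sum(p * period_s for p in samples_w)

# A poller sampling every 100 ms sees only three readings; one of them
# happens to land inside the burst, so the burst is heavily over-weighted.
sampled = [true_power_w[0], true_power_w[100], true_power_w[200]]
polled_estimate_j = integrate_power(sampled, 0.1)

# An energy counter read before/after the workload yields the exact delta.
counter_estimate_j = true_energy_j

print(true_energy_j, polled_estimate_j)
```

Here the true energy is 11 J but the polled estimate is 32 J; depending on where samples land, the error can go either way, which is why an energy counter gives the finer estimation.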