ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0
200 stars, 25 forks

`ZeusMonitorContext` for energy profiling inside training scripts #9

jaywonchung closed 1 year ago

jaywonchung commented 1 year ago

This PR implements `zeus.monitor.ZeusMonitorContext`, which DNN training scripts can use to profile their per-iteration energy and time consumption.
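To illustrate the general idea of per-iteration profiling with a context manager, here is a minimal sketch. It is not the API added in this PR: the class name `IterationProfiler`, the `read_energy` callable, and the `records` list are hypothetical, and a fake counter stands in for a real GPU energy reading.

```python
import time
from typing import Callable, List, Tuple


class IterationProfiler:
    """Hypothetical context manager; NOT the actual ZeusMonitorContext API."""

    def __init__(self, read_energy: Callable[[], float]) -> None:
        # Assumption: `read_energy` returns a cumulative energy counter in joules.
        self._read_energy = read_energy
        # One (elapsed seconds, consumed joules) tuple per profiled iteration.
        self.records: List[Tuple[float, float]] = []

    def __enter__(self) -> "IterationProfiler":
        self._start_time = time.perf_counter()
        self._start_energy = self._read_energy()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        elapsed = time.perf_counter() - self._start_time
        consumed = self._read_energy() - self._start_energy
        self.records.append((elapsed, consumed))


# Fake cumulative counter that advances by 5 J on every read (illustration only).
fake_counter = iter(range(0, 1000, 5))
profiler = IterationProfiler(read_energy=lambda: float(next(fake_counter)))

for _ in range(3):
    with profiler:
        sum(range(1000))  # Stand-in for one training iteration.
```

Each pass through the `with` block appends one record, so a training loop can wrap its iteration body and inspect the per-iteration numbers afterwards.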

jaywonchung commented 1 year ago

@Rosie-m is the main reviewer for this PR, but I'd also like @zyang37 to take a light look to get a feel for the team's workflow.

jaywonchung commented 1 year ago

Along with these changes, I implemented a caching mechanism for metrics. It stores the computed energy/time numbers inside `self._metric_cache`, and the cache dictionary is reset by `zeus_ctx.reset()`. As a result, the heavy pandas computation runs only once even if the user accesses `zeus_ctx.total_energy` multiple times.
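The caching behavior described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the class name `CachedMetrics`, the `_compute` helper, and the `compute_calls` counter are hypothetical, and a plain `sum` over a list stands in for the pandas computation.

```python
from typing import Dict, List


class CachedMetrics:
    """Hypothetical stand-in for the profiling context; NOT the actual Zeus class."""

    def __init__(self, energy_samples: List[float]) -> None:
        # Assumption: one energy reading (in joules) per profiled iteration.
        self._energy_samples = energy_samples
        self._metric_cache: Dict[str, float] = {}
        self.compute_calls = 0  # Counts how often the expensive work really runs.

    def _compute(self) -> float:
        # Stand-in for the heavy pandas computation mentioned in the comment.
        self.compute_calls += 1
        return float(sum(self._energy_samples))

    @property
    def total_energy(self) -> float:
        # Compute on first access, then serve the stored value from the cache.
        if "total_energy" not in self._metric_cache:
            self._metric_cache["total_energy"] = self._compute()
        return self._metric_cache["total_energy"]

    def reset(self) -> None:
        # Drop all cached metrics so the next access recomputes them.
        self._metric_cache = {}


ctx = CachedMetrics([10.0, 12.5, 11.5])
first = ctx.total_energy   # Triggers the computation.
second = ctx.total_energy  # Served from the cache.
calls_before_reset = ctx.compute_calls
ctx.reset()
third = ctx.total_energy   # Recomputed after the reset.
```

Keying the cache by metric name keeps a single dictionary for all metrics, so one `reset()` call invalidates everything at once.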