ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0
180 stars, 24 forks

Allow enabling `cudnn.benchmark` by postponing Zeus profiling #5

Rosie-m closed this issue 11 months ago

Rosie-m commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently, Zeus starts profiling at the very beginning of the first epoch. However, if the user sets `cudnn.benchmark = True`, the first few iterations run slower because cuDNN benchmarks candidate convolution algorithms during them. Zeus's profiling then includes these unrepresentative iterations, which skews its measurements.

Describe the solution you'd like

We can postpone the start of profiling by skipping several iterations at the beginning of the first epoch. The number of skipped iterations has to be large enough to cover cuDNN's benchmarking time, so this will also require a closer look at how cuDNN's benchmarking works.
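To make the proposal concrete, here is a minimal sketch of the idea in plain Python. It is not Zeus's API or the change that eventually landed: the class `WarmupSkippingTimer`, its `warmup_iters` parameter, and the injectable `clock` are all hypothetical names used for illustration, and the sketch records only wall-clock time, not energy.

```python
import time
from typing import Callable, Optional


class WarmupSkippingTimer:
    """Times training iterations, ignoring the first `warmup_iters` of them.

    Hypothetical helper for illustration only; not part of Zeus.
    """

    def __init__(
        self,
        warmup_iters: int,
        clock: Callable[[], float] = time.perf_counter,
    ) -> None:
        if warmup_iters < 0:
            raise ValueError("warmup_iters must be non-negative")
        self.warmup_iters = warmup_iters
        self.clock = clock
        self.seen_iters = 0          # all iterations, including warm-up
        self.profiled_iters = 0      # iterations that count toward the profile
        self.profiled_seconds = 0.0  # total duration of profiled iterations

    def run_iteration(self, step: Callable[[], None]) -> None:
        """Run one training step; record its duration only after warm-up."""
        in_warmup = self.seen_iters < self.warmup_iters
        start = self.clock()
        step()
        elapsed = self.clock() - start
        self.seen_iters += 1
        if not in_warmup:
            self.profiled_iters += 1
            self.profiled_seconds += elapsed

    def mean_iteration_seconds(self) -> Optional[float]:
        """Average duration of profiled iterations, or None if there are none."""
        if self.profiled_iters == 0:
            return None
        return self.profiled_seconds / self.profiled_iters


if __name__ == "__main__":
    timer = WarmupSkippingTimer(warmup_iters=2)
    for i in range(5):
        # Stand-in for a training step; the first two are artificially slow.
        timer.run_iteration(lambda: time.sleep(0.05 if i < 2 else 0.01))
    print(timer.profiled_iters, timer.mean_iteration_seconds())
```

In a real PyTorch training loop, GPU kernels launch asynchronously, so the step would likely need a `torch.cuda.synchronize()` call before the clock is read. Choosing `warmup_iters` is the open question raised above, since it depends on how long cuDNN's benchmarking takes for a given model.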

jaywonchung commented 11 months ago

Completed in 8952979.