ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0

Fix: Add static checking to prevent incomplete profiling after removing cutoff #7

Closed Rosie-m closed 1 year ago

Rosie-m commented 1 year ago

Description

This PR fixes a bug where the profile window starts unconditionally whenever any iterations are left in the epoch. Since we removed the cutoff in #2, this breaks the whole profiling process, because the guard for entering a profile window will never be True.

Background

This PR is a follow-up to #2. In #2, we switched to iteration-based accounting for the profile window. Accordingly, we moved from "dynamically check and cut off" to "statically check and scale one profile window to fit in one epoch". But with the cutoff removed, another mechanism is needed to prevent entering an incomplete profile window.

Solution

We can tell in advance whether a profile window will be incomplete by comparing the current iteration number (self.sample_num) plus the length of one profile window (self.warmup_iter + self.profile_iter) with the total number of iterations in one epoch (self.num_samples). If the window would not fit, we do not enter it and simply do nothing until the next epoch.
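As a rough illustration, the comparison described above could be written as a standalone function. This is a minimal sketch, not the code in this PR: the function name `window_fits` is hypothetical, and the use of `<=` assumes `sample_num` counts iterations already consumed in the current epoch.

```python
def window_fits(sample_num: int, warmup_iter: int, profile_iter: int, num_samples: int) -> bool:
    """Return True if one full warmup + profile window still fits in this epoch.

    Hypothetical helper for illustration only. Assumes `sample_num` is the
    number of iterations already consumed in the current epoch.
    """
    return sample_num + warmup_iter + profile_iter <= num_samples


# Example: an epoch of 10 iterations with a window of 2 warmup + 3 profile iterations.
print(window_fits(5, 2, 3, 10))  # window covers iterations 5..9, so it fits
print(window_fits(6, 2, 3, 10))  # window would need iteration 10, so it does not fit
```

Because the check only uses values known before the window starts, it can be evaluated once at the window boundary instead of on every iteration.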

Main code changes

In dataloader.py > ZeusDataLoader > __next__(), add the static check described above before calling _start_warmup().
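To show where such a check could sit relative to _start_warmup(), here is a toy iterator. It is a simplified sketch under stated assumptions, not the actual ZeusDataLoader: the class `ToyLoader`, the `window_end` and `windows_started` attributes, and the behavior of starting a new window right after the previous one ends are all made up for illustration.

```python
class ToyLoader:
    """Hypothetical stand-in for a profiling data loader (not the real ZeusDataLoader)."""

    def __init__(self, num_samples: int, warmup_iter: int, profile_iter: int):
        self.num_samples = num_samples    # total iterations in one epoch
        self.warmup_iter = warmup_iter
        self.profile_iter = profile_iter
        self.sample_num = 0               # iterations consumed in this epoch
        self.window_end = None            # iteration at which the open window ends
        self.windows_started = []         # record of window start points, for inspection

    def _start_warmup(self) -> None:
        # In this toy, starting a window only records where it begins and ends.
        self.windows_started.append(self.sample_num)
        self.window_end = self.sample_num + self.warmup_iter + self.profile_iter

    def __iter__(self):
        self.sample_num = 0
        self.window_end = None
        return self

    def __next__(self) -> int:
        if self.sample_num >= self.num_samples:
            raise StopIteration
        # Close the current window once all of its iterations have run.
        if self.window_end is not None and self.sample_num >= self.window_end:
            self.window_end = None
        if self.window_end is None:
            # Static check: only enter a window that fits in the rest of the epoch.
            window = self.warmup_iter + self.profile_iter
            if self.sample_num + window <= self.num_samples:
                self._start_warmup()
            # Otherwise do nothing until the next epoch.
        self.sample_num += 1
        return self.sample_num - 1


loader = ToyLoader(num_samples=12, warmup_iter=2, profile_iter=3)
for _ in loader:
    pass
print(loader.windows_started)  # windows start at 0 and 5; none starts at 10
```

In this toy, the last two iterations of the epoch run without any profile window, which mirrors the "do nothing until the next epoch" behavior described in the Solution section.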

Tests passed