Open JasonLiJT opened 5 months ago
Previously, the mfu timing measurement was taken before the CPU-GPU sync point at every iter. The resulting `running_mfu` behaves differently for `log_interval = 1` and `log_interval > 1`, and can give the impression that increasing `log_interval` speeds up training (it usually does not).

See diagrams below.

[Diagram: `log_interval = 1`]

[Diagram: `log_interval = 2`]

Note that `t3 - t2` is discarded. Only `t2 - t1`, `t4 - t3`, etc. contribute to `running_mfu`.
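
To make the note on discarded intervals concrete, here is a minimal sketch in plain Python. It is not the training loop itself: the function name `contributing_intervals` is hypothetical, and it assumes that a per-iteration `dt` is measured at every timestamp `t_k` but only consumed on iterations where `k` is a multiple of `log_interval`.

```python
def contributing_intervals(num_timestamps, log_interval):
    """Split consecutive timestamp pairs into used and discarded intervals.

    Assumption (for illustration only): dt = t_k - t_(k-1) is computed at
    every iteration, but it only feeds running_mfu when k is a multiple of
    log_interval. All other dt values are thrown away.
    """
    used, discarded = [], []
    for k in range(2, num_timestamps + 1):  # k is the 1-based index of t_k
        pair = (f"t{k - 1}", f"t{k}")
        if k % log_interval == 0:
            used.append(pair)       # this dt reaches running_mfu
        else:
            discarded.append(pair)  # this dt is never logged
    return used, discarded


if __name__ == "__main__":
    for interval in (1, 2):
        used, discarded = contributing_intervals(4, interval)
        print(f"log_interval = {interval}: used={used}, discarded={discarded}")
```

With `log_interval = 2` and four timestamps, the sketch reports `t2 - t1` and `t4 - t3` as used and `t3 - t2` as discarded, matching the note above. With `log_interval = 1`, no interval is discarded.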