karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License
37.55k stars 5.99k forks

fix(train.py): mfu estimation to respect CPU-GPU sync point #527

Open JasonLiJT opened 5 months ago

JasonLiJT commented 5 months ago

Previously, the mfu timing measurement was taken before the CPU-GPU sync point at every iter. The resulting running_mfu:

See diagrams below.

log_interval = 1

[Diagram: mfu drawio]

log_interval = 2

Note that t3 - t2 is discarded. Only t2 - t1, t4 - t3, etc. contribute to running_mfu.

[Diagram: mfu2 drawio]
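The effect described for log_interval = 2 can be reproduced with a toy model that needs no GPU. The sketch below is an illustration under stated assumptions, not the code in train.py and not necessarily the change this PR makes. The function `simulate`, the parameters `launch` and `gpu`, and the fake clock are all hypothetical. It assumes the CPU only pays a small launch cost per iteration, the GPU runs queued work asynchronously, and the CPU blocks on the GPU only on logged iterations (standing in for the sync point). It compares the dt taken before the sync point with one possible alternative: the time between consecutive sync points divided by log_interval.

```python
def simulate(num_iters, log_interval, launch=1.0, gpu=10.0):
    """Toy model of asynchronous GPU execution with a fake clock.

    All names and numbers here are hypothetical, chosen for illustration.
    Returns two lists of per-iteration time estimates, one entry per
    logged iteration:
      before_sync: dt measured before the sync point (old behaviour)
      after_sync:  time between consecutive sync points / log_interval
                   (one possible correction, an assumption of this sketch)
    """
    cpu = 0.0        # simulated CPU wall clock
    gpu_done = 0.0   # time at which the GPU finishes all queued work
    t_prev = 0.0     # previous pre-sync timestamp
    sync_prev = 0.0  # previous post-sync timestamp
    before_sync, after_sync = [], []

    for it in range(1, num_iters + 1):
        # The CPU enqueues this iteration's work and returns immediately.
        cpu += launch
        gpu_done = max(gpu_done, cpu) + gpu

        # Timestamp taken BEFORE any sync, on every iteration.
        dt = cpu - t_prev
        t_prev = cpu

        if it % log_interval == 0:
            # Sync point: the CPU waits until the GPU has caught up.
            cpu = max(cpu, gpu_done)
            before_sync.append(dt)
            after_sync.append((cpu - sync_prev) / log_interval)
            sync_prev = cpu

    return before_sync, after_sync


if __name__ == "__main__":
    print(simulate(num_iters=6, log_interval=2))
    # prints ([1.0, 1.0, 1.0], [10.5, 10.5, 10.5])
```

In this toy model with log_interval = 2, the pre-sync dt on logged iterations only sees the launch cost (1.0), because the wait at the sync point falls into the next interval, which is never used. The true average cost per iteration is 10.5, so an MFU computed from the pre-sync dt would be heavily overestimated. With log_interval = 1 the pre-sync dt settles at the correct steady-state value after the first iteration, since every interval is used.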