pytorch / torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
BSD 3-Clause "New" or "Revised" License

Loss Convergence Benchmark #713

Closed anijain2305 closed 2 years ago

anijain2305 commented 2 years ago

This point has been raised multiple times. Our benchmarks today focus only on a single iteration of fwd-bwd (with optimizers coming soon). We lack a real training example in which the loss actually goes down.

We could start with a BERT model and run it for, say, 500 iterations, using both pre-training and fine-tuning batch sizes. Nvidia folks mentioned that this could reveal many bugs that are not visible in a single iteration.
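
A minimal sketch of what such a check might look like, not the actual benchmark. It records the loss over 500 iterations for a reference run and a second run, then checks that the loss went down and that the two curves agree. All names here (`run_training`, `loss_decreased`, `curves_match`, `make_toy_step`) and the tolerances are hypothetical, and the toy gradient-descent step is only a stand-in so the sketch runs without PyTorch. In a real benchmark the two step functions would be the eager and the compiled BERT training steps.

```python
import math
from typing import Callable, List, Sequence


def run_training(step_fn: Callable[[int], float], num_iters: int = 500) -> List[float]:
    """Call step_fn once per iteration and record the loss it returns."""
    return [float(step_fn(i)) for i in range(num_iters)]


def loss_decreased(losses: Sequence[float], window: int = 50) -> bool:
    """True if the mean of the last `window` losses is below the mean of the first `window`."""
    if len(losses) < 2:
        return False
    window = min(window, len(losses) // 2)
    head = sum(losses[:window]) / window
    tail = sum(losses[-window:]) / window
    return tail < head


def curves_match(ref: Sequence[float], test: Sequence[float],
                 rtol: float = 1e-3, atol: float = 1e-5) -> bool:
    """True if both loss curves have the same length and agree at every iteration."""
    return len(ref) == len(test) and all(
        math.isclose(r, t, rel_tol=rtol, abs_tol=atol) for r, t in zip(ref, test)
    )


def make_toy_step(lr: float = 0.1) -> Callable[[int], float]:
    """Hypothetical stand-in for a training step: gradient descent on (w - 3)^2."""
    state = {"w": 0.0}

    def step(_iteration: int) -> float:
        loss = (state["w"] - 3.0) ** 2
        state["w"] -= lr * 2.0 * (state["w"] - 3.0)  # d/dw (w - 3)^2 = 2 (w - 3)
        return loss

    return step


# Reference run and the run under test. Here both use the same toy step;
# in practice the second would be the compiled version of the first.
eager_losses = run_training(make_toy_step())
compiled_losses = run_training(make_toy_step())

print("loss decreased:", loss_decreased(eager_losses))
print("curves match:", curves_match(eager_losses, compiled_losses))
```

Comparing the whole curve, rather than only the final loss, is what lets a check like this catch bugs that show up after the first iteration. The tolerances would likely need loosening for real GPU runs, where small numerical differences are expected.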

yanboliang commented 2 years ago

Progress tracker: