mlfoundations / open_lm

A repository for research on medium sized language models.
MIT License

Adding averaging of iterates #238

Closed: Tomerporian closed this 2 months ago

Tomerporian commented 2 months ago

Supports EMA and polynomial averaging. Includes options for keeping more than one averaged model and for averaging intermittently, at a given period. For example, to run EMA with parameter 0.999 and period 100 alongside polynomial averaging with parameter 8 and period 10, use --averagers="ema_0.999_100,poly_8_10". Training loss of the averaged models can be logged by setting --log-avg-model-training-loss to a value greater than zero.
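The description does not spell out the update rules, so the following is only a minimal sketch of how a spec such as "ema_0.999_100,poly_8_10" could be interpreted. It assumes (1) each comma-separated item has the form type_parameter_period, (2) "period" means the average is updated once every `period` training steps, (3) EMA uses the standard rule avg = decay * avg + (1 - decay) * params, and (4) "poly" is polynomial-decay averaging (Shamir and Zhang, 2013) with weight (gamma + 1) / (t + gamma) on the t-th averaging update. The names `Averager`, `maybe_update` and `parse_averagers` are hypothetical, not open_lm's API, and a flat list of floats stands in for the model's weights.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Averager:
    """Hypothetical averager over a flat list of parameters.

    kind:   "ema" or "poly"
    param:  EMA decay (e.g. 0.999) or polynomial exponent gamma (e.g. 8)
    period: assumed to mean "update the average every `period` steps"
    """
    kind: str
    param: float
    period: int
    avg: List[float] = field(default_factory=list)
    num_updates: int = 0

    def weight(self) -> float:
        # Weight given to the current iterate on this averaging update.
        if self.kind == "ema":
            return 1.0 - self.param
        # Polynomial-decay averaging: (gamma + 1) / (t + gamma), t = 1, 2, ...
        # With gamma = 0 this reduces to a uniform running mean (weight 1/t).
        t = self.num_updates
        return (self.param + 1.0) / (t + self.param)

    def maybe_update(self, step: int, params: List[float]) -> None:
        # Intermittent averaging: skip steps that are not multiples of period.
        if step % self.period != 0:
            return
        self.num_updates += 1
        if not self.avg:
            # First update: start the average from the current iterate.
            self.avg = list(params)
            return
        w = self.weight()
        self.avg = [(1.0 - w) * a + w * p for a, p in zip(self.avg, params)]


def parse_averagers(spec: str) -> List[Averager]:
    """Parse an assumed spec format like "ema_0.999_100,poly_8_10"."""
    averagers = []
    for item in spec.split(","):
        kind, param, period = item.strip().split("_")
        if kind not in ("ema", "poly"):
            raise ValueError(f"unknown averager type: {kind!r}")
        averagers.append(Averager(kind, float(param), int(period)))
    return averagers


# Usage: both averagers see the same (toy) iterates, steps numbered from 1.
averagers = parse_averagers("ema_0.999_100,poly_8_10")
for step in range(1, 1001):
    params = [float(step)]  # stand-in for the model's weights at this step
    for averager in averagers:
        averager.maybe_update(step, params)
```

Keeping each averager as its own object makes running several at once (as in the example flag) a matter of looping over a list; in a real training loop the averaged copies would hold model tensors rather than Python floats.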