sangmichaelxie / doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
https://arxiv.org/abs/2305.10429
MIT License

about loss #3

Closed ywb2018 closed 1 year ago

ywb2018 commented 1 year ago

Help, please.
1. Why doesn't the excess loss follow the paper, i.e. max(excess loss, 0)?

sangmichaelxie commented 1 year ago

We do clip the excess loss here: https://github.com/sangmichaelxie/doremi/blob/80c8b4d5840ca83c2121f5a4b48a0170e088e574/doremi/trainer.py#L232
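
For readers who want to see what the clipping in the question amounts to, here is a minimal sketch in plain Python. It is not the code at the linked line: the function name `clipped_excess_loss` and the loss values are hypothetical, chosen only to illustrate max(proxy loss - reference loss, 0) applied per token. With tensors, the same step would typically be an elementwise clamp at zero.

```python
def clipped_excess_loss(proxy_losses, reference_losses):
    """Per-token excess loss, clipped at zero: max(proxy - reference, 0)."""
    if len(proxy_losses) != len(reference_losses):
        raise ValueError("loss lists must have the same length")
    # Tokens where the proxy model already beats the reference contribute 0.
    return [max(p - r, 0.0) for p, r in zip(proxy_losses, reference_losses)]


# Hypothetical per-token losses for one sequence (illustrative values only).
proxy = [2.0, 1.0, 3.5]
reference = [1.5, 1.25, 3.5]
excess = clipped_excess_loss(proxy, reference)
print(excess)  # [0.5, 0.0, 0.0]
```

The second token shows the effect of the clip: the raw difference is negative (1.0 - 1.25), so without the max it would pull the excess loss down instead of contributing zero.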