sangmichaelxie / doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets.
https://arxiv.org/abs/2305.10429
MIT License

Question about the domain weights initialization value in paper Figure 8 #7

Haijunlv closed this issue 1 year ago

Haijunlv commented 1 year ago

Thanks for the amazing paper and open codebase! I have a question about Figure 8 in the paper. The domain weights at step 0 in Figure 8 don't seem to match the initialization in Algorithm 1 ("Initialize domain weights α0 = 1/k"). The initial domain weights in Figure 8a also differ from those in Figure 8b. What initialization strategy is used for the domain weights in Figure 8?


sangmichaelxie commented 1 year ago

The initialization is uniform. The plot just doesn't start exactly at step 0 (I think it starts at step 100), and the plotted curves are an exponential moving average over training steps.
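To illustrate the reply, here is a toy Python sketch (not code from the doremi repo) of why a curve that starts from uniform weights can look non-uniform at its first plotted point. Everything in it is an assumption for illustration: the multiplicative update is only an exponentiated-gradient-style step in the spirit of Algorithm 1, and the constant per-domain excess losses, `eta`, `smoothing`, the EMA `decay`, and `PLOT_START = 100` are all hypothetical values.

```python
import math

def uniform_init(k):
    # Algorithm 1 initialization: alpha_0 = 1/k for each of the k domains.
    return [1.0 / k] * k

def update_weights(alpha, excess_losses, eta=1.0, smoothing=1e-3):
    # Hypothetical multiplicative (exponentiated-gradient style) step:
    # domains with larger excess loss get upweighted, then the result is
    # renormalized and mixed with the uniform distribution.
    k = len(alpha)
    unnorm = [a * math.exp(eta * l) for a, l in zip(alpha, excess_losses)]
    total = sum(unnorm)
    return [(1 - smoothing) * u / total + smoothing / k for u in unnorm]

def run(excess_losses, steps):
    # Raw (unsmoothed) domain weights at steps 0..steps, starting from uniform.
    alpha = uniform_init(len(excess_losses))
    history = [alpha]
    for _ in range(steps):
        alpha = update_weights(alpha, excess_losses)
        history.append(alpha)
    return history

def ema(history, decay=0.9):
    # Exponential moving average over time steps, applied per domain.
    avg = history[0]
    smoothed = []
    for weights in history:
        avg = [decay * a + (1 - decay) * w for a, w in zip(avg, weights)]
        smoothed.append(avg)
    return smoothed

# Two toy settings with different (constant, made-up) per-domain excess losses.
raw_a = run([0.02, 0.01, 0.0], steps=300)
raw_b = run([0.0, 0.01, 0.02], steps=300)

PLOT_START = 100  # hypothetical: the plot begins around step 100, not step 0
first_a = ema(raw_a)[PLOT_START]
first_b = ema(raw_b)[PLOT_START]
print("raw step 0:", [round(w, 3) for w in raw_a[0]])
print("plotted start (a):", [round(w, 3) for w in first_a])
print("plotted start (b):", [round(w, 3) for w in first_b])
```

The raw weights at step 0 are exactly uniform (`[0.333, 0.333, 0.333]`), but by the first plotted point the smoothed weights have already drifted away from 1/k, and they drift differently in the two settings. That matches the pattern the question describes for Figures 8a and 8b.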