Hi, thank you very much for your interest! Yes, your understanding is correct. `zephyr-7b-sft-full-SPIN-iter0` is trained on `UCLA-AGI/SPIN_iter0` (50k samples). Iteration 1 is then trained on data from iterations 0 and 1 (100k samples), and iteration 2 is similarly trained on data from iterations 1 and 2 (100k samples).
We always run generations on the same 50k prompts at each iteration. Since the 100k prompts would simply contain each prompt twice, we just reuse the same 50k prompts.
I hope the above clarifies the questions!
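(A minimal sketch of the combination step described above, assuming the published `UCLA-AGI/SPIN_iter*` datasets on the Hugging Face Hub and the `datasets` library; the `train` split name is an assumption about how the data is published, not something stated in this thread.)

```python
# Minimal sketch of the iteration-1 training mixture: data from iteration 0
# plus data from iteration 1 (~100k samples total).
# Assumption: the published UCLA-AGI datasets expose a "train" split.
from datasets import load_dataset, concatenate_datasets

iter0 = load_dataset("UCLA-AGI/SPIN_iter0", split="train")  # ~50k samples
iter1 = load_dataset("UCLA-AGI/SPIN_iter1", split="train")  # ~50k samples

# Training set for SPIN iteration 1: previous iteration's data + current data.
train_iter1 = concatenate_datasets([iter0, iter1]).shuffle(seed=42)
print(len(train_iter1))  # expected ~100k
```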
Thank you @yihedeng9, this is very helpful!
> We always run generations on the same 50k prompts at each iteration. Since the 100k prompts would simply contain each prompt twice, we just reuse the same 50k prompts.
For my understanding, is the reason for this to ensure the model at iteration `t` doesn't drift too far away from the model at iteration `t-1`?
Yes, we observed in experiments that incorporating data from previous iterations helps stabilize model performance over later iterations. We consider it a form of regularization, ensuring the model doesn't deviate significantly from its performance at the previous iteration, as you said.
Thank you! Closing the issue since everything is now clear :)
Hello, thank you for open-sourcing the code behind SPIN - it's very clean!
I'm currently working on porting this to `trl` in https://github.com/huggingface/trl/pull/1344 and am validating that everything works on a small Qwen1.5-0.5B model. On p.9 of your paper, you state that you combine datasets across each iteration.
My question concerns which combination of datasets you used for each SPIN iteration:
- `zephyr-7b-sft-full-SPIN-iter0` trained on `UCLA-AGI/SPIN_iter0` (50k samples)?
- `zephyr-7b-sft-full-SPIN-iter1` trained on `UCLA-AGI/SPIN_iter0` and `UCLA-AGI/SPIN_iter1` (100k samples)?
- `zephyr-7b-sft-full-SPIN-iter2` trained on `UCLA-AGI/SPIN_iter1` and `UCLA-AGI/SPIN_iter2` (100k samples)?

In other words, do you combine the generations from the model trained on iteration `t` with those from `t-1`?

A related question: do you always run generation on the same 50k prompts at each iteration, or do you generate over 100k prompts for iterations 1-3?
Thanks!
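(Relating to the generation question above, here is a minimal sketch of regenerating completions over the same fixed prompt set with each iteration's model. The `prompt` column name, the sampling settings, and the choice of model checkpoint are illustrative assumptions, not the authors' actual pipeline.)

```python
# Minimal sketch (not the authors' actual pipeline): regenerate completions
# over the SAME fixed prompt set with each iteration's model.
from datasets import load_dataset
from transformers import pipeline

# Fixed 50k prompts, reused at every iteration (here taken from the iter0
# dataset; the column name "prompt" is an assumption about its schema).
prompts = load_dataset("UCLA-AGI/SPIN_iter0", split="train")["prompt"]

generator = pipeline(
    "text-generation",
    model="UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0",  # swap in the model from iteration t
    device_map="auto",
)

# Generate a synthetic response per prompt (a small slice shown here); these
# become the model-generated side of the next iteration's training pairs.
generations = generator(prompts[:8], max_new_tokens=256, do_sample=True)
```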