uclaml / SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)
https://uclaml.github.io/SPIN/
Apache License 2.0

Confused about iterations #25

Open junkangwu opened 6 months ago

junkangwu commented 6 months ago

Hi there, great job on the project!

I'm looking to clarify whether the UCLA-AGI/zephyr-7b-sft-full-SPIN-iter1 model was fine-tuned on top of UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0 or alignment-handbook/zephyr-7b-sft-full. The paper suggests that training progresses from $\theta_t$ to $\theta_{t+1}$. However, the description provided at https://huggingface.co/UCLA-AGI/zephyr-7b-sft-full-SPIN-iter1 seems to indicate otherwise.

I would appreciate any clarification on this matter.

Thank you!

angelahzyuan commented 5 months ago

Hi, thank you!

We have released the full training pipeline in our repo; see the "Reproducing our results" section. Iter1 is trained with iter0 as the base model. The description at https://huggingface.co/UCLA-AGI/zephyr-7b-sft-full-SPIN-iter1 is meant to indicate that the model was obtained by starting from zephyr-7b-sft-full and running 2 iterations.
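
For clarity, here is a minimal conceptual sketch of how the iterations chain together. The helper names `generate_synthetic_data` and `spin_finetune` are hypothetical placeholders for the repo's generation and fine-tuning steps, not its actual API:

```python
# Conceptual sketch of the SPIN iteration chain (hypothetical helper names,
# not the repo's actual API). Each iteration fine-tunes starting from the
# model produced by the previous iteration, not from the original SFT model.

def generate_synthetic_data(model: str) -> str:
    # Placeholder: in the real pipeline, theta_t generates the synthetic
    # ("rejected") responses paired with the ground-truth SFT responses.
    return f"synthetic data generated by {model}"

def spin_finetune(init_from: str, data: str) -> str:
    # Placeholder: in the real pipeline, this runs one SPIN fine-tuning step.
    return f"{init_from} fine-tuned on [{data}]"

current = "alignment-handbook/zephyr-7b-sft-full"  # theta_0, the starting SFT model

for t in range(2):  # produces SPIN-iter0 (t=0) and SPIN-iter1 (t=1)
    data_t = generate_synthetic_data(model=current)          # theta_t plays the opponent
    current = spin_finetune(init_from=current, data=data_t)  # yields theta_{t+1}
```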

Feel free to let us know if you have any other questions.

junkangwu commented 5 months ago

@angelahzyuan The training-epoch configuration appears inconsistent. In configs/config_iter1.yaml, num_train_epochs is set to 3, yet in the conversation above you mentioned 2. Moreover, in scripts/finetune_iter1.sh, num_train_epochs is explicitly set to 6. This inconsistency warrants clarification.

To keep the experimental setup reproducible, the epoch setting should be consistent across the configs, scripts, and documentation. Could you specify which value is definitive? A consistent setting is essential for reproducing the experiments and for any follow-up analysis to rest on the same setup.

angelahzyuan commented 5 months ago

Hi @junkangwu, thanks for the follow-up question. In all iterations, num_train_epochs is set to 6, and this is enforced explicitly in the training scripts; for instance, scripts/finetune_iter2.sh also sets num_train_epochs to 6.

To clarify my earlier reply: when I said "iter1 starts from zephyr-7b-sft-full and running 2 iterations," I was referring to iterations, not epochs.

Finally, num_train_epochs influences the learning rate schedule, but the checkpoint that proceeds to the next iteration is determined by model selection. In our experience, the checkpoint at 2 epochs is typically a safe choice. We'll update the configuration file to make this clearer.
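
To illustrate why num_train_epochs matters even when the checkpoint taken for the next iteration comes from epoch 2: schedulers compute their decay over the total number of planned training steps, so stopping after 2 epochs of a 6-epoch schedule leaves the learning rate much higher than finishing a 2-epoch schedule. A minimal sketch with a linear decay and placeholder values (the actual scheduler and hyperparameters in the repo's config may differ):

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-7) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

steps_per_epoch = 1000  # placeholder; depends on dataset size and batch size

# End of epoch 2 under a 6-epoch schedule: LR has only decayed by one third.
print(linear_lr(2 * steps_per_epoch, 6 * steps_per_epoch))  # ~3.3e-7

# End of epoch 2 under a 2-epoch schedule: LR has decayed all the way to 0.
print(linear_lr(2 * steps_per_epoch, 2 * steps_per_epoch))  # 0.0
```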

junkangwu commented 5 months ago

@angelahzyuan Thank you for your detailed response. So, to ensure I understand correctly, num_train_epochs is set to 6, but the checkpoint selected to proceed to the next iteration is at epoch 2. Is my understanding accurate? If so, why not set num_train_epochs directly to 2 instead of 6? I have two interpretations, and I would appreciate it if you could clarify where my understanding may be incorrect:

Could you please clarify which interpretation, if any, is correct, or point out where I might be misunderstanding?