uclaml / SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)
https://uclaml.github.io/SPIN/
Apache License 2.0

Cannot reproduce the result #36

Open Joyyang158 opened 4 months ago

Joyyang158 commented 4 months ago

Hi authors,

I tried to reproduce the results reported in the paper using zephyr-7b-sft-full.

When I follow the algorithm described in the paper, performance improves only in the first iteration and then decreases over the next few iterations. I set epoch = 3.

Could you give me some guidance? Thanks!

angelahzyuan commented 4 months ago

@Joyyang158 To reproduce the results from our paper, you would need to:

  1. Use zephyr-7b-sft-full at revision ac6e600eefcce74f5e8bae1035d4f66019e93190.
  2. Use HF generation, as vLLM generation differs from what we used in the paper.
  3. Set the total number of epochs to 6, and stop as needed.
  4. For evaluation, use lm-evaluation-harness at version v0.4.0.
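
As a rough illustration of step 1, the checkpoint can be pinned to the commit hash above through the `revision` argument of Hugging Face `from_pretrained`. This is only a sketch: the `alignment-handbook/` organization prefix in the model ID is an assumption (the thread names only `zephyr-7b-sft-full`), and the helper names are hypothetical and not part of the SPIN repository.

```python
# Commit hash quoted in the thread for zephyr-7b-sft-full.
SFT_REVISION = "ac6e600eefcce74f5e8bae1035d4f66019e93190"

# Assumption: the Hub organization prefix is not stated in the thread.
MODEL_ID = "alignment-handbook/zephyr-7b-sft-full"


def pinned_load_args(model_id=MODEL_ID, revision=SFT_REVISION):
    """Return the (model_id, kwargs) pair used to load a pinned checkpoint."""
    return model_id, {"revision": revision}


def load_pinned_model():
    """Load tokenizer and model at the pinned revision (downloads weights)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id, kwargs = pinned_load_args()
    tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    return tokenizer, model
```

Calling `load_pinned_model()` fetches the weights at that exact commit, so later updates to the model repository do not silently change the starting point.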

For settings different from our original configuration, you may need to adjust the parameters. We are currently working on tuning the parameters for vLLM generation and will provide updates once we have results.

Thank you.

Joyyang158 commented 4 months ago

Thanks for your reply. I will try epoch = 6. There are also two things I would like to check with you:

  1. At each iteration, do you use data from the previous two iterations instead of a single iteration as shown in the paper?
  2. At each iteration, is the base model the SFT model, or an iterative model such as model-iter0, 1, 2, 3?

Thanks!

angelahzyuan commented 4 months ago

@Joyyang158

  1. Using the previous two iterations' data works better than using a single iteration.
  2. The base model is updated each time to iter0, 1, 2.
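
One possible reading of the two answers above, written as a small scheduling sketch. It assumes that iteration k starts from the model produced by iteration k-1 (the SFT model for iteration 0), and that "previous two iterations' data" means the data generated for iteration k combined with that of iteration k-1. The names `model-iter{k}` and `data-iter{k}` are hypothetical labels, not paths from the SPIN repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IterationPlan:
    iteration: int
    base_model: str          # checkpoint that training starts from
    data_sources: List[str]  # synthetic data sets mixed for this iteration


def plan_spin_iterations(num_iterations: int,
                         sft_model: str = "zephyr-7b-sft-full",
                         window: int = 2) -> List[IterationPlan]:
    """Build a per-iteration plan under the assumptions stated above."""
    plans = []
    for k in range(num_iterations):
        # Iteration 0 starts from the SFT model; later ones from the prior output.
        base = sft_model if k == 0 else f"model-iter{k - 1}"
        # Keep at most `window` of the most recent data sets, ending at k.
        first = max(0, k - window + 1)
        data = [f"data-iter{j}" for j in range(first, k + 1)]
        plans.append(IterationPlan(k, base, data))
    return plans


if __name__ == "__main__":
    for plan in plan_spin_iterations(4):
        print(plan.iteration, plan.base_model, plan.data_sources)
```

Setting `window=1` recovers the single-iteration variant that the question contrasts this with.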

Joyyang158 commented 4 months ago

I see, thanks for your help and patience!