Open RandomInternetPreson opened 4 months ago
I'm running the Llama-3-Instruct-8B-SPPO-Iter3 model locally and am very impressed by how much it improves on the original model. I can't help but wonder what the results would be if this fine-tuning process were run on larger models.

Is it possible to run the code on these larger models, or are the smaller versions too different from their larger counterparts, requiring a rework of the training scripts?

Thank you for what you have contributed, this is great stuff!

Thank you! We've trained a slightly larger model (UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3), which achieved an LC win rate of 53.27 using the same parameters and scripts.

As long as your GPU has sufficient VRAM, the training script should perform well on larger models too. We will keep you updated as we move on to training larger models.
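For anyone wanting to experiment, here is a minimal sketch of what pointing a run at a larger base model might look like, assuming the training code loads its checkpoint through Hugging Face `transformers`. The checkpoint name and memory-saving options below are illustrative assumptions, not the repo's actual configuration:

```python
# Hypothetical sketch only: swapping in a larger base checkpoint.
# The model name and memory options are assumptions, not SPPO's
# actual training setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed larger checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # roughly halves memory vs. fp32 weights
    device_map="auto",           # requires `accelerate`; shards across available GPUs
)
```

In practice, a model at the 70B scale would also need multi-GPU sharding or offloading during training itself, which scripts tuned for 8B/9B models may not handle out of the box.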