pytorch / torchtune

A Native-PyTorch Library for LLM Fine-tuning
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

SimPO #1037

Closed · nivibilla closed this 3 weeks ago

nivibilla commented 3 months ago

https://arxiv.org/html/2405.14734v1

Claims better performance than all previous offline RL training methods.

nivibilla commented 3 months ago

PR : #1036

pbontrager commented 3 months ago

Thank you for sharing this! I need to dig into the paper a bit more, could you highlight the differences between the SimPO recipes and the DPO recipes?

nivibilla commented 3 months ago

The SimPO loss doesn't seem to need the reference log probs (similar to ORPO), and it introduces a new hyperparameter, gamma. I guess that also means the model being aligned doesn't need to be SFT'd on the training set before the RL part.
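For concreteness, the paper's objective boils down to something like this (a minimal PyTorch sketch of the loss as I read the paper, not the PR's implementation; the function name, argument layout, and the beta/gamma defaults are just illustrative):

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    policy_chosen_logps: torch.Tensor,    # summed log probs of chosen responses, shape (B,)
    policy_rejected_logps: torch.Tensor,  # summed log probs of rejected responses, shape (B,)
    chosen_lengths: torch.Tensor,         # response token counts, shape (B,)
    rejected_lengths: torch.Tensor,
    beta: float = 2.0,    # the paper sweeps beta in roughly the 2.0-2.5 range
    gamma: float = 0.5,   # target reward margin, the new hyperparameter
) -> torch.Tensor:
    # Length-normalized log probs serve as the implicit reward; note there
    # are no reference-model log probs anywhere in the expression.
    chosen_rewards = beta * policy_chosen_logps / chosen_lengths
    rejected_rewards = beta * policy_rejected_logps / rejected_lengths
    # Bradley-Terry preference objective with an extra margin term gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```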

This is a really good summary of it https://x.com/_philschmid/status/1794627683575316548?t=m-qwqK8Mm1tlIO-pyMpIPw&s=19

pbontrager commented 3 months ago

If the only difference between SimPO and DPO is the loss, maybe we could just define the losses separately and share a single recipe between them.

nivibilla commented 3 months ago

From what I understand from their repo, the loss seems to be the only difference, but I haven't verified this by training identical models yet. And yeah, not sure there need to be two recipes for something so similar. I was thinking of refactoring it into a "preference optimisation" recipe with DPO and SimPO as loss options.

nivibilla commented 3 months ago

Or alternatively, not touch anything and simply add some internal logic that selects the loss based on what's specified in the config YAML.
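Something like this toy sketch (none of this is existing torchtune API; the loss names, signatures, and lookup tables are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the real losses, just to show the shape of the dispatch.
def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO compares the policy's log-prob margin against the reference model's.
    logits = beta * ((pi_chosen - pi_rejected) - (ref_chosen - ref_rejected))
    return -F.logsigmoid(logits).mean()

def simpo_loss(pi_chosen, pi_rejected, ref_chosen=None, ref_rejected=None,
               beta=2.0, gamma=0.5):
    # SimPO ignores the reference terms entirely (logps assumed length-normalized).
    logits = beta * (pi_chosen - pi_rejected) - gamma
    return -F.logsigmoid(logits).mean()

LOSSES = {"dpo": dpo_loss, "simpo": simpo_loss}
NEEDS_REF_MODEL = {"dpo": True, "simpo": False}

def preference_step(loss_name, pi_chosen, pi_rejected,
                    ref_chosen=None, ref_rejected=None):
    # One recipe path: only require (and run) the reference forward pass
    # when the configured loss actually uses it.
    if NEEDS_REF_MODEL[loss_name] and ref_chosen is None:
        raise ValueError(f"{loss_name!r} needs reference log probs")
    return LOSSES[loss_name](pi_chosen, pi_rejected, ref_chosen, ref_rejected)

# Example: SimPO path, no reference log probs supplied.
loss = preference_step("simpo", torch.randn(4), torch.randn(4))
```

If the configured loss doesn't need a reference model, the recipe could also skip loading the reference checkpoint entirely, which is where SimPO's memory savings over DPO would come from.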