pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

Feature Request: ORPO #894

Closed: nivibilla closed this issue 4 months ago

nivibilla commented 5 months ago

Hi,

First of all, thank you for this library! Very clean, and I appreciate that all I need is PyTorch!

I wanted to open an issue for integrating ORPO. Not needing to do SFT before the RLHF step is huge, since it saves a lot of compute when training on preference data. Hoping it can be integrated into torchtune (with LoRA support if possible)!
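For context, the ORPO objective roughly just adds a log-odds-ratio penalty on top of the usual SFT loss, which is why no reference model or separate SFT stage is needed. A minimal sketch from my reading of the paper (not torchtune code; the tensor names and the `lam` default are placeholders, and the log-probs are assumed to be averaged per token):

```python
import torch
import torch.nn.functional as F

def orpo_loss(
    chosen_logps: torch.Tensor,    # mean per-token log-probs of chosen responses, shape (batch,)
    rejected_logps: torch.Tensor,  # mean per-token log-probs of rejected responses, shape (batch,)
    chosen_nll: torch.Tensor,      # standard cross-entropy (SFT) loss on chosen responses, shape (batch,)
    lam: float = 0.1,              # weight on the odds-ratio term (lambda in the paper); placeholder default
) -> torch.Tensor:
    # odds(y) = p(y) / (1 - p(y)); compute log(odds_chosen / odds_rejected) in log space for stability
    log_odds = (chosen_logps - rejected_logps) - (
        torch.log1p(-torch.exp(chosen_logps)) - torch.log1p(-torch.exp(rejected_logps))
    )
    # penalize low relative odds of the chosen response: -log sigmoid(log odds ratio)
    or_term = -F.logsigmoid(log_odds)
    # total objective: SFT loss plus weighted odds-ratio penalty
    return (chosen_nll + lam * or_term).mean()
```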

There is an existing integration into TRL

Thanks!

kartikayk commented 5 months ago

@nivibilla thanks for opening this issue!

ORPO would indeed be a really nice addition to the library. It hasn't been at the top of our list, if I'm being honest, but maybe we should reconsider. Is this something you'd be open to adding? DPO and PPO (WIP) were both added by our awesome community members, and if you'd be interested in adding ORPO, I'm happy to help brainstorm and review the design, code, etc.

nivibilla commented 5 months ago

Hi @kartikayk

I'm not that experienced in writing custom training loops. Mainly a Hugging Face user haha. I'd be no better than Llama 3 70B attempting it 🤣

kartikayk commented 5 months ago

I'm only partly serious here, but why not train a CodeLlama 70B using torchtune and then see if that gets you the right recipe :)

nivibilla commented 4 months ago

@kartikayk I finally got around to doing this, and came across a new paper called SimPO (Simple Preference Optimization), which was indeed simpler to implement than ORPO: the only real change is the loss function. The paper also claims some impressive results, beating all other offline RL methods.
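For context, the loss change is roughly this; a minimal sketch from my reading of the paper rather than the code in the draft PR (the `beta` and `gamma` defaults are my guesses based on the ranges the paper reports):

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # length-normalized (mean per-token) log-probs of chosen responses, shape (batch,)
    rejected_logps: torch.Tensor,  # length-normalized log-probs of rejected responses, shape (batch,)
    beta: float = 2.0,             # reward scale; assumed default
    gamma: float = 0.5,            # target reward margin; assumed default
) -> torch.Tensor:
    # Like DPO but reference-free: the implicit reward is just the scaled,
    # length-normalized log-prob, and a fixed margin gamma separates chosen from rejected.
    logits = beta * (chosen_logps - rejected_logps) - gamma
    return -F.logsigmoid(logits).mean()
```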

I have a draft PR here: #1036. It's a draft since it's quite messy and I feel it has a lot of duplicate code. Also, I haven't tested it at all.

Closing this issue in favour of #1037