huggingface / huggingface-llama-recipes

DPO Fine-Tuning #73

Open AnirudhJM24 opened 2 months ago

AnirudhJM24 commented 2 months ago

The repository contains examples of fine-tuning the model with Supervised Fine-Tuning (SFT). I would like to add examples that use Transformer Reinforcement Learning (TRL), particularly Direct Preference Optimization (DPO).

ariG23498 commented 1 month ago

Hey @AnirudhJM24

I really like the idea, but could you first share a rough Colab notebook for it? I don't want a very complicated setup for SFT in the repository. Having said that, if you can showcase the workflow in a very simple way, I would be open to adding it.

Also do take a look at the /fine_tune directory for inspiration.