huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

What is the difference between PPOv2Trainer and PPOTrainer? #1763

Closed: mst272 closed this issue 6 days ago

mst272 commented 1 week ago

What is the difference between PPOv2Trainer and PPOTrainer? Also, there are two ppo.py files, trl\examples\scripts\ppo\ppo.py and trl\examples\scripts\ppo.py. Can you tell me how they differ?

vwxyzjn commented 6 days ago

Hi, @mst272. PPOv2Trainer is the new experimental PPO trainer that we now recommend to users. It is a refactor of PPOTrainer that introduces more uniform APIs, better logging, documentation, and more benchmark results.