mst272 closed this 6 days ago
Hi, @mst272. The PPOv2Trainer is the new experimental PPO trainer that we now recommend to users. It is a refactor of PPOTrainer that introduces more uniform APIs, better logging, better documentation, and more benchmark results.
What is the difference between PPOv2Trainer and PPOTrainer? Also, there are two ppo.py files, trl\examples\scripts\ppo\ppo.py and trl\examples\scripts\ppo.py. Can you tell me how they differ?