Closed Esmail-ibraheem closed 4 weeks ago
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 20 days since being marked as stale.
Feature Request
Add a Proximal Policy Optimization (PPO) trainer.
Motivation
A PPO trainer would let us compare the two trainers, PPO and DPO.
Additional Context
No response