uclaml / SPPO

The official implementation of Self-Play Preference Optimization (SPPO)
https://uclaml.github.io/SPPO/
Apache License 2.0
477 stars, 61 forks