uclaml / SPPO

The official implementation of Self-Play Preference Optimization (SPPO)
https://uclaml.github.io/SPPO/
Apache License 2.0

What's the package configuration for reproducing SPPO-Gemma-2? #14

Open Jackory opened 4 months ago

Jackory commented 4 months ago

I found that the current repository configuration is not compatible with Gemma2. The reason might be that transformers and vllm are not fully compatible with Gemma2. Could you share the package configurations to reproduce SPPO-Gemma-2?

angelahzyuan commented 4 months ago

@Jackory The package configuration we used at the time of training is listed in #5. Later, it was suggested in #12 that the latest version of vllm no longer works.
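
For anyone comparing a local environment against the configuration in #5, a minimal sketch like the following prints the installed versions of the two packages mentioned in this thread. The helper name `installed_versions` is hypothetical and not part of the SPPO repository; it only uses the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent.

    Hypothetical helper for illustration; not part of the SPPO repository.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The two packages named in this thread as possibly incompatible with Gemma2.
    for name, ver in installed_versions(["transformers", "vllm"]).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

The printed lines can then be checked by hand against the versions listed in #5.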