uclaml / SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)
https://uclaml.github.io/SPIN/
Apache License 2.0

the four reward metrics #16

Closed: THBUer-yw closed this issue 7 months ago

THBUer-yw commented 9 months ago

How do the four reward metrics (rewards/real, rewards/generated, rewards/accuracies, and rewards/margins) change during training? Could you share some plots of them so I can compare against my reimplementation?

angelahzyuan commented 9 months ago

[Image: plot of the reward metrics during training]

An example from iter3. Behavior may vary across settings, and the connection between these metrics and the final evaluation is not obvious.
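For readers comparing their own curves, here is a minimal sketch of how metrics with these four names are commonly computed in DPO-style trainers. It assumes the rewards are implicit rewards of the form beta * (policy log-prob - reference log-prob); the function name `reward_metrics`, its arguments, and the default `beta=0.1` are hypothetical and are not taken from the SPIN code.

```python
from statistics import mean


def reward_metrics(policy_real_logps, policy_generated_logps,
                   ref_real_logps, ref_generated_logps, beta=0.1):
    """Per-batch reward metrics under a DPO-style implicit-reward assumption.

    Each argument is a list of summed sequence log-probabilities, one per
    example. `beta` is an assumed scaling factor, not a value from the thread.
    """
    # Implicit reward: how much more the policy prefers a response than the
    # reference model does, scaled by beta.
    real_rewards = [beta * (p - r)
                    for p, r in zip(policy_real_logps, ref_real_logps)]
    generated_rewards = [beta * (p - r)
                         for p, r in zip(policy_generated_logps, ref_generated_logps)]

    pairs = list(zip(real_rewards, generated_rewards))
    return {
        "rewards/real": mean(real_rewards),
        "rewards/generated": mean(generated_rewards),
        # Fraction of pairs where the real response gets the higher reward.
        "rewards/accuracies": mean([1.0 if r > g else 0.0 for r, g in pairs]),
        # Average gap between real and generated rewards.
        "rewards/margins": mean([r - g for r, g in pairs]),
    }


metrics = reward_metrics(
    policy_real_logps=[-10.0, -12.0],
    policy_generated_logps=[-15.0, -11.0],
    ref_real_logps=[-11.0, -12.5],
    ref_generated_logps=[-14.0, -12.0],
)
```

Under this assumption, rewards/margins is simply rewards/real minus rewards/generated, so the two views carry the same information when averaged over the same batch.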

THBUer-yw commented 9 months ago

Thanks a lot!