the four reward metrics

uclaml / SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

https://uclaml.github.io/SPIN/

Apache License 2.0


the four reward metrics #16

Closed by THBUer-yw 7 months ago

THBUer-yw commented 9 months ago

How do the four reward metrics (rewards/real, rewards/generated, rewards/accuracies, and rewards/margins) change during training? Could you share some plots of them as a reference for reimplementation?

angelahzyuan commented 9 months ago

[Image: plots of the reward metrics]

This is an example from iter3. Behavior may vary across settings, and the connection between these metrics and the final evaluation is not obvious.
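For readers comparing their own curves, here is a minimal sketch of how the four metrics could be computed for one batch. It assumes a DPO-style implicit reward, `beta * (policy log-prob - reference log-prob)`, which the thread itself does not confirm. The function name `reward_metrics`, its parameters, and the default `beta=0.1` are hypothetical and chosen only for illustration.

```python
def reward_metrics(policy_real_logps, ref_real_logps,
                   policy_gen_logps, ref_gen_logps, beta=0.1):
    """Batch-mean reward metrics under an ASSUMED DPO-style definition.

    Each argument is a list of per-sequence log-probabilities; "real" means
    the ground-truth responses and "gen" the model-generated responses.
    """
    # Assumed implicit reward: beta * (policy log-prob - reference log-prob).
    real = [beta * (p - r) for p, r in zip(policy_real_logps, ref_real_logps)]
    gen = [beta * (p - r) for p, r in zip(policy_gen_logps, ref_gen_logps)]
    n = len(real)
    return {
        "rewards/real": sum(real) / n,
        "rewards/generated": sum(gen) / n,
        # Fraction of pairs where the real response gets the higher reward.
        "rewards/accuracies": sum(1.0 for a, b in zip(real, gen) if a > b) / n,
        # Mean gap between the real and generated rewards.
        "rewards/margins": sum(a - b for a, b in zip(real, gen)) / n,
    }


# Toy batch of two pairs (made-up log-probabilities, not real training data).
metrics = reward_metrics(
    policy_real_logps=[-10.0, -12.0],
    ref_real_logps=[-11.0, -12.5],
    policy_gen_logps=[-15.0, -9.0],
    ref_gen_logps=[-14.0, -10.0],
)
```

In this toy batch only the first pair gives the real response a higher reward than the generated one, so rewards/accuracies comes out to 0.5. Under this definition rewards/margins is always rewards/real minus rewards/generated, so the two pairs of curves carry overlapping information.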

THBUer-yw commented 9 months ago

thanks a lot!