THBUer-yw closed this issue 7 months ago.
I wonder how the four reward metrics rewards/real, rewards/generated, rewards/accuracies and rewards/margins change during training. Could you share some plots of them to help with reimplementation?
Here is an example from iter3. Behavior may vary under different settings, and the connection between these metrics and the final evaluation results is not obvious.
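For anyone reimplementing these logs, here is a minimal sketch of how the four metrics are typically computed. It assumes the trainer follows the Hugging Face TRL `DPOTrainer` convention, with "chosen"/"rejected" renamed to "real"/"generated". The function name `reward_metrics` and the default `beta=0.1` are hypothetical. The inputs are per-example sequence log-probabilities under the policy being trained and under the frozen reference model.

```python
from statistics import mean


def reward_metrics(policy_real_logps, policy_generated_logps,
                   ref_real_logps, ref_generated_logps, beta=0.1):
    """Hypothetical helper: batch-level reward metrics (DPO-style sketch).

    Assumption: implicit reward = beta * (log pi_theta(y|x) - log pi_ref(y|x)),
    computed per example, as in TRL's DPOTrainer.
    """
    real = [beta * (p - r) for p, r in zip(policy_real_logps, ref_real_logps)]
    generated = [beta * (p - r)
                 for p, r in zip(policy_generated_logps, ref_generated_logps)]
    return {
        # Mean implicit reward of the real (ground-truth) responses.
        "rewards/real": mean(real),
        # Mean implicit reward of the model-generated responses.
        "rewards/generated": mean(generated),
        # Fraction of pairs where the real response scores strictly higher.
        "rewards/accuracies": mean(1.0 if a > b else 0.0
                                   for a, b in zip(real, generated)),
        # Mean gap between real and generated rewards.
        "rewards/margins": mean(a - b for a, b in zip(real, generated)),
    }


if __name__ == "__main__":
    print(reward_metrics([-10.0, -12.0], [-20.0, -15.0],
                         [-11.0, -12.0], [-19.0, -16.0]))
```

Because the mean of differences equals the difference of means, rewards/margins is always rewards/real minus rewards/generated under this definition. rewards/accuracies is the only one of the four that is not a simple average of rewards.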
Thanks a lot!