pengzhenghao closed 2 months ago
Hi!
The problem with using only the standard preference loss is that it is convex but not strictly convex, so it has many optima, and many of them can place high weight on out-of-distribution (OOD) actions. We have a long explanation of this in the paper. Setting the contrastive bias below 1 is one way to make lower-loss solutions place higher likelihood on in-distribution data; we prove this in the paper. Another way is to add a BC penalty (CPL-BC).
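To make the effect concrete, here is a rough sketch, not the repository's implementation. It assumes the biased loss for one preference pair has the form -log(exp(a+) / (exp(a+) + exp(bias * a-))), where a+ and a- are the summed, scaled log-probabilities of the preferred and rejected segments. The function name `biased_preference_loss` and its arguments are made up for illustration.

```python
import math


def biased_preference_loss(adv_pos, adv_neg, contrastive_bias=1.0):
    """Sketch of a biased contrastive preference loss for one pair.

    adv_pos / adv_neg: assumed to be summed, scaled log-probabilities of the
    preferred / rejected segment (hypothetical inputs for this illustration).
    The loss equals softplus(contrastive_bias * adv_neg - adv_pos).
    """
    z = contrastive_bias * adv_neg - adv_pos
    # Numerically stable softplus: log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Two policies with the same gap between segments; the second assigns
# higher log-probability to both (in-distribution) segments.
low = (-2.0, -3.0)
high = (-1.0, -2.0)

for bias in (1.0, 0.75, 0.5):
    print(bias, biased_preference_loss(*low, bias), biased_preference_loss(*high, bias))
```

With `contrastive_bias=1.0` the loss in this sketch depends only on the difference a+ - a-, so both policies get the same value. With a bias below 1, raising both log-probabilities by the same amount lowers the loss, so the policy that puts more likelihood on the data is preferred.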
Hope this helps!
Closing due to inactivity.
Hi, if I understand correctly, `contrastive_bias` should always be 1? What's the justification for `contrastive_bias=0.5` and `contrastive_bias=0.75`? Thanks!