jhejna / cpl

Code for Contrastive Preference Learning (CPL)
MIT License

The purpose of `contrastive_bias` #12

Closed pengzhenghao closed 2 months ago

pengzhenghao commented 5 months ago

Hi, if I understand correctly, `contrastive_bias` should always be 1, right? What's the justification for `contrastive_bias=0.5` and `contrastive_bias=0.75`? Thanks!

jhejna commented 5 months ago

Hi!

The problem with just using the standard preference loss is that it is convex but not strictly convex -- many of its optima can place high weight on out-of-distribution (OOD) actions. We have a long explanation of this in the paper. Setting the contrastive bias < 1 is one way of making it so that lower-loss solutions place a higher likelihood on in-distribution data. We prove this in the paper. Another way is adding a BC penalty (CPL-BC).
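To make the role of the bias concrete, here is a minimal, hypothetical sketch of a biased preference loss in the spirit of CPL (not the repository's actual implementation; the function name, signature, and default values are illustrative). The idea: each segment's score is the (temperature-scaled) sum of policy log-probabilities, and the bias scales only the dispreferred segment's score before the Bradley-Terry comparison.

```python
import math

def biased_cpl_loss(logp_pos, logp_neg, alpha=0.1, contrastive_bias=0.5):
    """Hypothetical sketch of a biased CPL-style preference loss.

    logp_pos / logp_neg: lists of per-step log pi(a|s) for the preferred
    and dispreferred segments. alpha is a temperature. contrastive_bias
    (< 1) scales the dispreferred segment's score; since log-probs are
    negative, this shrinks its penalty and biases the comparison toward
    solutions with higher likelihood on in-distribution data.
    """
    score_pos = alpha * sum(logp_pos)
    score_neg = alpha * sum(logp_neg)
    # Standard Bradley-Terry form: -log sigmoid(score_pos - bias * score_neg)
    z = score_pos - contrastive_bias * score_neg
    return math.log(1.0 + math.exp(-z))
```

With `contrastive_bias=1` this reduces to the ordinary preference loss; with equally likely segments it gives the familiar log 2 at the symmetric optimum, while a bias < 1 shifts the optimum toward assigning higher absolute likelihood to the data.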

Hope this helps!

jhejna commented 2 months ago

Closing due to inactivity.