wzhouad / WPO

Code and models for the EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization".
Other