xfactlab / orpo

Official repository for ORPO
Apache License 2.0
412 stars 38 forks

no reference model? #23

Closed by kxleee 4 months ago

kxleee commented 5 months ago

For a chat model that has already gone through SFT, how do you ensure that model performance does not degrade without a reference model? Thanks.

nlee-208 commented 5 months ago

Hello @kxleee, thanks for the question. Although it is hard to strictly characterize the loss without a reference model in our setting, you can think of the SFT training on the chosen responses in ORPO as providing guidance similar to what a reference model would explicitly give.
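
To make the point above concrete, here is a minimal sketch of a per-example ORPO-style objective: an SFT (negative log-likelihood) term on the chosen response plus a weighted odds-ratio term comparing chosen and rejected responses. It is only an illustration, not the repository's training code. The function names, the `lam` weight, and its default value are assumptions for this example, and the inputs are assumed to be length-normalized average log-probabilities (strictly negative) from the policy model itself.

```python
import math

def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, where log p < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1):
    """Sketch of an ORPO-style loss for one (chosen, rejected) pair.

    chosen_logp / rejected_logp: assumed average token log-probabilities
    of each response under the model being trained.
    lam: hypothetical weight on the odds-ratio term.
    """
    # SFT term: negative log-likelihood of the chosen response.
    sft = -chosen_logp
    # Log odds ratio between chosen and rejected responses.
    log_or = log_odds(chosen_logp) - log_odds(rejected_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    or_term = math.log1p(math.exp(-log_or))
    return sft + lam * or_term, sft, or_term

total, sft, or_term = orpo_loss(math.log(0.5), math.log(0.5))
```

Note that no reference-model log-probabilities appear anywhere in this sketch: both terms depend only on the model being trained, and the SFT term is the part that keeps the model anchored to the chosen responses.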