princeton-nlp / SimPO

[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
MIT License

About label_smoothing #36

Closed by mazhengyufreedom 3 months ago

mazhengyufreedom commented 3 months ago

In the loss function `losses = -F.logsigmoid(self.beta * logits) * (1 - self.label_smoothing) - F.logsigmoid(-self.beta * logits) * self.label_smoothing`, I found that the default value of `self.label_smoothing` is 0. If that's the case, the loss reduces to `losses = -F.logsigmoid(self.beta * logits)`, am I right? Is there any other value of `self.label_smoothing` that can be used?
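
For concreteness, here is a minimal runnable sketch of what I mean (the `simpo_losses` helper, the `beta` value, and the sample margins are illustrative, not taken from the repo):

```python
import torch
import torch.nn.functional as F

def simpo_losses(logits, beta=2.0, label_smoothing=0.0):
    # `logits` stands for the chosen-minus-rejected reward margin
    # that the trainer computes before this loss is applied.
    return (
        -F.logsigmoid(beta * logits) * (1 - label_smoothing)
        - F.logsigmoid(-beta * logits) * label_smoothing
    )

logits = torch.tensor([0.5, -0.2, 1.3])

# With label_smoothing = 0 the second term is multiplied by zero,
# so the loss reduces to the plain -logsigmoid(beta * logits).
assert torch.allclose(
    simpo_losses(logits, label_smoothing=0.0),
    -F.logsigmoid(2.0 * logits),
)
```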

xiamengzhou commented 3 months ago

Hi @mazhengyufreedom! We've only experimented with `self.label_smoothing = 0`, resulting in the vanilla SimPO loss. We're unsure how label smoothing will affect the results, as it hasn't been widely adopted in this setting.
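
For reference, the form of the loss matches the conservative cDPO-style smoothing used in DPO-style trainers, where `label_smoothing` can be read as the assumed probability that a preference pair is mislabeled, so small values like 0.1 would be the natural thing to try. A rough sketch of the effect, with illustrative numbers:

```python
import torch
import torch.nn.functional as F

beta = 2.0  # illustrative value only

def loss(logits, eps):
    return (-F.logsigmoid(beta * logits) * (1 - eps)
            - F.logsigmoid(-beta * logits) * eps)

# On a confidently ranked pair, smoothing keeps the loss from reaching 0,
# which limits how hard the model is pushed on possibly mislabeled pairs.
big_margin = torch.tensor(5.0)
print(loss(big_margin, 0.0))  # ~4.5e-5: vanilla loss, nearly satisfied
print(loss(big_margin, 0.1))  # ~1.0: bounded away from zero by the eps term
```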

mazhengyufreedom commented 3 months ago

ok, got it, thanks!