I have encountered what appears to be a bug in the loss calculation in `dataloader.py`, specifically lines 167 to 174. The issue arises when using the dpo method:
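```python
# dataloader.py, lines 167-174, reached when using the dpo method
pi_logratios = idk_loss_current - forget_loss_current
ref_logratios = idk_loss_oracle - forget_loss_oracle
beta = 0.1
loss = -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
print(loss.item())
# The two assignments below overwrite the DPO loss computed above,
loss = -pi_logratios.mean()
# so only this last line determines the loss that is actually used.
loss = -idk_loss_current.mean()
```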
It appears that the final assignment, `loss = -idk_loss_current.mean()`, overwrites the DPO loss computed a few lines earlier, which contradicts the expected result described in the paper.
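For comparison, here is a minimal sketch of the DPO objective that the first four lines of the snippet compute. It assumes that the `idk_*` and `forget_*` values are per-example sequence log-probabilities of the preferred (idk) and dispreferred (forget) answers, with `*_current` coming from the model being trained and `*_oracle` from the frozen reference model; the snippet itself does not make this explicit. The function names are hypothetical, and it uses plain Python instead of PyTorch so it runs without dependencies.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)), the scalar analogue of F.logsigmoid."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen: Sequence[float],    # assumed: idk_loss_current (log-probs)
    policy_rejected: Sequence[float],  # assumed: forget_loss_current
    ref_chosen: Sequence[float],       # assumed: idk_loss_oracle
    ref_rejected: Sequence[float],     # assumed: forget_loss_oracle
    beta: float = 0.1,
) -> float:
    """Mean DPO loss over a batch (hypothetical helper, not the repo's code)."""
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        pi_logratio = pc - pr   # policy's preference for the chosen answer
        ref_logratio = rc - rr  # same preference under the reference model
        losses.append(-log_sigmoid(beta * (pi_logratio - ref_logratio)))
    return sum(losses) / len(losses)
```

When the policy and the reference agree, the loss is log 2 (about 0.693); raising the policy's log-probability of the chosen answer relative to the reference lowers it. Neither `-pi_logratios.mean()` nor `-idk_loss_current.mean()` has this form, since both drop the reference term and the log-sigmoid.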
Thank you for your attention to this matter.