hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

Weighted Average for Score Intuition #32

Open harshitadd opened 2 months ago

harshitadd commented 2 months ago

Hello! Thanks for the great work. Could you please clarify the intuition behind using the weighted sum of probabilities as the score for a sample? From the discussion, I gather this may be motivated by more stable training, but more details would be very helpful. Thanks!
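
To make the question concrete, here is a minimal sketch of the computation as I understand it. It assumes the scorer yields one logit per discrete rating token and that the ratings are the integers 1 to 6; the function names `weighted_score` and `argmax_score` are hypothetical and are not taken from the repository.

```python
import math

# Assumed rating scale; the actual scale used by the scorer may differ.
RATINGS = (1, 2, 3, 4, 5, 6)


def weighted_score(logits, ratings=RATINGS):
    """Probability-weighted sum of ratings (the expected rating).

    `logits` holds one raw model score per rating token.
    """
    # Numerically stable softmax over the rating logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Each rating contributes in proportion to its probability.
    return sum(p * r for p, r in zip(probs, ratings))


def argmax_score(logits, ratings=RATINGS):
    """Alternative for comparison: keep only the most likely rating."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ratings[best]


if __name__ == "__main__":
    # Two hypothetical samples that share the same most likely rating (4)
    # but differ in how much probability mass sits on rating 5.
    a = [0.0, 0.0, 0.0, 2.0, 1.9, 0.0]
    b = [0.0, 0.0, 0.0, 2.0, -5.0, 0.0]
    print(argmax_score(a), argmax_score(b))
    print(weighted_score(a), weighted_score(b))
```

In this sketch the argmax collapses both samples to the same integer, while the weighted sum yields a continuous value that still separates them. Whether that is the intended motivation is exactly what I would like to confirm.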