hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

Data for Deita's DPO+SFT #4

Closed. jiezhangGt closed this issue 8 months ago

jiezhangGt commented 8 months ago

May I ask where the preference data used for your dialogue model in the DPO stage comes from? Do you plan to open-source it? Thank you.