Data for Deita's DPO + SFT

hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]

Apache License 2.0

Closed: jiezhangGt closed this issue 10 months ago

jiezhangGt commented 10 months ago

May I ask where the preference data used in the DPO stage of your chat model comes from? Do you plan to open-source it? Thank you.