hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

Reproduce MT-Bench score #23

Closed bpucla closed 3 days ago

bpucla commented 5 months ago

Dear Authors,

Thank you for your great work! I'm trying to reproduce the reported MT-Bench scores with the released code and data.

Trying to reproduce:

- DEITA-7B-v1.0 (6K) --> MT-Bench: 7.22
- DEITA-7B-v1.0-sft --> MT-Bench: 7.32

Data I used:

- hkust-nlp/deita-6k-v0
- hkust-nlp/deita-10k-v0

Code I used: https://github.com/hkust-nlp/deita/blob/main/examples/train/sft.sh

The scores I got for both the 6k and 10k data are around 7.06 (vs. the reported 7.22 and 7.32). This gap seems larger than the usual run-to-run variability of SFT training and MT-Bench evaluation.
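For what it's worth, here is the kind of quick check behind that claim, as a minimal Python sketch; the rerun scores below are hypothetical placeholders, not actual measurements:

```python
import statistics

# Hypothetical MT-Bench scores from re-evaluating the same checkpoint
# several times (placeholders; substitute real rerun results).
rerun_scores = [7.04, 7.08, 7.06]
reported = 7.22  # reported score for DEITA-7B-v1.0 (6K)

mean = statistics.mean(rerun_scores)
stdev = statistics.stdev(rerun_scores)

# If the reported score sits several standard deviations above the
# rerun mean, the gap is unlikely to be evaluation noise alone.
print(f"rerun mean = {mean:.2f}, stdev = {stdev:.3f}")
print(f"gap to reported = {reported - mean:.2f} ({(reported - mean) / stdev:.1f} sigma)")
```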

Any suggestions to resolve the discrepancy would be appreciated.

Thanks!

VPeterV commented 5 months ago

Hi,

Thank you for your interest! We have indeed noticed some fluctuations during model training. One potential fix we recommend is to replicate our development environment by installing the dependencies listed in requirements.txt and then training again.
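A quick way to compare environments is to print the versions of the packages that matter most for training. A minimal sketch (the package list here is an assumption; align it with whatever requirements.txt actually pins):

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose versions commonly affect SFT reproducibility
# (an assumed list; match it to the entries in requirements.txt).
for pkg in ["torch", "transformers", "accelerate", "deepspeed", "datasets"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Running this in both environments and diffing the output should surface any version mismatch.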

Furthermore, a key benefit of this data-efficient instruction tuning approach is that each training run is cheap, so it is practical to re-train the model several times and keep the best-performing checkpoint, as sketched below.
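A minimal sketch of that loop (the SEED environment variable is a hypothetical hook; adapt it to however examples/train/sft.sh actually receives its seed):

```python
import os
import subprocess

# Re-train with several seeds and keep each checkpoint for later
# MT-Bench evaluation; SEED is a hypothetical hook into the script,
# not a documented part of the released sft.sh.
for seed in (42, 43, 44):
    env = dict(os.environ, SEED=str(seed))
    subprocess.run(["bash", "examples/train/sft.sh"], env=env, check=True)
```

Evaluate each resulting checkpoint on MT-Bench and keep the strongest one.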

If you have any other problems, please feel free to contact us.