hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

Reproduce MT-Bench score #23

Closed bpucla closed 3 days ago

bpucla commented 5 months ago

Dear Authors,

Thank you for your great work! I'm trying to reproduce the reported MT-Bench scores with the released code and data.

Trying to reproduce:

- DEITA-7B-v1.0 (6K) --> MT-Bench: 7.22
- DEITA-7B-v1.0-sft --> MT-Bench: 7.32

Data I used:

- hkust-nlp/deita-6k-v0
- hkust-nlp/deita-10k-v0

Code I used: https://github.com/hkust-nlp/deita/blob/main/examples/train/sft.sh

The scores I got for both the 6k and 10k data are around 7.06 (vs. the reported 7.22 and 7.32). This gap seems larger than the usual run-to-run variability of SFT training and MT-Bench evaluation.
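For what it's worth, here is the kind of quick check behind that claim, as a minimal Python sketch; the rerun scores below are hypothetical placeholders, not actual measurements:

```python
import statistics

# Hypothetical MT-Bench scores from re-evaluating the same checkpoint
# several times (placeholders; substitute real rerun results).
rerun_scores = [7.04, 7.08, 7.06]
reported = 7.22  # reported score for DEITA-7B-v1.0 (6K)

mean = statistics.mean(rerun_scores)
stdev = statistics.stdev(rerun_scores)

# If the reported score sits several standard deviations above the
# rerun mean, the gap is unlikely to be evaluation noise alone.
print(f"rerun mean = {mean:.2f}, stdev = {stdev:.3f}")
print(f"gap to reported = {reported - mean:.2f} ({(reported - mean) / stdev:.1f} sigma)")
```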

Any suggestions to resolve the discrepancy would be appreciated.

Thanks!

VPeterV commented 5 months ago

Hi,

Thank you for your interest! We have indeed noticed some fluctuations during model training. One potential fix we recommend is to replicate our development environment by installing the dependencies listed in requirements.txt and then training again.
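A quick way to compare environments is to print the versions of the packages that matter most for training. A minimal sketch (the package list here is an assumption; align it with whatever requirements.txt actually pins):

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose versions commonly affect SFT reproducibility
# (an assumed list; match it to the entries in requirements.txt).
for pkg in ["torch", "transformers", "accelerate", "deepspeed", "datasets"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Running this in both environments and diffing the output should surface any version mismatch.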

Furthermore, a key benefit of this data-efficient instruction tuning approach is that each training run is cheap, so it is practical to re-train the model several times and keep the best-performing checkpoint, as sketched below.
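A minimal sketch of that loop (the SEED environment variable is a hypothetical hook; adapt it to however examples/train/sft.sh actually receives its seed):

```python
import os
import subprocess

# Re-train with several seeds and keep each checkpoint for later
# MT-Bench evaluation; SEED is a hypothetical hook into the script,
# not a documented part of the released sft.sh.
for seed in (42, 43, 44):
    env = dict(os.environ, SEED=str(seed))
    subprocess.run(["bash", "examples/train/sft.sh"], env=env, check=True)
```

Evaluate each resulting checkpoint on MT-Bench and keep the strongest one.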

If you have any other problems, please feel free to contact us.