Closed bpucla closed 2 months ago
Hi,
Thank you for your interest! We have indeed noticed some fluctuations during model training. One solution we recommend is to replicate our development environment by installing the dependencies listed in our requirements.txt and re-training the model.
Furthermore, a key benefit of this data-efficient instruction tuning approach is that re-training is cheap, which makes it practical to train multiple runs and select the best-performing model.
If you have any other problems, please feel free to contact us.
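The re-train-and-select idea above can be sketched as follows. This is a minimal illustration with hypothetical seed/score pairs (the values below are made up, not actual DEITA results): train once per seed, evaluate each checkpoint on MT-Bench, then keep the best run.

```python
# Sketch: pick the best checkpoint across several re-training runs.
# The seeds and MT-Bench scores below are illustrative placeholders.
mt_bench_scores = {  # seed -> MT-Bench score from that run's checkpoint
    42: 7.06,
    43: 7.18,
    44: 7.25,
}

# Select the seed whose checkpoint scored highest.
best_seed = max(mt_bench_scores, key=mt_bench_scores.get)

# Run-to-run spread gives a rough sense of training variance.
spread = max(mt_bench_scores.values()) - min(mt_bench_scores.values())

print(f"best seed: {best_seed}, score: {mt_bench_scores[best_seed]:.2f}")
print(f"run-to-run spread: {spread:.2f}")
```

With placeholder numbers like these, the spread across seeds can be on the order of a few tenths of a point, which is why re-training and selecting is worthwhile.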
Dear Authors,
Thank you for your great work! I'm trying to reproduce the reported MT-Bench scores with the released code and data.
Trying to reproduce:
DEITA-7B-v1.0 (6K) --> MT-Bench: 7.22
DEITA-7B-v1.0-sft --> MT-Bench: 7.32
Data I used: hkust-nlp/deita-6k-v0, hkust-nlp/deita-10k-v0
Code I used: https://github.com/hkust-nlp/deita/blob/main/examples/train/sft.sh
The scores I got for both 6k and 10k are around 7.06 (vs. 7.22 and 7.32). The difference seems larger than regular SFT and MT-Bench evaluation variability. Any suggestions to resolve the discrepancy would be appreciated.
Thanks!