microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

[MiniLLM] LLaMA SFT on Dolly: hard to reproduce results in paper #163

Closed: yumath closed this issue 7 months ago

yumath commented 8 months ago
In your MiniLLM paper, Table 1 reports the following R-L (ROUGE-L) results for LLaMA-7B SFT w/o KD, compared with my reproduction:

| | DollyEval | SelfInst | VicunaEval | S-NI | UnNI |
| --- | --- | --- | --- | --- | --- |
| Reported in paper | 26.3 | 20.8 | 17.5 | 32.4 | 35.8 |
| Reproduced | 25.4 | 16.9 | 18.4 | 28.6 | 31.0 |

I am unable to reproduce this student model. Are there any key components of SFT on LLaMA that I have missed? I used your data-processing scripts to build the full Dolly dataset and then ran scripts/llama/sft/sft_7B.sh on a single node with 8 32G V100s.
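For context, R-L in the table above is ROUGE-L. A minimal sketch of how such a score can be computed with the `rouge_score` package; the exact evaluation code in this repo may differ, and the example strings below are made up:

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Made-up prediction/reference pair, for illustration only.
prediction = "The capital of France is Paris."
reference = "Paris is the capital of France."

# ROUGE-L scores the longest common subsequence of tokens
# between the model output and the reference answer.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score(target=reference, prediction=prediction)
print(scores["rougeL"].fmeasure)  # ROUGE-L F1 for this pair
```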

t1101675 commented 7 months ago

We train the 7B models on 16 32G V100s. Using only 8 32G V100s likely means your effective batch size is halved.
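To make the batch-size arithmetic concrete, a quick sketch (the per-GPU batch size and gradient accumulation steps below are hypothetical placeholders, not the actual settings in scripts/llama/sft/sft_7B.sh):

```python
# Effective batch size = num_gpus * per_gpu_batch * grad_accum_steps.
# The values below are hypothetical placeholders, not the actual
# settings in scripts/llama/sft/sft_7B.sh.
per_gpu_batch = 4
grad_accum_steps = 2

for num_gpus in (16, 8):
    effective = num_gpus * per_gpu_batch * grad_accum_steps
    print(f"{num_gpus} GPUs -> effective batch size {effective}")
# 16 GPUs -> effective batch size 128
# 8 GPUs -> effective batch size 64 (half the original)

# One way to restore the original effective batch size on 8 GPUs is to
# double gradient accumulation: 8 * 4 * 4 = 128.
```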

yumath commented 7 months ago

@t1101675 Thanks very much. Maybe I need to train for 2x the epochs to compensate.