microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

[MiniLLM] LLaMA SFT on Dolly: hard to reproduce results in paper #163

Closed: yumath closed this issue 7 months ago

yumath commented 8 months ago
In your MiniLLM paper, Table 1 reports the following R-L (ROUGE-L) results for LLaMA-7B SFT w/o KD, compared with my reproduction:

| | DollyEval | SelfInst | VicunaEval | S-NI | UnNI |
| --- | --- | --- | --- | --- | --- |
| Reported in paper | 26.3 | 20.8 | 17.5 | 32.4 | 35.8 |
| Reproduced | 25.4 | 16.9 | 18.4 | 28.6 | 31.0 |

I am unable to reproduce this student model. Are there any key components of SFT on LLaMA that I have missed? I used your data-processing scripts to build the full Dolly dataset and then ran scripts/llama/sft/sft_7B.sh on a single node with 8 32G V100s.
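For context, R-L in the table above is ROUGE-L. A minimal sketch of how such a score can be computed with the `rouge_score` package; the exact evaluation code in this repo may differ, and the example strings below are made up:

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Made-up prediction/reference pair, for illustration only.
prediction = "The capital of France is Paris."
reference = "Paris is the capital of France."

# ROUGE-L scores the longest common subsequence of tokens
# between the model output and the reference answer.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score(target=reference, prediction=prediction)
print(scores["rougeL"].fmeasure)  # ROUGE-L F1 for this pair
```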

t1101675 commented 7 months ago

We train the 7B models on 16 32G V100s. Using only 8 32G V100s likely means your effective batch size is halved.
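To make the batch-size arithmetic concrete, a quick sketch (the per-GPU batch size and gradient accumulation steps below are hypothetical placeholders, not the actual settings in scripts/llama/sft/sft_7B.sh):

```python
# Effective batch size = num_gpus * per_gpu_batch * grad_accum_steps.
# The values below are hypothetical placeholders, not the actual
# settings in scripts/llama/sft/sft_7B.sh.
per_gpu_batch = 4
grad_accum_steps = 2

for num_gpus in (16, 8):
    effective = num_gpus * per_gpu_batch * grad_accum_steps
    print(f"{num_gpus} GPUs -> effective batch size {effective}")
# 16 GPUs -> effective batch size 128
# 8 GPUs -> effective batch size 64 (half the original)

# One way to restore the original effective batch size on 8 GPUs is to
# double gradient accumulation: 8 * 4 * 4 = 128.
```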

yumath commented 7 months ago

@t1101675 Thanks very much. Maybe I need to train for 2x the epochs to compensate.