Closed PeiqinSun closed 10 months ago
Yeah, we chose 2M as the batch size following those previous papers, as a heuristic.
Thanks for your nice work! I calculated the batch size using the equation from OpenAI's scaling-laws paper, which gives ~12M tokens if I want to reach a loss of ~1.8. But all the papers I found (including LLaMA, OPT, GPT, and TinyLlama) use only 4M or 2M tokens. So I want to ask: why did you choose a batch size of 1024 × 2048 (2M) tokens?
$$B_{\mathrm{crit}}(L) = \frac{B_*}{L^{1/\alpha_B}}$$
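For reference, the ~12M figure can be reproduced from this equation using the constants fitted in Kaplan et al.'s "Scaling Laws for Neural Language Models" (the `B_*` and `alpha_B` values below are that paper's reported fits, not numbers from this thread), a quick sketch:

```python
# Critical batch size from Kaplan et al. (2020):
#   B_crit(L) = B_* / L^(1/alpha_B)
# Fitted constants reported in that paper (assumptions here):
B_STAR = 2e8     # tokens
ALPHA_B = 0.21

def critical_batch_size(loss: float) -> float:
    """Critical batch size (in tokens) at a given target loss."""
    return B_STAR / loss ** (1.0 / ALPHA_B)

print(f"B_crit at loss 1.8: {critical_batch_size(1.8) / 1e6:.1f}M tokens")
```

At a target loss of ~1.8 this lands in the low-tens-of-millions of tokens, consistent with the ~12M estimate above; 2M is well below that, which is exactly the trade-off being asked about.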
Looking forward to hearing from you in your free time. Thank you very much.