Closed PeiqinSun closed 10 months ago
Yeah, we chose 2M as the batch size following those previous papers, as a heuristic.
Thanks for your nice work! I calculated the batch size using the equation from OpenAI's scaling-laws paper, which gives ~12M tokens if I want to reach a loss of ~1.8. But all the papers I found (including LLaMA, OPT, GPT, and TinyLlama) use only 4M or 2M tokens. So I want to ask: why did you choose a batch size of 1024 × 2048 (2M) tokens?
$$B_{\mathrm{crit}}(L) = \frac{B_*}{L^{1/\alpha_B}}$$
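For reference, the ~12M figure can be reproduced from this equation using the constants fitted in Kaplan et al.'s "Scaling Laws for Neural Language Models" (the `B_*` and `alpha_B` values below are that paper's reported fits, not numbers from this thread), a quick sketch:

```python
# Critical batch size from Kaplan et al. (2020):
#   B_crit(L) = B_* / L^(1/alpha_B)
# Fitted constants reported in that paper (assumptions here):
B_STAR = 2e8     # tokens
ALPHA_B = 0.21

def critical_batch_size(loss: float) -> float:
    """Critical batch size (in tokens) at a given target loss."""
    return B_STAR / loss ** (1.0 / ALPHA_B)

print(f"B_crit at loss 1.8: {critical_batch_size(1.8) / 1e6:.1f}M tokens")
```

At a target loss of ~1.8 this lands in the low-tens-of-millions of tokens, consistent with the ~12M estimate above; 2M is well below that, which is exactly the trade-off being asked about.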
Looking forward to hearing from you in your free time. Thank you very much.