Open mohammadaminabbasi opened 1 year ago
As described in this issue, the configuration would be similar to pretraining on TPU pods, with the addition of the JAX distributed initialization settings. However, you'll have to tune the mesh shape and batch size yourself according to the configuration of your own cluster in order to obtain the best throughput. Unfortunately I don't have access to a few hundred A100s, so I cannot provide a good example for that.
I would like to request information regarding the pretraining configuration of LLaMA on A100 80G GPUs for my project. As I am planning to use this setup for my research, having access to the specific pretraining configuration details would greatly help me replicate and benchmark the results and achieve the best training speed.
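For reference, the distributed setup and mesh tuning mentioned above can be sketched as below. This is a minimal illustration, not a tested multi-node recipe: the coordinator address, process counts, and mesh split are hypothetical placeholders you would replace with your own cluster's values.

```python
import numpy as np
import jax
from jax.sharding import Mesh

# On a multi-node GPU cluster, every process would first call something like
# (all values here are hypothetical placeholders for your own cluster):
#   jax.distributed.initialize(
#       coordinator_address="10.0.0.1:1234",  # node 0's address
#       num_processes=4,                       # e.g. 4 nodes
#       process_id=rank,                       # this node's rank, 0..3
#   )

# Build a 2D device mesh. A common heuristic is to put model (tensor)
# parallelism within a node's NVLink domain and data parallelism across
# nodes, e.g. shape (4, 8) for 4 nodes x 8 A100s. Here we derive a
# trivial mesh from whatever devices are visible so the sketch also runs
# on a single host.
devices = np.array(jax.devices())
mp = 1                      # model-parallel axis size (tune per cluster)
dp = devices.size // mp     # data-parallel axis size
mesh = Mesh(devices.reshape(dp, mp), axis_names=("dp", "mp"))
print(mesh.shape)
```

The throughput-relevant knobs are the mesh split (`dp` vs `mp`) and the per-device batch size; both have to be searched empirically for a given interconnect and model size, which is why no single A100 config transfers directly from the TPU-pod settings.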