young-geng / EasyLM

Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.
Apache License 2.0
2.38k stars 254 forks

What precision strategy is used in pre-training OpenLlama? #71

Closed haozhouamzn closed 1 year ago

haozhouamzn commented 1 year ago

Hello.

I am wondering what kind of precision strategy was applied during the pre-training of OpenLLaMA?

Is it fp32, fp16, bf16, or mixed precision?

Thank you in advance.

young-geng commented 1 year ago

We used `dtype='fp32'`, since for models this small, using fp16 does not give much of a speed improvement.
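
The trade-off the answer alludes to can be seen with a small numeric sketch. This is plain NumPy for illustration only, not EasyLM code: fp16 has a 10-bit mantissa, so small increments vanish when added to large values, whereas fp32 keeps them. (bf16 keeps fp32's exponent range but has an even shorter 7-bit mantissa, so it trades precision for range rather than for speed alone.)

```python
import numpy as np

# Illustrative sketch of why precision choice matters in training.
# Near 1024, adjacent fp16 values are 1.0 apart, so a 0.25 update
# (think: a small gradient step on a large weight) is rounded away.
big = np.float32(1024.0)
small = np.float32(0.25)

fp32_sum = big + small                            # exact: 0.25 is representable
fp16_sum = np.float16(big) + np.float16(small)    # rounded to the nearest fp16 value

print(fp32_sum)  # 1024.25
print(fp16_sum)  # 1024.0 -- the update is lost
```

This lost-update problem is one reason mixed-precision setups keep an fp32 copy of the parameters even when compute runs in half precision; for small models, as the answer notes, it can be simpler to stay in fp32 throughout.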

haozhouamzn commented 1 year ago

Thanks, Young. Just to confirm, was fp32 used for all three models (3B, 7B, and 13B)?

young-geng commented 1 year ago

It was fp32 for all of them.