young-geng / EasyLM

Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
Apache License 2.0

Llama 7b Pretraining Dtype #110

Open LeoXinhaoLee opened 4 months ago

LeoXinhaoLee commented 4 months ago

Hi, thank you so much for releasing this wonderful code!

I notice that in your examples/pretrain_llama_7b.sh the dtype is set to fp32, which appears to make the activations fp32. However, I believe it is more common to compute activations in bf16. I also notice that param_dtype always seems to be set to fp32.

Could you please elaborate a bit on this choice? Thank you very much!
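For context, the usual convention in Flax (which the question assumes EasyLM follows) is that `param_dtype` controls the precision parameters are *stored* in, while `dtype` controls the precision activations are *computed* in. A minimal sketch of that split, using plain JAX with a hypothetical dense layer rather than EasyLM's actual model code:

```python
import jax
import jax.numpy as jnp

def init_params(key, in_dim, out_dim, param_dtype=jnp.float32):
    # Master weights are stored in param_dtype (fp32 here).
    return jax.random.normal(key, (in_dim, out_dim), dtype=param_dtype)

def dense(params, x, dtype=jnp.bfloat16):
    # Both weights and inputs are cast to the compute dtype before the matmul,
    # so activations come out in bf16 while the stored params stay fp32.
    return x.astype(dtype) @ params.astype(dtype)

w = init_params(jax.random.PRNGKey(0), 8, 4)
y = dense(w, jnp.ones((2, 8)))
print(w.dtype)  # float32 (storage dtype of the master weights)
print(y.dtype)  # bfloat16 (compute dtype of the activations)
```

Under this convention, setting dtype to fp32 in the pretraining script would indeed make the activations fp32, while param_dtype=fp32 keeps full-precision master weights regardless of the compute dtype.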