Hi, thank you so much for releasing this wonderful code!
I noticed that in your `examples/pretrain_llama_7b.sh`, `dtype` is set to `fp32`, which seems to make the activations `fp32`. However, I think it's more common to keep activations in `bf16`. Also, it seems that `param_dtype` is always set to `fp32`.
Could you please elaborate a bit on this choice? Thank you very much!