jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

[Question] Is pre-training with FP32 possible? #145

Closed veritas9872 closed 7 months ago

veritas9872 commented 8 months ago

Hello, I am currently trying out pre-training and I was wondering whether the data type used for pre-training can be switched from torch.bfloat16 to torch.float32. I would like to try this in some unstable phases of model pre-training where the extra precision might be useful. I am aware that TF32 has been enabled in the repository, so I do not expect training to become too slow. However, I was not able to find a configurable option for FP32. Is there a place where this can be set? If not, do I have to change the code manually to use FP32? Many thanks in advance!
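
For reference, the TF32 behaviour I am referring to is the usual PyTorch setting along these lines (the exact flags used in the repository may differ); it is what keeps FP32 matmuls reasonably fast on Ampere and newer GPUs:

```python
import torch

# Allow TF32 tensor cores for FP32 matmuls and cuDNN convolutions
# (available on Ampere and newer GPUs). "high" permits TF32 matmuls.
torch.set_float32_matmul_precision("high")
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```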

ChaosCodes commented 8 months ago

Hi, you can check this to enable FP32 training.
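
In case the link goes stale: the pretraining script is built on Lightning Fabric, so the precision is chosen when the Fabric object is constructed. A minimal sketch of switching it to full FP32 (the argument values here are illustrative, the script's actual call may differ):

```python
import lightning as L

# Illustrative sketch: changing the precision string from "bf16-mixed"
# (or "bf16-true") to "32-true" makes Fabric run parameters, activations
# and optimizer states in full FP32.
fabric = L.Fabric(
    devices=8,
    precision="32-true",  # was "bf16-mixed" / "bf16-true"
)
fabric.launch()
```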

veritas9872 commented 8 months ago

Thank you for the help! Unfortunately, I have found that this still requires manual editing of the model because parts such as the rotary embedding expect BF16.
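
One way around this would be to cast the rotary embedding's cos/sin cache to the activation dtype instead of assuming BF16, so the same code path works for both precisions. A rough sketch of the idea (function and variable names are illustrative, not the repo's exact code):

```python
import torch

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Illustrative sketch: cast the rope cache to the activation dtype so the
    # rotation works under both bfloat16 and float32 training. cos/sin are
    # assumed to broadcast against x over the full head dimension.
    cos = cos.to(dtype=x.dtype)
    sin = sin.to(dtype=x.dtype)
    head_dim = x.size(-1)
    x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)
    return (x * cos) + (rotated * sin)
```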