Closed: marcospiau closed this issue 1 year ago.

Hi,

First of all, thank you for sharing your work!

Do you have any recommendations on optimizer and learning-rate scheduling configurations for continuing causal language model pretraining in another language? I'm planning to continue the language-modeling pretraining on Portuguese data.

If there is already documentation covering this, please point me to it and I will look into it.

Best, Marcos
We haven't tried continuing the pre-training yet, so I'm not sure which hyperparameter settings would work best. However, we are indeed planning to do it after we reach 1T tokens, and I'll update you on the configurations we use.
I think math data can be generated by covering every single calculation between -1000 and 1000, plus random calculations at a larger scale.
No clue how well the model would handle math after that...
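For what it's worth, here is a minimal Python sketch of what such a generator might look like; the operator set (+, -, *) and the "larger scale" range are my assumptions, since neither is specified above:

```python
import itertools
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def exhaustive_examples(lo=-1000, hi=1000):
    """Yield every single calculation over the small range as a text example."""
    for a, b in itertools.product(range(lo, hi + 1), repeat=2):
        for sym, fn in OPS.items():
            yield f"{a} {sym} {b} = {fn(a, b)}"

def random_examples(n, scale=10**9, seed=0):
    """Yield n random calculations drawn from a much larger range."""
    rng = random.Random(seed)
    for _ in range(n):
        a, b = rng.randint(-scale, scale), rng.randint(-scale, scale)
        sym, fn = rng.choice(list(OPS.items()))
        yield f"{a} {sym} {b} = {fn(a, b)}"

print(next(random_examples(1)))  # one large-scale example as text
```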
Thanks, @young-geng!
An update: we've decided to continue training the model on a mixture of natural language and code. We are using the same learning rate schedule as the previous training run.
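For illustration, here is a minimal JAX/optax sketch of a warmup-plus-cosine-decay schedule of the kind used for such runs; every step count and rate below is a placeholder assumption, not a value from our run:

```python
import optax

# Illustrative placeholder values only; not the actual run's settings.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,       # learning rate at step 0 (start of warmup)
    peak_value=3e-4,      # peak learning rate reached after warmup
    warmup_steps=2_000,   # linear warmup phase
    decay_steps=250_000,  # total steps over which the cosine decay runs
    end_value=3e-5,       # learning rate at the end of the decay
)

# Continuing with the "same schedule" means resuming from the saved
# optimizer step count, so the learning rate picks up where it left off.
for step in (0, 2_000, 250_000):
    print(step, schedule(step))
```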
Perfect, @young-geng, thanks!!
@young-geng do you plan to release any documentation regarding the training process?
@ahmedrshdy Our pretraining configuration is basically this: https://github.com/young-geng/EasyLM/blob/main/examples/pretrain_llama_7b.sh
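Not an excerpt from that script, but as a rough orientation, a hedged optax sketch of the kind of optimizer stack (gradient clipping, AdamW, and a schedule) such a pretraining config assembles; all hyperparameter values here are illustrative assumptions:

```python
import optax

# Placeholder hyperparameters; see the linked EasyLM script for the
# values actually used in the run.
lr_schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0, peak_value=3e-4, warmup_steps=2_000, decay_steps=250_000
)

optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),  # gradient clipping for stability
    optax.adamw(
        learning_rate=lr_schedule,
        b1=0.9,
        b2=0.95,            # betas commonly lowered for LLM pretraining
        weight_decay=0.1,   # decoupled weight decay
    ),
)
```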