Closed: marcospiau closed this issue 1 year ago.

Hi,

First of all, thank you for sharing your work!

Do you have any recommendations on optimizer and learning-rate scheduling configurations for continuing causal language model pretraining in another language? I'm planning to continue the language-modeling pretraining on Portuguese data.

If there is already documentation covering this, please point me to it and I will look into it.

Best, Marcos
We haven't tried continuing the pre-training yet, so I'm not sure which hyperparameter settings would work best. However, we are indeed planning to do it after we reach 1T tokens, and I'll update you on the configurations we use.
I think math data can be generated by covering every single calculation between -1000 and 1000, plus random calculations at a larger scale.
No clue how well the model would handle math after that...
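For what it's worth, here is a minimal Python sketch of what such a generator might look like; the operator set (+, -, *) and the "larger scale" range are my assumptions, since neither is specified above:

```python
import itertools
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def exhaustive_examples(lo=-1000, hi=1000):
    """Yield every single calculation over the small range as a text example."""
    for a, b in itertools.product(range(lo, hi + 1), repeat=2):
        for sym, fn in OPS.items():
            yield f"{a} {sym} {b} = {fn(a, b)}"

def random_examples(n, scale=10**9, seed=0):
    """Yield n random calculations drawn from a much larger range."""
    rng = random.Random(seed)
    for _ in range(n):
        a, b = rng.randint(-scale, scale), rng.randint(-scale, scale)
        sym, fn = rng.choice(list(OPS.items()))
        yield f"{a} {sym} {b} = {fn(a, b)}"

print(next(random_examples(1)))  # one large-scale example as text
```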
Thanks, @young-geng!
An update: we've decided to continue training the model on a mixture of natural language and code. We are using the same learning rate schedule as the previous training run.
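For illustration, here is a minimal JAX/optax sketch of a warmup-plus-cosine-decay schedule of the kind used for such runs; every step count and rate below is a placeholder assumption, not a value from our run:

```python
import optax

# Illustrative placeholder values only; not the actual run's settings.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,       # learning rate at step 0 (start of warmup)
    peak_value=3e-4,      # peak learning rate reached after warmup
    warmup_steps=2_000,   # linear warmup phase
    decay_steps=250_000,  # total steps over which the cosine decay runs
    end_value=3e-5,       # learning rate at the end of the decay
)

# Continuing with the "same schedule" means resuming from the saved
# optimizer step count, so the learning rate picks up where it left off.
for step in (0, 2_000, 250_000):
    print(step, schedule(step))
```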
Perfect, @young-geng, thanks!!
@young-geng do you plan to release any documentation regarding the training process?
@ahmedrshdy Our pretraining configuration is basically this: https://github.com/young-geng/EasyLM/blob/main/examples/pretrain_llama_7b.sh
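Not an excerpt from that script, but as a rough orientation, a hedged optax sketch of the kind of optimizer stack (gradient clipping, AdamW, and a schedule) such a pretraining config assembles; all hyperparameter values here are illustrative assumptions:

```python
import optax

# Placeholder hyperparameters; see the linked EasyLM script for the
# values actually used in the run.
lr_schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0, peak_value=3e-4, warmup_steps=2_000, decay_steps=250_000
)

optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),  # gradient clipping for stability
    optax.adamw(
        learning_rate=lr_schedule,
        b1=0.9,
        b2=0.95,            # betas commonly lowered for LLM pretraining
        weight_decay=0.1,   # decoupled weight decay
    ),
)
```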