huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Continued Pretraining on Llama7b. #78

Closed · wiseyy closed this 7 months ago

wiseyy commented 7 months ago

I want to do continued pretraining on my custom dataset, using the weights of Llama7b in the HF format. How do I initialize the model with those weights? I think there isn't a function for that yet.

xrsrke commented 7 months ago

Hey, you have to convert it to the Nanotron checkpoint format!

Start by randomly initializing a Llama model, then save the model checkpoint with dp=2, tp=2, pp=2, and you will see how Nanotron splits it. Then reshape the Hugging Face checkpoint to match that layout, along the lines of the sketch below.
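
A minimal sketch of the reshaping step in plain PyTorch + transformers. The grouping of layers into column- vs row-parallel and the output file names are assumptions here, not Nanotron's actual checkpoint layout; compare against the dp=2/tp=2/pp=2 checkpoint you saved above to get the real parameter names, shapes, and file structure.

```python
# Rough sketch: split HF Llama-7B weights into tensor-parallel shards.
import torch
from transformers import LlamaForCausalLM

TP = 2  # tensor-parallel degree used when saving the Nanotron checkpoint

model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",  # hypothetical local path to the HF-format weights
    torch_dtype=torch.bfloat16,
)
hf_state = model.state_dict()

def shard(tensor, dim):
    """Split a weight into TP chunks along `dim`, one per tensor-parallel rank."""
    return torch.chunk(tensor, TP, dim=dim)

# Megatron-style sharding: column-parallel weights are split along dim 0,
# row-parallel weights along dim 1. Verify this against the saved checkpoint.
column_parallel = ("q_proj", "k_proj", "v_proj", "gate_proj", "up_proj")
row_parallel = ("o_proj", "down_proj")

shards = [{} for _ in range(TP)]
for name, weight in hf_state.items():
    if any(key in name for key in column_parallel):
        chunks = shard(weight, dim=0)
    elif any(key in name for key in row_parallel):
        chunks = shard(weight, dim=1)
    else:
        # Norms are replicated; embeddings and lm_head may also be sharded
        # along the vocab dim in practice -- check the reference checkpoint.
        chunks = [weight] * TP
    for rank, chunk in enumerate(chunks):
        shards[rank][name] = chunk

for rank, state in enumerate(shards):
    # Placeholder file names, not Nanotron's on-disk layout.
    torch.save(state, f"tp_rank_{rank:02d}.pt")
```

Pipeline parallelism (pp=2) additionally assigns whole layers to different ranks, and data parallelism only replicates weights, so the tensor-parallel split above is the main reshaping you need to mirror; the per-rank layer assignment and parameter naming should be read off the randomly initialized checkpoint.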