I want to do continued pretraining on my custom dataset, using the weights of Llama-7B in the HF format. How do I initialize the model with those weights? I don't think there is a function for that yet.

Hey, you have to convert it to the Nanotron checkpoint format! Start by randomly initializing a Llama model, then save the model checkpoint with dp=2, tp=2, pp=2, and you will see how Nanotron splits it. Then reformat the Hugging Face checkpoint to match that layout.
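Here is a minimal sketch of the conversion idea, assuming tp=2 and using only the Hugging Face side of the pipeline. The splitting rules (column-parallel vs. row-parallel), the Nanotron parameter names, and the output file layout are all assumptions for illustration; inspect a checkpoint saved by Nanotron itself, as suggested above, to get the exact naming and shard shapes.

```python
# Sketch: shard HF Llama weights the way a tp=2 tensor-parallel layout would.
# The exact split rules and parameter names must be verified against a
# checkpoint that Nanotron itself saved -- these are assumptions.
import torch
from transformers import LlamaForCausalLM

TP = 2  # tensor-parallel degree used when saving the reference checkpoint

# Load the HF Llama-7B weights on CPU (model id is an assumption).
model = LlamaForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
state_dict = model.state_dict()

column_parallel = ("q_proj", "k_proj", "v_proj", "gate_proj", "up_proj")
row_parallel = ("o_proj", "down_proj")

shards = [{} for _ in range(TP)]
for name, tensor in state_dict.items():
    if any(key in name for key in column_parallel):
        # Column-parallel layers: split the output dimension (dim 0).
        pieces = torch.chunk(tensor, TP, dim=0)
    elif any(key in name for key in row_parallel):
        # Row-parallel layers: split the input dimension (dim 1).
        pieces = torch.chunk(tensor, TP, dim=1)
    else:
        # Norms etc. are replicated here; embeddings may actually be
        # vocab-sharded in Nanotron -- check the reference checkpoint.
        pieces = [tensor] * TP

    for rank, piece in enumerate(pieces):
        # TODO: rename keys to match Nanotron's parameter names.
        shards[rank][name] = piece.clone()

for rank, shard in enumerate(shards):
    # Hypothetical output layout, one file per tensor-parallel rank.
    torch.save(shard, f"llama7b_tp{rank}.pt")
```

Diffing the keys and shapes of these files against the randomly initialized checkpoint Nanotron saved with dp=2, tp=2, pp=2 should tell you exactly which renames and pipeline-stage splits are still missing.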