jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0

Can CodeLlama be trained? #29

Closed 5taku closed 6 months ago

5taku commented 6 months ago

Thank you very much for your code.

I ran train.py with the CodeLlama-34B base model.

Training went well, and I confirmed that a checkpoint of 76 GB was generated, the same size as CodeLlama-34B. Afterwards, when I tried to load the resulting model through LlamaForCausalLM, the following error occurred:

ValueError: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 8192])), this look incorrect.
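For reference, the load was along these lines; this is a sketch, and the checkpoint path is a placeholder for the actual output directory:

```python
from transformers import LlamaForCausalLM

# Illustrative path to the fine-tuned checkpoint directory.
ckpt = "output/codellama-34b-easycontext"

# This call raised the shape-mismatch ValueError above.
model = LlamaForCausalLM.from_pretrained(ckpt)
```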

Is there anything I missed or need to fix?

jzhang38 commented 6 months ago

You need to remove the model.safetensors like this:

https://github.com/jzhang38/EasyContext/blob/3c68bd5602e3f37582f9bbe73ab083273bd4a1c7/train_scripts/EasyContext-1M-Llama-2-7B.sh#L22

I don't know why this would happen.
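The stray model.safetensors written into the output directory appears to be an empty placeholder that shadows the real sharded weights when transformers loads the checkpoint. A minimal sketch of the workaround in Python, assuming the illustrative directory layout from above:

```python
import os

from transformers import LlamaForCausalLM

# Illustrative checkpoint path; use your actual output directory.
ckpt = "output/codellama-34b-easycontext"

# Delete the stray (empty) model.safetensors so that transformers
# falls back to the remaining weight files in the checkpoint directory.
stray = os.path.join(ckpt, "model.safetensors")
if os.path.exists(stray):
    os.remove(stray)

model = LlamaForCausalLM.from_pretrained(ckpt)
```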

5taku commented 6 months ago

> You need to remove the model.safetensors like this:
>
> https://github.com/jzhang38/EasyContext/blob/3c68bd5602e3f37582f9bbe73ab083273bd4a1c7/train_scripts/EasyContext-1M-Llama-2-7B.sh#L22
>
> I don't know why this would happen.

Solved!! Thank you!!! :)