juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0

Struggling to train LLaMA on a single GPU with both PT v1 and v2 #14

Closed (linhduongtuan closed this issue 1 year ago)

linhduongtuan commented 1 year ago

Hi, I love your code base and want to try training LLaMA on a single GPU. The code I use is here: https://github.com/juncongmoo/pyllama/blob/main/llama/model_single.py. However, I run into an error:

```
    self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
  File "/home/linh/anaconda3/envs/a/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 139, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs))
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 512]
```

Can you help me fix or test this code?
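For reference, here is roughly how I construct the model (a simplified sketch; I am assuming only the default `ModelArgs` and the `Transformer` class from `model_single.py`):

```python
from llama.model_single import ModelArgs, Transformer

params = ModelArgs()         # upstream defaults: dim=512, vocab_size=-1
model = Transformer(params)  # raises the RuntimeError above
```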

Thanks in advance. Linh

mldevorg commented 1 year ago

Guess your torch version is too old?
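A quick way to check which build is actually in use:

```python
import torch

print(torch.__version__)   # torch release, e.g. "2.0.0"
print(torch.version.cuda)  # CUDA version torch was built against, e.g. "11.7"
```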

linhduongtuan commented 1 year ago

No. I tested both PT v1 and v2 (updated very recently).

juncongmoo commented 1 year ago

@linhduongtuan Can you please post your environment info, such as OS, torch version, and model file checksums? I cannot reproduce your issue.
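For the checksums, a small script like this works (the weights directory below is just a placeholder):

```python
import hashlib
from pathlib import Path

def md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file without loading it all into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

# Placeholder path: point this at the downloaded 7B weights.
for p in sorted(Path("pyllama_data/7B").glob("*")):
    if p.is_file():
        print(p.name, md5(p))
```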

linhduongtuan commented 1 year ago

@juncongmoo, I use PT v2 nightly (and also tried PT 1.13) on Ubuntu 20.04 with CUDA 11.7 and LLaMA 7B. Instead of loading the model checkpoint, I want to train the model from scratch.
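I suspect that skipping the checkpoint-loading path also skips the line that normally sets `params.vocab_size` from the tokenizer, so it stays at the `-1` default, which matches the `[-1, 512]` in the error. Here is a minimal sketch of setting it explicitly before building the model (the tokenizer path is a placeholder, and I am assuming `model_single.py` keeps the upstream `ModelArgs`/`Transformer` names and the upstream `Tokenizer` API):

```python
from llama.model_single import ModelArgs, Transformer
from llama.tokenizer import Tokenizer

# Load the SentencePiece tokenizer shipped with LLaMA (placeholder path).
tokenizer = Tokenizer(model_path="pyllama_data/tokenizer.model")

params = ModelArgs()                    # vocab_size defaults to -1
params.vocab_size = tokenizer.n_words  # 32000 for the LLaMA tokenizer
model = Transformer(params)            # nn.Embedding now gets a valid size
```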