kssteven418 / SqueezeLLM-gradients

Apache License 2.0

Size mismatch for model layers when trying to compute gradient for llama-2-70b #5

Open tjtanaa opened 8 months ago

tjtanaa commented 8 months ago

When I run the script run.py to compute the weight gradients of llama-2-70b, I encounter the following error while loading the model:

```
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
```

Could I know how this could be fixed?
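For context on the two shapes in the error (a sketch of one plausible cause, not a confirmed diagnosis): Llama-2-70B uses grouped-query attention, where the number of key/value heads is smaller than the number of query heads, so `k_proj` is narrower than `q_proj`. If the model is instantiated with a config that assumes standard multi-head attention, the expected `k_proj` shape would be [8192, 8192] instead of the checkpoint's [1024, 8192]. The helper below is hypothetical (not part of this repo) and just reproduces the shape arithmetic:

```python
# Hypothetical helper (not from the repo): derive the expected
# k_proj/v_proj weight shape from Llama-style attention hyperparameters.
def expected_kv_proj_shape(hidden_size, num_attention_heads, num_key_value_heads):
    head_dim = hidden_size // num_attention_heads
    # k_proj maps hidden_size -> num_key_value_heads * head_dim, so its
    # weight matrix has shape (num_key_value_heads * head_dim, hidden_size).
    return (num_key_value_heads * head_dim, hidden_size)

# Llama-2-70B (grouped-query attention): 64 query heads, 8 key/value heads.
print(expected_kv_proj_shape(8192, 64, 8))    # (1024, 8192) -- the checkpoint's shape
# A multi-head-attention config (num_key_value_heads == num_attention_heads):
print(expected_kv_proj_shape(8192, 64, 64))   # (8192, 8192) -- the "current model" shape
```

Since both shapes in the error message fall out of this arithmetic, checking that the config used to build the model sets the 70B model's grouped-query head count (rather than defaulting to one KV head per query head) may be a useful first step.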