When I run the script run.py to get the weight gradients of llama-2-70b, I encounter the following error when loading the model:
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
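For context, the two shapes in the error look consistent with a grouped-query-attention (GQA) mismatch: Llama-2-70B uses 8 key/value heads against 64 query heads, so its k_proj weight is [1024, 8192], while a model instantiated with full multi-head attention would expect [8192, 8192]. A small arithmetic sketch, where the head counts are assumptions taken from the published Llama-2-70B configuration rather than from the script:

```python
# Assumed Llama-2-70B configuration values (not read from run.py itself).
hidden_size = 8192          # model dimension
num_attention_heads = 64    # query heads
num_key_value_heads = 8     # GQA: fewer key/value heads than query heads
head_dim = hidden_size // num_attention_heads  # 128

# k_proj maps hidden_size -> num_key_value_heads * head_dim,
# so its weight is (num_key_value_heads * head_dim, hidden_size).
checkpoint_k_proj = (num_key_value_heads * head_dim, hidden_size)
print(checkpoint_k_proj)    # (1024, 8192) -- matches the checkpoint shape

# A model built without GQA (num_key_value_heads == num_attention_heads)
# would instead expect:
mha_k_proj = (num_attention_heads * head_dim, hidden_size)
print(mha_k_proj)           # (8192, 8192) -- matches the "current model" shape
```

If that diagnosis is right, the fix would be to make sure the model is instantiated from the 70B config (with num_key_value_heads set) before loading the checkpoint, rather than from a config that defaults to full multi-head attention.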
Could I know how this could possibly be fixed?