When I run the script run.py to get the weight gradients of llama-2-70b, I encounter the following error when loading the model:
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
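For context, the two shapes in the error look consistent with a grouped-query-attention (GQA) mismatch: Llama-2-70B uses 8 key/value heads against 64 query heads, so its k_proj weight is [1024, 8192], while a model instantiated with full multi-head attention would expect [8192, 8192]. A small arithmetic sketch, where the head counts are assumptions taken from the published Llama-2-70B configuration rather than from the script:

```python
# Assumed Llama-2-70B configuration values (not read from run.py itself).
hidden_size = 8192          # model dimension
num_attention_heads = 64    # query heads
num_key_value_heads = 8     # GQA: fewer key/value heads than query heads
head_dim = hidden_size // num_attention_heads  # 128

# k_proj maps hidden_size -> num_key_value_heads * head_dim,
# so its weight is (num_key_value_heads * head_dim, hidden_size).
checkpoint_k_proj = (num_key_value_heads * head_dim, hidden_size)
print(checkpoint_k_proj)    # (1024, 8192) -- matches the checkpoint shape

# A model built without GQA (num_key_value_heads == num_attention_heads)
# would instead expect:
mha_k_proj = (num_attention_heads * head_dim, hidden_size)
print(mha_k_proj)           # (8192, 8192) -- matches the "current model" shape
```

If that diagnosis is right, the fix would be to make sure the model is instantiated from the 70B config (with num_key_value_heads set) before loading the checkpoint, rather than from a config that defaults to full multi-head attention.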
Could I know how this could possibly be fixed?