jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

Why does a dimension mismatch occur when I use AutoModelForCausalLM to load a model? #25

Closed: BaenRH closed this issue 1 year ago

BaenRH commented 1 year ago
model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path)

File "/usr/local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained return model_class.from_pretrained( File "/usr/local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained ) = cls._load_pretrained_model( File "/usr/local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3173, in _load_pretrained_model raise RuntimeError(f"Error(s) in loading state_dict for {model.class.name}:\n\t{error_msg}") RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM: size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]). size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]). size mismatch for model.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]). size mismatch for model.layers.1.self_attn.v_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]). size mismatch for model.layers.2.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]). size mismatch for model.layers.2.self_attn.v_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048]).

VatsaDev commented 1 year ago

I'm not sure if this model is meant to be used with Hugging Face yet; even the hosted inference API freezes up. There is a Colab at #6.

jzhang38 commented 1 year ago

You need to upgrade the transformers library to the newest version.
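A minimal sketch of that fix, assuming the mismatch comes from grouped-query attention support (which, to the best of my knowledge, landed in transformers v4.31.0):

pip install --upgrade "transformers>=4.31.0"

from transformers import AutoModelForCausalLM

# With GQA support, k_proj/v_proj are created as [256, 2048], matching the checkpoint.
model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path)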