microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

Error when fine-tuning a sliced Llama 3 model #155

Closed: ChrisXULC closed this issue 3 months ago

ChrisXULC commented 3 months ago

RuntimeError: Error(s) in loading state_dict for UninitializedLlamaForCausalLM:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([128256, 2864]) from checkpoint, the shape in current model is torch.Size([128256, 3072]).
size mismatch for model.layers.0.mlp_shortcut_Q: copying a param with shape torch.Size([2864, 2864]) from checkpoint, the shape in current model is torch.Size([3072, 3072]).
size mismatch for model.layers.0.attn_shortcut_Q: copying a param with shape torch.Size([2864, 2864]) from checkpoint, the shape in current model is torch.Size([3072, 3072]).
size mismatch for model.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([4096, 2864]) from checkpoint, the shape in current model is torch.Size([4096, 3072]).
size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 2864]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torc
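The shapes in the traceback suggest the checkpoint was produced from a model whose hidden dimension was sliced to 2864, while the model being loaded into was constructed with the original hidden size of 3072. The following is a minimal sketch in plain PyTorch (not this repository's API; `TinyModel` is a hypothetical stand-in) reproducing that class of size-mismatch error:

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in with a single embedding, just to show the mismatch."""

    def __init__(self, hidden_size: int, vocab_size: int = 128256):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)


# Save a checkpoint from a "sliced" model (hidden size 2864) ...
sliced = TinyModel(hidden_size=2864)
torch.save(sliced.state_dict(), "sliced.pt")

# ... then try to load it into a model built with the unsliced hidden size (3072).
unsliced = TinyModel(hidden_size=3072)
try:
    unsliced.load_state_dict(torch.load("sliced.pt"))
except RuntimeError as e:
    # "size mismatch for embed_tokens.weight: copying a param with shape
    #  torch.Size([128256, 2864]) ... current model is torch.Size([128256, 3072])"
    print(e)
```

Under that assumption, the fix would be to construct the model with the same sliced dimensions (i.e. the same sparsity/slicing configuration used when the checkpoint was saved) before calling load_state_dict, rather than with the original Llama 3 hidden size.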