zkkli / I-ViT

[ICCV 2023] I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Apache License 2.0

Mismatch error when trying to load trained model #7

Open fabriziojpiva opened 3 months ago

fabriziojpiva commented 3 months ago

Hi @zkkli, many thanks for your work, it is quite a nice contribution to the state of the art.

After training a model by running: python quant_train.py --model deit_tiny --data <YOUR_DATA_DIR> --epochs 30 --lr 5e-7, the checkpoint is saved in results/checkpoint.pth.tar. When I try to load these weights into the model, I get the following error:

RuntimeError: Error(s) in loading state_dict for VisionTransformer:
size mismatch for qact_input.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for patch_embed.qact.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for qact_pos.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.0.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.1.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.2.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.3.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.4.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.5.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.6.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.7.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.8.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.9.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.10.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.norm1.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.attn.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.attn.qact_attn1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.attn.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.attn.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.attn.matmul_1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.norm2.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.qact3.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.mlp.qact1.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.mlp.qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.mlp.qact_gelu.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for blocks.11.qact4.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for norm.norm_scaling_factor: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([1]).
size mismatch for qact2.act_scaling_factor: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).
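
Every entry in the error involves only two kinds of tensors: act_scaling_factor (saved with shape [], expected [1]) and norm_scaling_factor (saved with shape [192], expected [1]). As a rough illustration of how that pattern can be pulled out of a checkpoint/model pair, here is a minimal sketch. The helper name shape_mismatches is hypothetical and not part of I-ViT or PyTorch, and the shapes below are a hand-copied subset of the entries in the error, not the full state_dict.

```python
from collections import Counter


def shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (ckpt_shape, model_shape)} for keys present in both
    mappings whose shapes differ. Hypothetical helper for illustration."""
    return {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in ckpt_shapes
        if name in model_shapes and ckpt_shapes[name] != model_shapes[name]
    }


# With real tensors, each mapping could be built from a state_dict as
#   {k: tuple(v.shape) for k, v in state_dict.items()}
# Here a few entries are copied by hand from the error message instead.
ckpt_shapes = {
    "qact_input.act_scaling_factor": (),
    "blocks.0.norm1.norm_scaling_factor": (192,),
    "blocks.0.qact1.act_scaling_factor": (),
    "norm.norm_scaling_factor": (192,),
}
# The freshly built model reports shape [1] for all of these entries.
model_shapes = {name: (1,) for name in ckpt_shapes}

mismatches = shape_mismatches(ckpt_shapes, model_shapes)

# Group mismatches by the last component of the key to expose the pattern.
by_kind = Counter(name.rsplit(".", 1)[-1] for name in mismatches)

for name, (saved, current) in mismatches.items():
    print(f"{name}: checkpoint {saved} vs model {current}")
print(dict(by_kind))
```

Grouping by the trailing key component shows that only the scaling-factor entries differ between the saved checkpoint and the newly constructed model.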

Any thoughts on why this is happening?