movingsheep opened this issue 1 year ago

Hi, thanks for sharing the work!

I ran into a problem when trying to load "vit_base_patch16_224.pth". The shape of 'blocks.0.attn.qkv' in the pth file is torch.Size([3, 1, 2304, 768]), but the shape of 'blocks.0.attn.qkv.weight' in the model should be torch.Size([2304, 768]). What do the first and second dimensions in torch.Size([3, 1, 2304, 768]) mean? I think it should be torch.Size([2304, 768]).
We import our ViT model from the timm package, and this is how timm stores its weight tensors. Each of W_Q, W_K and W_V is [768, 768] on its own, but timm concatenates them into a single [2304, 768] tensor so that the three linear transformations can be performed with one linear layer. You can check out their code for more information.
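To make this concrete, here is a minimal PyTorch sketch (not PTQ4ViT's or timm's actual code) of the fused qkv projection for ViT-Base, where embed_dim = 768 and the fused weight is therefore [2304, 768]:

```python
import torch
import torch.nn as nn

embed_dim = 768  # ViT-Base hidden size

# Q, K and V are stored as one fused linear layer:
# out_features = 3 * embed_dim, so the weight is [2304, 768].
qkv = nn.Linear(embed_dim, embed_dim * 3, bias=True)
print(qkv.weight.shape)  # torch.Size([2304, 768])

# One matmul produces Q, K and V together; the output is split afterwards.
x = torch.randn(1, 197, embed_dim)   # (batch, tokens, embed_dim)
q, k, v = qkv(x).chunk(3, dim=-1)    # each (1, 197, 768)

# Equivalently, the fused weight is the row-wise concatenation
# of the three separate projection matrices.
w_q, w_k, w_v = qkv.weight.chunk(3, dim=0)  # each (768, 768)
assert torch.equal(torch.cat([w_q, w_k, w_v], dim=0), qkv.weight)
```

For illustration the sketch splits the fused output with `chunk`; timm's actual Attention module reshapes it into per-head tensors instead, but the two are equivalent ways of separating Q, K and V.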