Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
The Total params and Params size (MB) printed by summary for this model differ from those of the vit_base model in the timm library. In theory, the same settings should produce the same numbers. What is the reason? #329
```python
import torch
from vit_pytorch import ViT
from torchsummary import summary
import timm

v = ViT(
    image_size = 224,
    patch_size = 16,
    num_classes = 1000,
    dim = 768,
    depth = 12,
    heads = 12,
    mlp_dim = 3072,
    dropout = 0.1,
    emb_dropout = 0.1
)

# Display a summary of the model
summary(v, input_size=(3, 224, 224), device='cpu')  # pass the input shape (C, H, W)

# Load the ViT-B/16 model from timm
model = timm.create_model('vit_base_patch16_224', pretrained=False)

# Print the timm model's summary
summary(model, input_size=(3, 224, 224), device='cpu')
```
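One way to isolate the question, as a minimal diagnostic sketch (assuming both `vit-pytorch` and `timm` are installed; the helper `count_params` is hypothetical, introduced here for illustration): count parameters directly from `model.parameters()` instead of relying on torchsummary, then group the counts per top-level submodule to see where the two implementations diverge.

```python
import timm
from collections import defaultdict
from vit_pytorch import ViT

def count_params(model):
    # Total trainable parameters, counted directly from the model itself
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

v = ViT(
    image_size = 224, patch_size = 16, num_classes = 1000,
    dim = 768, depth = 12, heads = 12, mlp_dim = 3072,
    dropout = 0.1, emb_dropout = 0.1
)
timm_vit = timm.create_model('vit_base_patch16_224', pretrained=False)

print(f"vit-pytorch ViT: {count_params(v):,}")
print(f"timm ViT-B/16:   {count_params(timm_vit):,}")

# Break the totals down by top-level submodule; a mismatch localized to the
# transformer blocks points at an architectural difference, not a summary bug.
for name, m in [('vit-pytorch', v), ('timm', timm_vit)]:
    per_module = defaultdict(int)
    for param_name, p in m.named_parameters():
        per_module[param_name.split('.')[0]] += p.numel()
    print(name, dict(per_module))
```

If the counts differ even here, the two models are simply not parameter-identical despite matching hyperparameters. One likely source of the gap: timm's ViT-B/16 enables bias terms on the qkv projection by default, while this repo's `Attention` builds `to_qkv` without a bias, which by itself changes the total. The per-module breakdown above should confirm or rule that out.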