lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
MIT License
20.37k stars 3.03k forks

T2TViT Performer backbone #79

Open hussam789 opened 3 years ago

hussam789 commented 3 years ago

Hey, can I simply use Performer (or other efficient transformers) as the backbone in T2TViT, similar to the original paper?

lucidrains commented 3 years ago

@hussam789 sounds good! In 0.7.6 you can do:

import torch
from vit_pytorch.t2t import T2TViT
from performer_pytorch import Performer

# efficient transformer to use as the T2TViT backbone
performer = Performer(
    dim = 512,
    depth = 2,
    heads = 8
)

v = T2TViT(
    dim = 512,
    image_size = 224,
    num_classes = 1000,
    transformer = performer,               # custom transformer replaces the default encoder
    t2t_layers = ((7, 4), (3, 2), (3, 2))  # (kernel size, stride) for each tokens-to-token stage
)

img = torch.randn(1, 3, 224, 224)
v(img) # (1, 1000)
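From the snippet above, the `transformer` argument appears to accept any module that maps a token sequence of shape (batch, num_tokens, dim) to the same shape. As a hedged sketch of that contract (not part of the library itself), a stock `nn.TransformerEncoder` preserves the shape the same way Performer does, and could in principle stand in for it:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for an efficient transformer backbone:
# any module mapping (batch, num_tokens, dim) -> (batch, num_tokens, dim).
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# e.g. 196 patch tokens + 1 CLS token at dim 512 (hypothetical token count)
tokens = torch.randn(1, 197, 512)
out = encoder(tokens)
print(out.shape)  # torch.Size([1, 197, 512]) -- shape is preserved
```

The key design point is that T2TViT only handles the tokens-to-token patch reduction and classification head; the sequence-to-sequence encoder in the middle is pluggable, which is why swapping in Performer (or another efficient attention variant) works without changing anything else.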