michaelweihaosong opened this issue 1 year ago
Just found out why the model size is twice as big: Performer's feed-forward layer uses a default multiplier of 4 on the dimension. After passing ff_mult=1, the sizes match. A sketch of the fix is below.
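For anyone hitting the same thing, a minimal sketch of the fix, assuming a Performer from performer-pytorch is being wrapped by vit-pytorch's efficient ViT (the hyperparameters here are illustrative, not the exact configuration used):

```python
from performer_pytorch import Performer

# Performer's feed-forward block defaults to ff_mult=4, i.e. an inner
# dimension of 4 * dim. Setting ff_mult=1 matches a regular ViT whose
# mlp_dim equals dim, which is what makes the parameter counts agree.
performer = Performer(dim=64, depth=6, heads=8, dim_head=8, ff_mult=1)
```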
However, the Performer is still slow compared to the regular ViT when training on the torchvision.datasets.MNIST training set on an RTX 3090:
- Regular ViT: average seconds to train 1 epoch: 15.10; average seconds for testing: 0.63
- Performer ViT: average seconds to train 1 epoch: 28.80; average seconds for testing: 0.93
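For reference, here is a minimal sketch of the kind of timing loop behind these averages (the helper name and epoch count are my own, not the exact benchmark script):

```python
import time
import torch

def avg_epoch_seconds(model, loader, optimizer, criterion, device, epochs=5):
    """Average wall-clock seconds per training epoch (hypothetical helper)."""
    model.train()
    durations = []
    for _ in range(epochs):
        if device.type == 'cuda':
            torch.cuda.synchronize()  # flush pending GPU work before timing
        start = time.time()
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        if device.type == 'cuda':
            torch.cuda.synchronize()  # wait for kernels before stopping the clock
        durations.append(time.time() - start)
    return sum(durations) / len(durations)
```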
Hi,
First of all, this is a great package from lucidrains and I find it very helpful in my research.
A quick question: I noticed that the ViT-Performer is slower than the regular ViT from lucidrains. For example, running on MNIST from PyTorch takes 15 sec/epoch for the regular ViT with the configuration below, while the ViT-Performer takes 23 sec/epoch.
Checking the parameter count also shows that the ViT-Performer has double the size of the regular ViT.
I am hoping that someone has intuition about the speed of the ViT-Performer vs. the regular ViT, and about their parameter counts.
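For context, a minimal sketch of how the two models could be built and their parameter counts compared (assuming the vit-pytorch and performer-pytorch packages; these hyperparameters are illustrative, not the exact configuration used here):

```python
from vit_pytorch import ViT
from vit_pytorch.efficient import ViT as EfficientViT
from performer_pytorch import Performer

def count_params(model):
    # Total number of trainable parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Regular ViT on 28x28 grayscale MNIST images.
regular_vit = ViT(image_size=28, patch_size=4, num_classes=10,
                  dim=64, depth=6, heads=8, dim_head=8, mlp_dim=64, channels=1)

# Performer-backed ViT with matching depth/heads; ff_mult defaults to 4.
performer = Performer(dim=64, depth=6, heads=8, dim_head=8)
performer_vit = EfficientViT(image_size=28, patch_size=4, num_classes=10,
                             dim=64, channels=1, transformer=performer)

print('regular ViT params:  ', count_params(regular_vit))
print('Performer ViT params:', count_params(performer_vit))
```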
Thank you very much in advance!