lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch
MIT License
1.07k stars · 143 forks

Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count #92

Open michaelweihaosong opened 1 year ago

michaelweihaosong commented 1 year ago

Hi,

First of all, this is a great package from lucidrains and I find it very helpful in my research.

A quick question: I noticed that the ViT-Performer is slower than the regular ViT from lucidrains. For example, training on MNIST (from PyTorch) takes 15 sec/epoch for the regular ViT with the configuration below, while the ViT-Performer takes 23 sec/epoch.

Checking the parameter count also shows that the ViT-Performer has double the parameter count of the regular ViT.

(Screenshots: model configurations and parameter counts, captured 2022-12-12)

I am hoping someone can offer intuition about the speed of the ViT-Performer vs. the regular ViT, and about their parameter counts.

Thank you very much in advance!

michaelweihaosong commented 1 year ago

Just found out why the model is twice as big: the feed-forward layer uses a dimension multiplier of 4 by default. After setting ff_mult=1, the two models are the same size.

(Screenshot: matching parameter counts after setting ff_mult=1, captured 2022-12-13)

However, the Performer is still slower than the regular ViT when training on the torchvision.datasets.MNIST training set on an RTX 3090:

Regular ViT — average seconds to train 1 epoch: 15.101385951042175; average seconds for testing: 0.6326647281646729

Performer ViT — average seconds to train 1 epoch: 28.795904541015624; average seconds for testing: 0.9286866903305053
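One likely explanation (my understanding, not confirmed for this repo): Performer's FAVOR+ attention is linear in sequence length n, but pays a constant-factor cost for projecting queries and keys onto m random features, so its matmul cost is roughly O(n·m·d) versus O(n²·d) for softmax attention. An MNIST ViT has very few tokens (e.g. ~49 patches for a 7×7 grid), so the quadratic path is actually cheaper. A back-of-the-envelope sketch, with hypothetical head dimension and feature count:

```python
# Dominant matmul cost per attention layer, ignoring constant factors:
# softmax attention:  Q @ K^T and attn @ V        -> ~2 * n^2 * d
# Performer (FAVOR+): two linear-size matmuls via m random features
#                                                 -> ~2 * n * m * d
def softmax_attn_flops(n: int, d: int) -> int:
    return 2 * n * n * d

def performer_attn_flops(n: int, d: int, m: int) -> int:
    return 2 * n * m * d

n_mnist = 50      # ~49 patch tokens + CLS token (hypothetical MNIST ViT)
d, m = 64, 256    # hypothetical head dim and number of random features

print(softmax_attn_flops(n_mnist, d))       # quadratic term is tiny at n=50
print(performer_attn_flops(n_mnist, d, m))  # linear term is dominated by m
```

Under these assumptions the crossover only happens once n exceeds roughly m, so the Performer's speed advantage would show up at much longer sequence lengths than an MNIST ViT ever reaches; the extra random-feature machinery just adds overhead here.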