lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
MIT License

Add FNet as an optional implementation of ViT #123

Open FilipAndersson245 opened 3 years ago

FilipAndersson245 commented 3 years ago

Arxiv, Yannic. The authors propose that attention can be replaced with Fourier transforms in BERT; this improves speed immensely (~6x) with only a minor loss in predictive performance. Maybe it would be interesting to examine whether it can be integrated into ViT.
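
For concreteness, here is a minimal sketch of what such a Fourier mixing sublayer could look like as a drop-in replacement for attention in a pre-norm ViT-style block (`FNetBlock` and `FNetEncoderLayer` are hypothetical names, not existing vit-pytorch classes):

```python
import torch
from torch import nn

class FNetBlock(nn.Module):
    """FNet-style token mixing: replaces the self-attention sublayer
    with a parameter-free 2D Fourier transform."""
    def forward(self, x):
        # x: (batch, seq_len, dim). FFT along the hidden dim, then along
        # the sequence dim; keep only the real part, as in the FNet paper.
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

class FNetEncoderLayer(nn.Module):
    """Pre-norm transformer block with Fourier mixing in place of attention."""
    def __init__(self, dim, mlp_dim, dropout=0.):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mix = FNetBlock()
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(mlp_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        x = x + self.mix(self.norm1(x))   # Fourier mixing sublayer (residual)
        return x + self.mlp(self.norm2(x))  # feed-forward sublayer (residual)
```

Usage would be e.g. `FNetEncoderLayer(dim=256, mlp_dim=512)(torch.randn(2, 65, 256))`. Note that the mixing sublayer has no learnable parameters, which is where the speedup comes from.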

lessw2020 commented 3 years ago

I think this is also quite interesting. I'd recommend building the hybrid model implementation for a better speed/accuracy tradeoff:

"we found that adding self-attention sublayers to FNet models offers a simple way to trade off speed for accuracy... specifically replacing the final two Fourier sublayers of FNet with self-attention layers yielded a model that acheived 97% of BERT accuracy, but pre-trained six times as fast on gpus..."

And go with the NesT transformer as the home instead of ViT :)
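
For what it's worth, a hybrid stack along the lines of the quote might look like this (a hypothetical sketch; `HybridEncoder` is not an existing vit-pytorch or NesT class, and the attention placement follows the "final two layers" recipe from the quoted paper):

```python
import torch
from torch import nn

class FNetBlock(nn.Module):
    # Parameter-free Fourier mixing, same as the sketch above.
    def forward(self, x):
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

class HybridEncoder(nn.Module):
    """Fourier mixing in every layer except the last `num_attn_layers`,
    which use standard self-attention, per the quoted speed/accuracy tradeoff."""
    def __init__(self, dim, depth, heads, mlp_dim, num_attn_layers=2):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(depth):
            # Only the final `num_attn_layers` layers get real attention.
            use_attn = i >= depth - num_attn_layers
            mixer = (nn.MultiheadAttention(dim, heads, batch_first=True)
                     if use_attn else FNetBlock())
            self.layers.append(nn.ModuleList([
                nn.LayerNorm(dim), mixer, nn.LayerNorm(dim),
                nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(),
                              nn.Linear(mlp_dim, dim)),
            ]))

    def forward(self, x):
        for norm1, mixer, norm2, mlp in self.layers:
            h = norm1(x)
            if isinstance(mixer, nn.MultiheadAttention):
                h, _ = mixer(h, h, h, need_weights=False)  # self-attention
            else:
                h = mixer(h)                               # Fourier mixing
            x = x + h
            x = x + mlp(norm2(x))
        return x

# e.g. HybridEncoder(dim=256, depth=6, heads=8, mlp_dim=512)(torch.randn(2, 65, 256))
```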