lucidrains / FLASH-pytorch

Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
MIT License

The speed. #11

Open wangyuxin87 opened 1 year ago

wangyuxin87 commented 1 year ago

Thanks for your excellent work. However, GAU is slower than the original MHSA in my implementation: 3.5 s vs. 0.7 s. I simply use `from flash_pytorch import GAU` with the default settings. Is there something wrong with my implementation?
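
For reference, my usage roughly follows the README pattern. Below is a minimal sketch of the kind of comparison I ran; the dimensions, sequence length, and timing loop here are illustrative placeholders, not my exact benchmark:

```python
import time
import torch
from flash_pytorch import GAU

# GAU instantiated as in the repo README; values here are assumed, not my exact config
gau = GAU(
    dim = 512,
    query_key_dim = 128,   # query / key dimension
    causal = True,         # autoregressive or not
    expansion_factor = 2   # hidden dimension = dim * expansion_factor
)

x = torch.randn(1, 1024, 512)

# crude wall-clock timing of a single forward pass
start = time.time()
out = gau(x)  # (1, 1024, 512)
print(f'GAU forward:  {time.time() - start:.3f}s')

# baseline: PyTorch's built-in multi-head self-attention
mhsa = torch.nn.MultiheadAttention(embed_dim = 512, num_heads = 8, batch_first = True)
start = time.time()
out, _ = mhsa(x, x, x)
print(f'MHSA forward: {time.time() - start:.3f}s')
```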
