Overview

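Standard Transformer self-attention costs O(N^2 d) for sequence length N and hidden size d, so its computation grows quadratically as sequences get longer.
This paper therefore drops the pairwise query-key computation. Instead, it summarizes the sequence into a global context using additive attention, which cuts the cost to O(Nd) while matching or outperforming existing methods in the reported experiments.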
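The sketch below illustrates how additive attention avoids the N x N score matrix. It is a minimal, single-head version in plain Python and reflects our reading of the Fastformer paper, not the authors' code. All queries are pooled into one global query vector. That vector is multiplied element-wise into every key, and the resulting query-aware keys are pooled into a global key vector. The global key is then multiplied element-wise into every value, passed through an output projection, and added back to the query. Every name here (`additive_pool`, `fastformer_attention`, `w_q`, `w_k`, `W_o`) is hypothetical. The sketch also assumes Q, K and V have already been projected, and it omits multi-head splitting and parameter sharing.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def hadamard(a: Vector, b: Vector) -> Vector:
    # Element-wise product of two d-dimensional vectors.
    return [x * y for x, y in zip(a, b)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def additive_pool(vectors: Matrix, w: Vector) -> Vector:
    """Collapse N vectors into one global vector in O(N d).

    alpha_i = softmax_i(w . v_i / sqrt(d)); result = sum_i alpha_i * v_i
    """
    d = len(w)
    weights = softmax([dot(w, v) / math.sqrt(d) for v in vectors])
    return [sum(a * v[j] for a, v in zip(weights, vectors)) for j in range(d)]


def fastformer_attention(Q: Matrix, K: Matrix, V: Matrix,
                         w_q: Vector, w_k: Vector, W_o: Matrix) -> Matrix:
    """Hypothetical single-head Fastformer-style attention (sketch)."""
    # 1. Summarize all queries into a single global query vector: O(N d).
    global_q = additive_pool(Q, w_q)
    # 2. Inject the global query into every key element-wise: O(N d).
    P = [hadamard(global_q, k) for k in K]
    # 3. Summarize the query-aware keys into a global key vector: O(N d).
    global_k = additive_pool(P, w_k)
    # 4. Inject the global key into every value, then apply the per-token
    #    output projection W_o (a linear layer, like the Q/K/V projections)
    #    and add the query back as a residual.
    outputs = []
    for q_i, v_i in zip(Q, V):
        r_i = matvec(W_o, hadamard(global_k, v_i))
        outputs.append([r + q for r, q in zip(r_i, q_i)])
    return outputs


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Zero pooling weights give uniform softmax, i.e. plain mean pooling.
    out = fastformer_attention(X, X, X, [0.0, 0.0], [0.0, 0.0], identity)
    print(out)  # approx [[1.444, 0.0], [0.0, 1.444], [1.444, 1.444]]
```

No step ever builds an N x N matrix. Each pooling and element-wise step touches each of the N tokens once, with d values per token, so the token-mixing part of the layer scales linearly with sequence length.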

Reference

link

Fastformer: Additive Attention Can Be All You Need (arXiv:2108.09084)
Fastformer: Additive Attention Can Be All You Need (Machine Learning Research Paper Explained), YouTube

yiskw713 / paper_summary

Fastformer: Additive Attention Can Be All You Need #175
