FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

usersan / papers

A place for notes on papers I have read: mainly edge AI, acceleration, FPGA implementation, and related topics

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness #38

Open tera1k opened 10 months ago

tera1k commented 10 months ago

0. Paper

https://arxiv.org/abs/2205.14135

Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

1. What is it?

2. What makes it better than prior work?

With conventional attention:

Training on long sequences is difficult
Reducing the batch size to fit long sequences makes training take longer
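
As a rough illustration of why long sequences are a problem: standard attention materializes a score matrix of size sequence length × sequence length for every head and every batch element, so its memory grows quadratically. The sketch below only does that arithmetic. The function name, the batch size of 8, the 16 heads, and the 2 bytes per element (fp16) are assumptions chosen for illustration, not figures from the paper.

```python
def attention_matrix_bytes(seq_len, batch_size=1, num_heads=1, bytes_per_element=2):
    """Bytes needed to hold the full (seq_len x seq_len) attention score matrix.

    Hypothetical helper: counts only the score matrix, not Q, K, V or activations.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element


if __name__ == "__main__":
    # Assumed configuration: batch size 8, 16 heads, fp16 scores.
    for n in (1024, 4096, 16384):
        mib = attention_matrix_bytes(n, batch_size=8, num_heads=16) / 2**20
        print(f"seq_len={n}: {mib:,.0f} MiB for the score matrix")
```

Quadrupling the sequence length multiplies this term by 16, which is why the batch size is often the first thing to be cut.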

3. What is the key to the technique?

Split the attention computation into tiles so that each tile fits in SRAM
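
The tiling idea can be sketched in NumPy as follows. This is a minimal CPU illustration of the arithmetic only, not the paper's CUDA kernel: it processes K/V one block at a time and keeps a running row maximum and running softmax denominator, so the full score matrix is never stored. The function names, the block sizes, and the loop order (outer loop over query blocks) are assumptions made for readability; on a GPU each block would be sized to fit in on-chip SRAM.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference: materializes the full (n x n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    p = np.exp(scores)
    return (p / p.sum(axis=-1, keepdims=True)) @ v


def tiled_attention(q, k, v, block_q=32, block_k=32):
    """Exact attention computed tile by tile with a running (online) softmax."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty((n, v.shape[1]))
    for qs in range(0, n, block_q):
        q_blk = q[qs:qs + block_q]
        rows = q_blk.shape[0]
        m = np.full(rows, -np.inf)          # running row-wise max of the scores
        l = np.zeros(rows)                  # running softmax denominator
        acc = np.zeros((rows, v.shape[1]))  # running unnormalized output
        for ks in range(0, k.shape[0], block_k):
            s = (q_blk @ k[ks:ks + block_k].T) * scale  # one (block_q x block_k) tile
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            correction = np.exp(m - m_new)  # rescale earlier partial sums to the new max
            l = l * correction + p.sum(axis=1)
            acc = acc * correction[:, None] + p @ v[ks:ks + block_k]
            m = m_new
        out[qs:qs + block_q] = acc / l[:, None]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((100, 16)) for _ in range(3))
    print(np.allclose(tiled_attention(q, k, v), naive_attention(q, k, v)))
```

Because the rescaling step keeps the partial sums consistent, the result matches ordinary softmax attention up to floating-point error, which is what "exact attention" in the title refers to.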

4. How was its effectiveness validated?

5. Is there any discussion?

6. Which papers should be read next?

tera1k commented 10 months ago

https://zenn.dev/nhandsome/articles/388b2ebb57d5d1