
Diff-Attention #31

ouusan opened 6 days ago

ouusan commented 6 days ago

Paper: https://arxiv.org/pdf/2410.05258 (screenshot of the paper attached in the original comment)

Implementation links:

- Diff attention core: https://github.com/microsoft/unilm/blob/master/Diff-Transformer/multihead_diffattn.py#L99-L117 (a minimal sketch follows the list below)
- Diff attention on FlashAttention 1/2, handling the differing Q/K vs. V head dims: https://github.com/microsoft/unilm/blob/master/Diff-Transformer/multihead_flashdiff_1.py#L91

Related repos:

1. https://github.com/facebookresearch/xformers
2. https://github.com/Dao-AILab/flash-attention
3. https://aka.ms/flash-diff
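For quick reference, here is a minimal single-head sketch of the differential attention computation from the paper: the head's Q and K are split into two halves, two softmax attention maps are formed, and the second is subtracted at weight λ. Names, shapes, and the plain scalar λ are my own simplifications, not the unilm implementation:

```python
import math
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam: float):
    # q1/q2, k1/k2: (batch, seq, d) -- the head's Q and K split into two halves
    # v:            (batch, seq, 2*d)
    # lam:          the lambda weight (a learnable, reparameterized scalar in the paper)
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / math.sqrt(d), dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / math.sqrt(d), dim=-1)
    # Subtracting the two maps cancels common-mode "attention noise"
    # while keeping weight on the relevant tokens.
    return (a1 - lam * a2) @ v
```

In `multihead_diffattn.py` the per-head output is additionally passed through a headwise RMSNorm and scaled by (1 − λ_init) before the output projection, and λ itself is reparameterized as exp(λ_q1 · λ_k1) − exp(λ_q2 · λ_k2) + λ_init.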

ouusan commented 6 days ago

- pre-RMSNorm: https://blog.csdn.net/qq_39970492/article/details/131125752
- SwiGLU: https://zhuanlan.zhihu.com/p/650237644
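Since both links above are in Chinese, a one-screen sketch of the two components (standard LLaMA-style formulations; parameter names here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMSNorm: rescale by the root-mean-square of the features, with no
    mean-centering (cheaper than LayerNorm). "pre" means it is applied
    before each sublayer: x + Sublayer(RMSNorm(x))."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: (SiLU(x W1) * x W3) W2, the LLaMA-style gated FFN."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate branch
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value branch
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```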

ouusan commented 6 days ago

Flash Attention:
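The core trick is to compute exact attention tile-by-tile with an online softmax, keeping a running max and normalizer so the full seq×seq score matrix is never materialized. A toy reference of that recurrence (my own illustration, not the fused CUDA kernel):

```python
import torch

def flash_attention_reference(q, k, v, block: int = 64):
    # q: (seq_q, d), k/v: (seq_k, d). Streams over K/V in blocks.
    scale = q.shape[-1] ** -0.5
    m = torch.full((q.shape[0], 1), float("-inf"))  # running row max
    l = torch.zeros(q.shape[0], 1)                  # running softmax denominator
    acc = torch.zeros_like(q)                       # running weighted sum of V

    for start in range(0, k.shape[0], block):
        kb = k[start:start + block]
        vb = v[start:start + block]
        s = (q @ kb.T) * scale                      # scores for this block only
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        p = torch.exp(s - m_new)
        alpha = torch.exp(m - m_new)                # rescale old state to the new max
        l = alpha * l + p.sum(dim=-1, keepdim=True)
        acc = alpha * acc + p @ vb
        m = m_new
    return acc / l                                  # == softmax(qk^T * scale) @ v
```

FlashAttention-2 keeps the same recurrence but improves how the work is partitioned across the sequence dimension and reduces non-matmul FLOPs.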