lucidrains / BS-RoFormer

Implementation of Band-Split RoFormer, the SOTA attention network for music source separation from ByteDance AI Labs
MIT License

Linear Attention #33

Open · Psarpei opened this issue 3 months ago

Psarpei commented 3 months ago

Hello, I have two questions about the Linear Attention that was added later. Can you clarify why it is called Linear Attention when the referenced paper introduces Cross-Covariance Attention, and why exactly it is better than Self-Attention?
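For context on what the question is about, here is a minimal sketch of cross-covariance attention as described in the XCiT paper (the function name, tensor layout, and temperature handling below are illustrative, not the exact module in this repo). The attention map is computed over the feature dimension, giving a `dim_head × dim_head` matrix instead of the `seq_len × seq_len` matrix of standard self-attention, so the cost scales linearly with sequence length, which is presumably where the "Linear Attention" name comes from:

```python
import torch
import torch.nn.functional as F

def cross_covariance_attention(q, k, v, temperature = 1.):
    # q, k, v: (batch, heads, seq_len, dim_head)
    # l2-normalize queries and keys along the token axis, as in XCiT,
    # to keep the feature-by-feature attention map well scaled
    q, k = F.normalize(q, dim = -2), F.normalize(k, dim = -2)

    # attention over feature dimensions: (dim_head, dim_head),
    # independent of sequence length - this is what makes it linear in n
    attn = (q.transpose(-2, -1) @ k) / temperature
    attn = attn.softmax(dim = -1)

    # mix the values through the feature-feature map
    return v @ attn.transpose(-2, -1)  # (batch, heads, seq_len, dim_head)

q = k = v = torch.randn(1, 8, 1024, 64)
out = cross_covariance_attention(q, k, v)  # cost grows linearly in the 1024 tokens
```

Standard self-attention costs O(n²·d) in sequence length n, while the sketch above costs O(n·d²), so for long sequences (many time frames or frequency bands) the feature-wise map is much cheaper.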