tatp22 / linformer-pytorch

My take on a practical implementation of Linformer for Pytorch.
https://arxiv.org/pdf/2006.04768.pdf
MIT License

Composed linear layers? #6

Closed apeguero1 closed 4 years ago

apeguero1 commented 4 years ago

Hey @tatp22 great repo!

I'm having trouble wrapping my head around the w_q, w_k, and w_v linear layers in the LinearAttentionHead module. Are they needed? There's no activation between the previous linear layers (to_q, to_k, to_v in MHAttention) and those weights, so they don't add any expressivity to the model: multiplying two weight matrices together is equivalent to a single linear layer. The E and F projections also seem to be composed with w_k and w_v without a non-linearity.
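To make the point concrete, here's a minimal sketch (illustrative shapes and names, not code from this repo) showing that two stacked `nn.Linear` layers with no activation in between collapse to a single linear layer:

```python
import torch
from torch import nn

torch.manual_seed(0)
to_q = nn.Linear(64, 64, bias=False)  # stands in for the MHAttention projection
w_q = nn.Linear(64, 64, bias=False)   # stands in for the per-head projection

x = torch.randn(2, 10, 64)
composed = w_q(to_q(x))

# One linear layer whose weight is the product of the two weight matrices
# gives the same output, so the second layer adds no expressivity.
merged = nn.Linear(64, 64, bias=False)
with torch.no_grad():
    merged.weight.copy_(w_q.weight @ to_q.weight)

print(torch.allclose(composed, merged(x), atol=1e-6))  # True
```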

Looking at Eq. 7 from the paper, your implementation does seem correct, though.
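For reference, Eq. 7 of the paper defines each head (with E_i and F_i as the sequence-length projections) as:

```latex
\overline{\mathrm{head}_i}
  = \mathrm{Attention}\left(QW_i^Q,\; E_i K W_i^K,\; F_i V W_i^V\right)
  = \mathrm{softmax}\!\left(\frac{Q W_i^Q \,\left(E_i K W_i^K\right)^{\top}}{\sqrt{d_k}}\right) \cdot F_i V W_i^V
```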

Any thoughts on this?

tatp22 commented 4 years ago

Hi @apeguero1, thank you for pointing this out!

It seems as though you are correct. Looking at the code, it looks like I don't need the w_q, w_k, and w_v matrices, as long as each head gets its own to_q, to_k, and to_v matrix as an nn.Linear(channels, dim) layer. Instead of a single to_q, to_k, and to_v matrix per MHAttention, there would be nhead of each of these matrices, and the LinearAttentionHead module would essentially become a matrix multiplication with no learnable parameters besides the E and F layers.
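To sketch what I mean (hypothetical module and variable names, not the exact code that will be in the PR), the restructuring would look roughly like this:

```python
import torch
from torch import nn

class LinearAttentionHeadSketch(nn.Module):
    """Per-head attention with no learnable weights besides E and F."""
    def __init__(self, dim, seq_len, k):
        super().__init__()
        self.E = nn.Linear(seq_len, k, bias=False)  # key down-projection (n -> k)
        self.F = nn.Linear(seq_len, k, bias=False)  # value down-projection (n -> k)
        self.scale = dim ** -0.5

    def forward(self, q, k_in, v):
        # q, k_in, v: (batch, seq_len, dim), already projected per head
        k_proj = self.E(k_in.transpose(1, 2)).transpose(1, 2)  # (batch, k, dim)
        v_proj = self.F(v.transpose(1, 2)).transpose(1, 2)     # (batch, k, dim)
        attn = torch.softmax(q @ k_proj.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v_proj                                   # (batch, seq_len, dim)

class MHAttentionSketch(nn.Module):
    """nhead separate to_q/to_k/to_v matrices, one set per head."""
    def __init__(self, channels, dim, seq_len, k, nhead):
        super().__init__()
        self.to_q = nn.ModuleList(nn.Linear(channels, dim) for _ in range(nhead))
        self.to_k = nn.ModuleList(nn.Linear(channels, dim) for _ in range(nhead))
        self.to_v = nn.ModuleList(nn.Linear(channels, dim) for _ in range(nhead))
        self.heads = nn.ModuleList(
            LinearAttentionHeadSketch(dim, seq_len, k) for _ in range(nhead))
        self.to_out = nn.Linear(dim * nhead, channels)

    def forward(self, x):
        outs = [head(q(x), kk(x), v(x)) for head, q, kk, v
                in zip(self.heads, self.to_q, self.to_k, self.to_v)]
        return self.to_out(torch.cat(outs, dim=-1))

x = torch.randn(2, 128, 64)
mha = MHAttentionSketch(channels=64, dim=16, seq_len=128, k=32, nhead=4)
print(mha(x).shape)  # torch.Size([2, 128, 64])
```

This way the only learnable parameters inside each head are its E and F projections, and the q/k/v weights live in the multi-head wrapper, one set per head.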

I will make a pull request later of this change, let me know what you think!

apeguero1 commented 4 years ago

Great sounds good, I'll take a look!

tatp22 commented 4 years ago

Check it out and see what you think, let me know if you see any errors :+1:

The only problem is that training takes longer; I don't know if that's just my computer not running well, so if you see something that could affect that, please let me know! <- Nevermind that

apeguero1 commented 4 years ago

Yep looks good to me. Thanks for the quick response! (:

tatp22 commented 4 years ago

No prob :) Merged, and the latest version, 0.10.0, is now available on pip