microsoft / FocalNet

[NeurIPS 2022] Official code for "Focal Modulation Networks"
MIT License
682 stars, 61 forks

Merge Pre-Linear and Post-Linear layers? #5

Closed: chuong98 closed this issue 1 year ago

chuong98 commented 2 years ago

Hi, thanks for releasing the code. Looking at the diagram and the implementation, I believe the Post-Linear projection layer of one FocalBlock can be merged into the Pre-Linear layer of the next FocalBlock, since both are matrix multiplications with no activation in between. This would save parameters and inference time. However, I am not sure what the effect would be if we dropped the Post-Linear layer during training. Looking forward to your opinion. Thanks.
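To make the algebra behind this suggestion concrete, here is a minimal NumPy sketch of folding two back-to-back linear layers into one. It is not taken from the FocalNet code: the names `merge_linear`, `w_post`, `w_pre` and the layer widths are hypothetical, and it assumes that nothing (no activation, normalization, or residual connection) sits between the two layers. If anything does sit between them in the actual blocks, the identity does not hold exactly. Weights use the `(out_features, in_features)` layout, so each layer computes `x @ W.T + b`.

```python
import numpy as np


def merge_linear(w1, b1, w2, b2):
    """Fold y = W2 (W1 x + b1) + b2 into a single affine map y = W x + b."""
    w = w2 @ w1          # combined weight, shape (out2, in1)
    b = w2 @ b1 + b2     # combined bias, shape (out2,)
    return w, b


rng = np.random.default_rng(0)
dim = 8            # hypothetical channel width
out_dim = 2 * dim  # hypothetical width of the next block's Pre-Linear output

# Hypothetical Post-Linear layer of block k and Pre-Linear layer of block k+1.
w_post, b_post = rng.standard_normal((dim, dim)), rng.standard_normal(dim)
w_pre, b_pre = rng.standard_normal((out_dim, dim)), rng.standard_normal(out_dim)

x = rng.standard_normal((4, dim))  # 4 tokens with `dim` channels each

# Two separate layers applied one after the other.
two_step = (x @ w_post.T + b_post) @ w_pre.T + b_pre

# One merged layer.
w, b = merge_linear(w_post, b_post, w_pre, b_pre)
one_step = x @ w.T + b

max_abs_diff = float(np.abs(two_step - one_step).max())
params_separate = w_post.size + b_post.size + w_pre.size + b_pre.size
params_merged = w.size + b.size
print(f"max abs difference: {max_abs_diff:.2e}")
print(f"parameters: {params_separate} separate vs {params_merged} merged")
```

Under these assumptions, the merged layer reproduces the two-step output up to floating-point error and drops the `dim * dim + dim` parameters of the Post-Linear layer. This only shows equivalence at inference time; it says nothing about whether training with the merged layer behaves the same, which is the open question above.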

jwyang commented 1 year ago

Hi @chuong98, this is a good point!